FIDE: Advanced Cheat Detection Algorithms
In September 2022, World Chess Champion Magnus Carlsen abruptly withdrew from a tournament after losing to teenage Grandmaster Hans Moke Niemann, sparking widespread accusations that Niemann had cheated. In response, Niemann filed a lawsuit alleging defamation, libel, and an unlawful group boycott against Carlsen and other members of the chess community.
Although platforms like Chess.com almost certainly use their own anti-cheating algorithms, the specifics of those systems haven’t been made public. This article instead examines the detection method employed by FIDE, known as the “Regan system”, which was developed by Kenneth Regan, a Computer Science professor at the University at Buffalo.
Cheat detection methods
The Regan system derives a player’s estimated skill level, called an Intrinsic Performance Rating (IPR), by evaluating the quality of the moves they make over one game, a tournament, or a span of games. This IPR functions like an Elo rating, the numeric measure of playing strength used by FIDE and online chess sites based on a player’s past results. The system then compares a player’s IPR to their official Elo (the rating they’ve earned through sanctioned play).
If a player’s IPR substantially exceeds their official Elo, it raises suspicion of cheating. That’s because modern chess engines have outperformed even the world’s best human players since the 1990s, meaning moves influenced by engine assistance will closely match the computer’s top recommendations, far beyond what a human could consistently produce. While using an engine is relatively easy online, doing so in over-the-board play requires much greater stealth.
The Regan system detects these anomalies through a structured, four‑step process (see Figure 1).

Step 1: Measuring the deviation from the engine’s top choice
First, the system reviews every position a player faced in the games under analysis. For each position, a chess engine evaluates all legal moves and assigns them a score based on the net advantage in “effective pawns.” The move with the highest engine score is considered the “best move.” By comparing the engine’s evaluation of that optimal move with its evaluation of the move actually played, the algorithm calculates the “drop-off”: the smaller the gap, the closer the player’s choice was to the engine’s top recommendation, indicating higher move quality.
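To make the arithmetic concrete, the sketch below computes a drop-off from a set of engine evaluations for a single position. The move names and scores are purely illustrative placeholders; this is not Regan’s implementation, only the calculation described above.

```python
# Minimal sketch of the "drop-off" idea from Step 1 (illustrative, not Regan's code).
# Engine evaluations are in "effective pawns" from the player-to-move's perspective.

def drop_off(evaluations: dict[str, float], played_move: str) -> float:
    """Return how far the played move falls below the engine's best move."""
    best_eval = max(evaluations.values())          # engine's top recommendation
    return best_eval - evaluations[played_move]    # 0.0 means the best move was played

# Hypothetical position: four legal moves and their engine scores.
evals = {"Nf3": 0.62, "d4": 0.55, "c4": 0.40, "h4": -0.85}

print(drop_off(evals, "Nf3"))  # 0.0  -> matches the engine's first choice
print(drop_off(evals, "c4"))   # 0.22 -> small drop-off, still a reasonable move
```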
Step 2: Calculating the partial credit, sensitivity, and consistency
The algorithm next incorporates two player-specific parameters: sensitivity (s), which captures how sharply a player can tell apart moves of slightly different quality (lower values of s correspond to finer discrimination), and consistency (c), which reflects a player’s ability to avoid clearly poor moves. Using these two metrics together with the drop-off value (d), it computes a partial credit score (y) for each move.

All partial credit scores for a given position sum to one “full credit.” When a single move stands out as the best choice, it captures nearly all the credit because alternative moves contribute very little. Conversely, if several moves are roughly equal in strength, the credit is shared more evenly across them.
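The exact formula behind these scores is not spelled out here, but the sketch below captures the qualitative behavior: an illustrative decaying function of the drop-off (d), shaped by sensitivity (s) and consistency (c), with the scores normalized so that each position’s credits sum to one. The functional form is an assumption chosen for illustration, not necessarily the curve used by the Regan system.

```python
import math

# Illustrative partial-credit sketch (Step 2). The decaying form below is an
# assumption for demonstration purposes only; it shows the mechanics of sharing
# one "full credit" across the legal moves of a position.

def raw_credit(d: float, s: float, c: float) -> float:
    """Un-normalized credit for a move with drop-off d (d = 0 for the engine's best move)."""
    return math.exp(-((d / s) ** c))

def partial_credits(drop_offs: list[float], s: float, c: float) -> list[float]:
    """Normalize so the credits for one position sum to one 'full credit'."""
    raw = [raw_credit(d, s, c) for d in drop_offs]
    total = sum(raw)
    return [r / total for r in raw]

# One clearly best move vs. several near-equal alternatives (hypothetical drop-offs).
print(partial_credits([0.0, 1.2, 1.5, 2.0], s=0.10, c=0.50))    # best move takes nearly all credit
print(partial_credits([0.0, 0.02, 0.03, 0.05], s=0.10, c=0.50)) # credit shared more evenly
```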
Figure 2 Distribution of partial credit
Note: Each dot represents a move’s score; stronger moves appear toward the top left, weaker ones toward the bottom right.
The exact shape of the partial‑credit curve depends on the values of sensitivity and consistency. As shown in Figure 3, a higher consistency produces a curve that hugs the y‑axis, indicating that players with greater consistency rarely choose moves with large drop‑offs, regardless of how many alternatives exist.
Figure 3 Relationship between partial credit (y) and drop‑off (d) for different values of consistency (c)

Lower sensitivity shifts the curve downward for any given drop-off value, as illustrated in Figure 4. This reflects that players with lower sensitivity values are better at distinguishing between moves of similar quality, assigning a larger share of credit to higher-quality moves.
Figure 4 Relationship between partial credit (y) and drop‑off (d) for different values of sensitivity (s)

Step 3: Converting sensitivities and consistencies into an IPR
As noted above, players with higher consistency and lower sensitivity produce curves that remain closer to the y‑axis — a pattern that reflects their stronger ability to select top‑quality moves. In Step 3, the Regan system converts each player’s curve into an Intrinsic Performance Rating (IPR), effectively estimating their Elo based solely on move quality.
Table 1 provides examples of how different combinations of sensitivity (s) and consistency (c) map to specific IPR values; a small illustrative sketch of this lookup follows the table. For instance, a player with a consistency of approximately 0.515 and a sensitivity of roughly 0.082 would receive an IPR of 2700. Because the IPR functions like an Elo rating and 2700 corresponds to “Super GM” level, this indicates that the player performed at the strength expected of a top grandmaster.
Table 1 Examples of conversions of sensitivities and consistencies into IPR
Source: Goldowsky, H. (2014), ‘How To Catch A Chess Cheater: Ken Regan Finds Moves Out Of Mind’, Chess Life, June.
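As a rough illustration of how such a conversion could work, the sketch below finds the nearest reference point in a small calibration table. Apart from the 2700 row, which mirrors the example cited above, the reference values are hypothetical placeholders; Regan’s actual calibration is fitted on large samples of rated games and is considerably more elaborate.

```python
# Hedged sketch of Step 3: turning fitted (sensitivity, consistency) values into an IPR
# by comparing them against reference values calibrated on players of known Elo.
# Only the 2700 row reflects the example from Table 1; the other rows are
# illustrative placeholders following the pattern described in the text
# (stronger players -> lower s, higher c).

CALIBRATION = [
    # (Elo, sensitivity s, consistency c)
    (2700, 0.082, 0.515),
    (2500, 0.100, 0.500),
    (2300, 0.120, 0.480),
    (2100, 0.140, 0.460),
]

def estimate_ipr(s: float, c: float) -> int:
    """Return the Elo of the calibration point closest to the fitted (s, c)."""
    return min(CALIBRATION, key=lambda row: (row[1] - s) ** 2 + (row[2] - c) ** 2)[0]

print(estimate_ipr(0.085, 0.51))  # -> 2700, i.e. "Super GM" strength
```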
Potential issues with the Regan system
The Regan system has notable limitations.
To flag a player, the system converts the gap between IPR and official Elo into a z-score: the number of standard deviations by which the observed performance exceeds what the player’s rating predicts. Choosing a z-score threshold then involves a trade-off between false positives (innocent players flagged as cheaters) and false negatives (actual cheaters going undetected). A higher threshold reduces false positives but increases false negatives, and vice versa. There is no perfect cutoff: setting it too high risks missing real cheaters, while setting it too low risks wrongly accusing strong performers. Regan’s chosen threshold, a z-score of roughly 4.5, corresponds to about a 1 in 300,000 chance of achieving that performance by luck and represents a very high bar for suspicion.
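As a quick sanity check on those numbers, the snippet below verifies that a one-tailed z-score of 4.5 under a standard normal distribution corresponds to roughly a 1 in 300,000 probability.

```python
import math

# Check that a one-tailed z-score of 4.5 corresponds to roughly 1 in 300,000
# under a standard normal distribution.

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p = upper_tail(4.5)
print(p)             # ~3.4e-06
print(round(1 / p))  # ~294,000, i.e. about a 1-in-300,000 chance by luck
```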
The optimal threshold also depends on cheaters’ tactics. Unsophisticated cheaters tend to outperform their Elo by large margins, easily triggering high z‑scores. In contrast, “smart” cheaters may only use engine assistance sparingly or select second‑best moves, producing subtler performance boosts that can evade detection, especially in games between top players where only a few critical decisions matter.
If most cheaters are unsophisticated, a threshold of 4.5 makes sense: most players above it would be cheaters, and most below it innocent.
Figure 5 Distribution of players’ games and associated z‑scores (I)

However, if many cheaters employ subtle methods, many high z‑scores could actually belong to honest players, reducing the threshold’s reliability.
Figure 6 Distribution of players’ games and associated z‑scores (II)

Additionally, exceptional but legitimate performances can produce high z-scores. A rapidly improving prodigy, a novel opening line, or simply an unusually strong day can all push a player past the threshold. Post-COVID training gains have likewise lifted many players’ actual strength above their recorded Elo.
Empirical evidence confirms that false positives occur: Barnes and Hernandez-Castro (2015) applied the Regan system to 120,000 pre-2005 games, an era before powerful engines were publicly available, and still flagged at least 92 players as suspicious, despite cheating being virtually impossible at that time.
Future of cheating detection?
The current algorithmic approach to spotting cheating in chess has significant shortcomings. By only flagging performances as rare as one in 300,000 games, the Regan system risks missing less blatant or “smart” cheaters, particularly elite players who may consult an engine only sporadically.
Clearly, enhancements are needed. Existing machine‑learning models have been trained on limited data and make simplifying assumptions (for instance, that every move by a known cheater is engine‑assisted). Expanding both the quantity and quality of labeled game data would likely improve detection accuracy, but collecting such datasets poses practical challenges.
Future research could introduce new features into detection models. One promising idea is to measure how much deeper engine analysis changes a move’s evaluation: moves whose strength becomes apparent only at high search depths could signal engine use. Alternatively, platforms might crowdsource human difficulty by presenting suspicious positions as puzzles and tracking solve rates: moves that stump even strong players could hint at engine assistance.
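As a sketch of how the depth-based feature might be computed, the snippet below compares a move’s evaluation at a shallow and a deep search using the python-chess library and a UCI engine. The engine path and depth settings are placeholder assumptions, and this illustrates the idea rather than any established detection method.

```python
import chess
import chess.engine

# Sketch of the "depth-sensitivity" feature idea: compare how a move's evaluation
# changes between a shallow and a deep engine search. A large jump could be one
# signal that the move's point is only visible to deep search. Assumes python-chess
# and a locally installed UCI engine (the "stockfish" path is a placeholder).

def eval_at_depth(engine, board, move, depth):
    """Centipawn score (side to move's view) when the search is restricted to `move`."""
    info = engine.analyse(board, chess.engine.Limit(depth=depth), root_moves=[move])
    return info["score"].relative.score(mate_score=100000)

def depth_sensitivity(fen, move_uci, shallow=6, deep=22, engine_path="stockfish"):
    """How much the played move's evaluation changes between shallow and deep search."""
    board = chess.Board(fen)
    move = chess.Move.from_uci(move_uci)
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        return eval_at_depth(engine, board, move, deep) - eval_at_depth(engine, board, move, shallow)

# Hypothetical usage (requires a local engine binary):
# print(depth_sensitivity(chess.STARTING_FEN, "g1f3"))
```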
Beyond move analysis, richer data sources, such as biometric indicators (heart rate, sweat, facial expressions), could augment detection in over‑the‑board play.
The broader lesson from chess is that performance‑enhancing technology often outpaces detection tools. Just as chess engines can accelerate legitimate learning, they can also facilitate cheating, and distinguishing between the two remains difficult. As AI continues to advance, other fields (from academic testing to competitive gaming) face similar “AI doping” challenges.
For now, high‑tech detection offers no silver bullet. Chess organizers have returned to in‑person events, physical searches, and strict proctoring — costly measures that underscore the reality: not every high‑tech problem has a purely technological solution.