ChessMates: A Statistical and Psychological Look at Carlsen vs Niemann

By pallavighxsh

Sometimes, you can feel unrealistically close to players like Magnus Carlsen and Hans Niemann. Online streams, tournament coverage, interviews, and, most importantly, random blitz sessions can make these players feel like part of your daily routine. This is the “Chess Revolution,” as Peter Doggers calls it in his book—and it genuinely reflects what chess culture looks like today.

That’s probably why the Carlsen vs. Niemann situation stood out to me. Don’t get me wrong—I’m the last person to claim any expertise in chess psychology. I get panic attacks the moment someone breaches my side of the board.

And I certainly haven’t picked a side in the Carlsen vs. Niemann case (if I knew the truth, I’d be famous by now!).

All I’m pointing out is how quickly the conversation shifted from analysis to assumption.

After going through the games, the commentary, the Chessmates documentary, and the Chess.com report, I think we’re missing something important.

The Netflix documentary captures the mood really well—the tension, the awkwardness, the way a single result snowballed into something much bigger.

But it also captures something else: how quickly public opinion can be swayed. The documentary shows how Niemann went from “prodigy” to “anal bead guy” almost overnight. That portrayal feels disturbingly accurate. And, to say the least, it’s unfair!

So when Carlsen and Chess.com—both of whom I admire for protecting a community I deeply care about—raised suspicions, they were well within their rights to do so. Someone has to safeguard the game from cheating.

But what followed is the real issue. The suspicion didn’t stay a suspicion for long. It quickly hardened into something people treated as fact.

When algorithms create anomalies

From a chess perspective, the first thing to understand here is a statistical concept called ‘variance’.

At every level, from beginner to grandmaster, there are games where everything clicks. Moves come naturally, preparation holds up, and decisions align closely with the top engine choices you spent hours on end studying. The moves in these games can look like computer moves.

Consider how we evaluate such games today. 

Platforms rely heavily on engines like Stockfish, along with accuracy scores and statistical models. But these systems aren’t perfect—they operate within their own margins of error. Their assessments can vary depending on depth, context, and the nature of the position.

Just try using any chess engine and you’ll notice that its evaluations can vary significantly depending on the search depth and the positions you feed into it.

More importantly, these models are designed for online environments. They perform best over large datasets, where patterns stabilize and variance in their own accuracy gets averaged out.

Cheat detection models are powerful—but they operate on probabilities. And at scale, probabilities behave differently.

  • Thousands of players
  • Tens of thousands of games
  • Constant comparisons across datasets

If the probability of a single “engine-like” game is very small but not zero, then across thousands of games such performances are all but inevitable. This is a classic example of the well-known 'look-elsewhere effect': the more data you search, the more likely you are to find something that appears significant purely by chance. Here's an example:

Variable | Meaning
p | Probability that a single game looks unusual
N | Number of games analysed
Result | Probability of ≥1 outlier = 1 − (1 − p)^N, assuming the games are independent

Suppose the probability that a single game looks “engine-like” is p=0.001 (1 in 1000), and we analyse N=50,000 games.
Then the probability of observing at least one such outlier is:
P = 1 − (1 − 0.001)^50,000

This evaluates to approximately:
P = 1 − (0.999)^50,000 ≈ 1 − e^(−50) ≈ 1

In other words, it's virtually certain.
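To check the arithmetic yourself, here is a minimal Python sketch of the same calculation. The function name `prob_at_least_one` is my own, and the sketch assumes every game is an independent trial with the same probability p, which real games only approximate. It goes through `math.log1p` and `math.expm1` so the result stays accurate when p is tiny.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent games looks unusual.

    Computes 1 - (1 - p)**n via logarithms to avoid rounding trouble
    when p is very small.
    """
    return -math.expm1(n * math.log1p(-p))


p = 0.001    # chance that a single game looks "engine-like" (from the example above)
n = 50_000   # number of games analysed

print("P(at least one outlier):", prob_at_least_one(p, n))
print("Expected number of outliers:", n * p)
```

Under the same assumptions, you would not just expect one such game but roughly N × p = 50 of them.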

The more you search, the more likely you are to find something that looks significant purely by chance.

So asking “How likely is this game?” is the wrong question.

The right one is: “How likely is it that a game like this appears somewhere?”
And the answer is: far more likely than it feels.

And in the Carlsen vs Niemann case, this answer becomes even more relevant. Over-the-board chess just isn’t comparable to online games played on Chess.com.

You’re inevitably looking at a very small sample. In that setting, any algorithm’s own variance becomes much more pronounced. In any statistical analysis:

  • Outliers are expected at scale
  • Algorithms estimate—they don’t conclude
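One way to see the small-sample problem is through the standard error of an average, which shrinks only with the square root of the number of games. The sketch below is an illustration, not how any platform actually scores games: the per-game spread of 5 accuracy points is a number I made up, and the games are treated as independent.

```python
import math


def standard_error(per_game_sd: float, n_games: int) -> float:
    """Standard error of an average score over n_games independent games."""
    return per_game_sd / math.sqrt(n_games)


PER_GAME_SD = 5.0  # hypothetical spread of a per-game accuracy score

# Compare a handful of games with progressively larger samples.
for n_games in (5, 50, 500, 5000):
    se = standard_error(PER_GAME_SD, n_games)
    print(f"{n_games:>5} games: average is uncertain by about ±{se:.2f} points")
```

With only a handful of games, the uncertainty in the average is of the same order as the per-game noise itself, which is why a short run of over-the-board games is so hard to judge.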

Why “unusual” doesn’t mean “unnatural”

Factor | What it does
Large sample sizes | Guarantees statistical outliers
Algorithm variance | Changes evaluation depending on input/context
Online vs OTB gap | Models trained online don’t transfer cleanly
Rating lag | Rapid improvers look artificially strong
Volume of play | More games = more extreme performances

Let’s carry this data over into chess psychology!

At the elite level, the pressure is immense: public attention, reputation, and constant scrutiny. So when something unusual happens, the situation can be just as psychologically taxing. In these situations, our intuitive systems are hard at work, and one of them is the availability heuristic.

Daniel Kahneman discusses the availability heuristic in 'Thinking, Fast and Slow', and I think it’s highly relevant here. Availability means that we rely on what stands out:

  • a surprising result
  • a very high-accuracy game
  • a reaction from a dominant world champion

These are the things that stick in our minds. They’re easy to recall, easy to discuss, and easy to build a narrative around.

What doesn’t stick as easily are things like:

  • base rates
  • variance
  • how often outliers occur in large datasets

So we end up overweighting what is visible and underweighting what is statistical.
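To make the base-rate point concrete, here is a small sketch using Bayes' rule. Every number in it is hypothetical and chosen only for illustration: I am not claiming any real cheating rate or any real detector's accuracy, and the function name `posterior_cheating` is my own.

```python
def posterior_cheating(
    prior: float,
    flag_rate_if_cheating: float,
    flag_rate_if_honest: float,
) -> float:
    """P(cheating | game was flagged), by Bayes' rule."""
    flagged_and_cheating = prior * flag_rate_if_cheating
    flagged_and_honest = (1 - prior) * flag_rate_if_honest
    return flagged_and_cheating / (flagged_and_cheating + flagged_and_honest)


# Hypothetical inputs, for illustration only:
prior = 0.01                  # 1% of games involve cheating
flag_rate_if_cheating = 0.90  # detector flags 90% of those games
flag_rate_if_honest = 0.02    # detector also flags 2% of honest games

print(posterior_cheating(prior, flag_rate_if_cheating, flag_rate_if_honest))
```

With these made-up inputs, a flagged game is still more likely to be honest than not, because honest games vastly outnumber the rest. The flag is vivid; the base rate is not.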

The Chessmates documentary leans into the drama of this. It shows the personalities, the tension, the way the story spread. And to be fair, that’s part of what makes it compelling.

But it also highlights how quickly a situation can move beyond the board.

Once the narrative takes over, everything starts reinforcing it:

  • strong moves look more suspicious
  • past incidents get reinterpreted
  • every new game is viewed through the same lens

At that point, it becomes very hard to separate what we’re seeing from what we expect to see.

This is when it is vital to understand the gap between what feels convincing and what can actually be proven.

Statistical systems can flag anomalies. They can tell us when something looks unusual. But even then, identifying that something is unusual is not the same as proving why it happened.

There’s a difference between:
“this stands out”
and
“this was caused by cheating”

If we keep that in mind, the conversation becomes a lot more grounded.

And in a game that values precision as much as chess does, that’s probably the standard we should be aiming for.