
How likely are you to win a game of chess?
Do you play the game or do you play the rating? I always try to play in focus mode on chess.com so that I'm not intimidated or otherwise affected by my opponent's rating. In "How to Reassess Your Chess", IM Jeremy Silman discusses "rating fear":
There's absolutely no doubt that confidence plays a huge role in chess. Those who are paired with a higher rated player and think, "I'm doomed," will find that the evisceration that follows will be quick and quite painful.
Rather than deciding that the relative strength of two players dictates what will happen on the board, it's much better to assess each position on its merits and play accordingly, regardless of your opponent's rating (easier said than done, however). At the same time, it's obvious to everyone that a higher rated player has a greater chance of winning the game. Let's have a look at some data.
I downloaded a PGN file of two million games of amateur internet chess, played by players with Elo ratings* between 720 and 2800. I then extracted the result (win/loss/draw for white) and the rating difference (white minus black) for each game.
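As a rough illustration of that extraction step, here is a minimal sketch that reads only the PGN header tags and ignores the moves. It assumes the file uses the standard `Result`, `WhiteElo` and `BlackElo` tags and that each game's headers start with an `[Event ...]` tag. The function names are made up for this example and are not the code actually used for the analysis.

```python
import re

# PGN tag pairs look like: [WhiteElo "1500"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

# Score from white's point of view for each standard PGN result string.
WHITE_SCORE = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}


def extract_games(pgn_lines):
    """Yield (white_score, rating_diff) for each game in an iterable of PGN lines."""
    tags = {}
    for line in pgn_lines:
        match = TAG_RE.match(line.strip())
        if match:
            key, value = match.groups()
            # Assumption: an [Event ...] tag opens each game, so flush the previous one.
            if key == "Event" and tags:
                yield from _summarise(tags)
                tags = {}
            tags[key] = value
    if tags:
        yield from _summarise(tags)


def _summarise(tags):
    """Yield one record if the game has a usable result and both ratings."""
    try:
        score = WHITE_SCORE[tags["Result"]]
        diff = int(tags["WhiteElo"]) - int(tags["BlackElo"])
    except (KeyError, ValueError):
        return  # unfinished game, or a missing or non-numeric rating: skip it
    yield score, diff
```

Passing an open file object, for example `extract_games(open("games.pgn"))` with a hypothetical file name, streams the games one at a time, so the whole file never has to sit in memory.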
Of course, this completely ignores all the interesting parts of chess like opening repertoire, playing style, the psychological battle and so on, and focuses solely on the outcome. As such, we're looking at the raw probability of each outcome in a game between two random internet chess players.
Let's look first at the relative outcomes. The two million games ended as follows:
Outcome | No. of games |
White wins (1-0) | 999004 |
Black wins (0-1) | 930111 |
Draw (1/2-1/2) | 70885 |
The probability of a draw is around 3.5%. Among the games that don't end in a draw, white wins about 51.8% and black about 48.2%, so white scores nearly 2 percentage points above an even split. Grouping the differences in rating and calculating the number of wins, draws and losses shows us the empirical probabilities in the PGN file:
Here, black columns show the relative proportion of wins for black, grey columns represent draws and white columns are white wins. The number of games in each ratings band is shown at the top. As expected, a relatively higher rating for white leads to more wins for white. The proportion of draws is highest for similar ratings, and decreases as the ratings become more unbalanced.
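The grouping behind a chart like this can be sketched in a few lines. The band width of 100 points is an assumption (the width actually used isn't stated), and `outcome_proportions` is a name invented for this example. It expects `(white_score, rating_diff)` pairs with scores of 1, 0.5 or 0.

```python
from collections import Counter, defaultdict


def outcome_proportions(games, band_width=100):
    """Group (white_score, rating_diff) pairs into rating-difference bands.

    Returns {band_start: (n_games, p_black_win, p_draw, p_white_win)}.
    """
    counts = defaultdict(Counter)
    for score, diff in games:
        # Floor division puts e.g. -50 into the band [-100, 0).
        band_start = (diff // band_width) * band_width
        counts[band_start][score] += 1

    table = {}
    for band_start in sorted(counts):
        c = counts[band_start]
        n = sum(c.values())
        table[band_start] = (n, c[0.0] / n, c[0.5] / n, c[1.0] / n)
    return table
```

Each band's three proportions sum to one, which is what lets them be drawn as a single stacked column.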
The natural statistical model for this is a logistic regression, essentially fitting a smooth, symmetric curve to the probability of winning. In fact, the logistic curve is the basis for the Elo rating system, and gives the expected score of a game of chess. (Including draws as a result of 0.5 transforms the data from the probability of white winning to the expected score since draws are counted as 0.5 in tournament scoring.) Fitting the logistic model to the dataset gives the following coefficients:
$$E_W = \frac{1}{1 + e^{-(0.0637 + 0.0044\,\Delta W)}}$$
with E_W white's expected score and ΔW the rating difference (i.e. white rating minus black rating).
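The post doesn't say which software produced the fit. Purely as an illustration, a two-parameter logistic fit of expected score against rating difference can be written with Newton's method in plain Python. The function name and the iteration settings are assumptions for this sketch. Draws enter as a score of 0.5, as described above.

```python
import math


def fit_logistic(games, iterations=25):
    """Fit E = 1 / (1 + exp(-(a + b * diff))) by Newton's method.

    `games` is a sequence of (white_score, rating_diff) pairs, with a draw
    scored as 0.5. Returns the pair (a, b).
    """
    games = list(games)
    a = b = 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood and entries of the (negated) Hessian.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for score, diff in games:
            p = 1.0 / (1.0 + math.exp(-(a + b * diff)))
            w = p * (1.0 - p)
            g_a += score - p
            g_b += (score - p) * diff
            h_aa += w
            h_ab += w * diff
            h_bb += w * diff * diff
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * step = gradient.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) < 1e-12 and abs(step_b) < 1e-12:
            break
    return a, b
```

On two million games a pure-Python loop like this is slow, so in practice a statistics package would be the more sensible choice. The sketch is only meant to show what "fitting the logistic model" involves.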
The constant term (0.0637) corresponds to white's expected score being about 1.6 percentage points above 50% when the ratings are equal, and this is statistically significant, so the white pieces do carry a real advantage. (I think this is generally acknowledged for master play, but I'm surprised to see the difference in internet play. White's first move can definitely set the direction of the game, but I would have thought black's many options for defense would nullify this advantage. Apparently not.) The other coefficient (0.0044) determines the slope of the fitted logistic curve. This is more or less fixed by the rating system itself: Elo uses a base-10 factor of 1/400, which is ln(10)/400 ≈ 0.0058 on the natural-log scale of the fit. The fitted value is lower than that, indicating that the curve fitting the data is a little flatter than it should be theoretically.
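To make the comparison concrete, the sketch below puts the fitted curve next to the standard Elo expected-score formula, using the two coefficients quoted above. The function and constant names are just for illustration.

```python
import math

FITTED_INTERCEPT = 0.0637  # constant term quoted in the text
FITTED_SLOPE = 0.0044      # slope quoted in the text (natural-log scale)
ELO_SLOPE = math.log(10) / 400  # Elo's 10 ** (diff / 400) rewritten in base e


def fitted_score(diff):
    """White's expected score from the fitted curve (diff = white minus black)."""
    return 1.0 / (1.0 + math.exp(-(FITTED_INTERCEPT + FITTED_SLOPE * diff)))


def elo_score(diff):
    """Expected score from the standard Elo formula, which has no colour term."""
    return 1.0 / (1.0 + 10 ** (-diff / 400))


# Print the two curves side by side at a few rating differences.
for diff in (0, 200, 400):
    print(diff, round(fitted_score(diff), 3), round(elo_score(diff), 3))
```

At equal ratings the fitted curve sits slightly above one half because of the constant term. For larger differences it stays below the Elo curve, which is the "flatter than theory" effect.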
Using the equation we can work out what rating difference corresponds to different expected outcomes:
Expected score (white) | ΔW |
15/16 | 601 |
7/8 | 428 |
3/4 | 235 |
1/2 | -14 |
1/4 | -264 |
1/8 | -457 |
1/16 | -630 |
Roughly, each 200 points or so of rating deficit halves your expected score (the first halving, from 1/2 to 1/4, takes closer to 250). Chess.com's challenge rating window is usually plus or minus 200 points, so you can expect at least a 1 in 4 chance of beating anyone each time you hit the "play" button.
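The same arithmetic can be done for any target score or rating gap. This is a small sketch using the coefficients quoted earlier, with ΔW taken as white minus black and the constant term counted in white's favour. The helper names are invented for the example.

```python
import math


def white_score(diff, intercept=0.0637, slope=0.0044):
    """White's expected score at rating difference `diff` (white minus black)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * diff)))


def black_score(diff, intercept=0.0637, slope=0.0044):
    """Black's expected score is whatever white does not take."""
    return 1.0 - white_score(diff, intercept, slope)


def diff_for_score(expected, intercept=0.0637, slope=0.0044):
    """Rating difference (white minus black) at which white's expected score is `expected`."""
    logit = math.log(expected / (1.0 - expected))
    return (logit - intercept) / slope


# Worst case inside a +/-200 window: you are the lower rated player.
print(round(white_score(-200), 2))  # you have white, opponent is 200 points higher
print(round(black_score(200), 2))   # you have black, opponent is 200 points higher
```

Both worst cases come out a little above one quarter, with the black pieces being the slightly harder of the two.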
Finally, and perhaps most interestingly, when the rating difference is between 500 and 1000 points (for either black or white), it looks like the lower rated player has a higher than expected chance of winning! Notice that the proportion of wins for the weaker player is consistently higher than the logistic curve predicts (and the fitted curve was itself flatter than the theoretical curve).
This is both exciting and worrying depending on which side you're playing on. On the one hand, don't give up when faced with a much higher rated player; on the other hand, don't assume you're definitely going to win when paired with an apparent patzer. I'm sure we have all played a game against a much lower rated player and thrown out moves without bothering to think too hard. But halfway through the game, a sweat breaks out when we realise that, through careless blunders, the unloseable game has become unwinnable.
* Chess.com uses the Glicko rating system, which is a slightly more complicated version of the Elo rating system used in the PGN file that I looked at here.
** Equation generated by latex.codecogs.com