
Who Plays What: a Statistical Analysis of Chess.com Data (part 2)
Hi all,
I am back today with the second part of my little statistical analysis of chess.com data, looking at popular openings by Elo.
Last time, I focused on the first moves by white, so today, I will look at the first response by black.
Data
I used the same dataset as in part 1, except I let my code run a bit longer to download a few more games. The analysis below is based on 3.4 million games from 1.6 million unique players.
1. e4
What do people play after 1. e4? Let's start by looking at the frequency of each answer across Elo. The figure below shows the frequency of each opening as a fraction (so 0.1 means 10%) on the y-axis and the Elo of the players on the x-axis.
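For readers who want to reproduce this kind of plot, here is a minimal sketch of the counting step. It is not the code I ran: the `REPLY_NAMES` mapping, the 100-point bin width, and the choice to bin on black's rating are all assumptions made for illustration, and the sample games are made up.

```python
from collections import Counter, defaultdict

# Hypothetical labels for black's first reply to 1. e4 (illustrative only).
REPLY_NAMES = {
    "e5": "1... e5",
    "c5": "Sicilian",
    "e6": "French",
    "c6": "Caro-Kann",
    "Nf6": "Alekhine",
    "d6": "Pirc",
}


def elo_bin(elo, width=100):
    """Return the lower edge of the Elo bin that contains `elo`."""
    return (elo // width) * width


def reply_frequencies(games, width=100):
    """Map each Elo bin to {reply name: share of games in that bin}.

    `games` is an iterable of (black_elo, black_first_move) pairs,
    assumed to be already filtered to games that started with 1. e4.
    """
    counts = defaultdict(Counter)
    for elo, move in games:
        name = REPLY_NAMES.get(move, "other")
        counts[elo_bin(elo, width)][name] += 1

    freqs = {}
    for bin_start, counter in counts.items():
        total = sum(counter.values())
        freqs[bin_start] = {name: n / total for name, n in counter.items()}
    return freqs


# Made-up sample: four games around 800 Elo, four around 1500 Elo.
sample = [
    (850, "e5"), (870, "e5"), (820, "c5"), (890, "Nf6"),
    (1510, "c5"), (1540, "c5"), (1590, "e6"), (1520, "c6"),
]
freqs = reply_frequencies(sample)
```

Keeping an explicit "other" bucket means the shares in each bin always sum to 1, which makes it easy to spot replies the mapping does not cover.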
I already showed the first version of this plot in the first post and noted the fascinating pattern of the Sicilian slowly but surely becoming the favorite opening as Elo increases. Another interesting pattern is how the Caro-Kann (my pet defense) is less popular than the French. Having played both, I am baffled by this, as the Caro seems easier to learn and play and does not have the most boring exchange variation in the world. Finally, note that taking the knight out with Alekhine's defense is just as rare as the Pirc.
Like last time, let's now look at the performance of these defenses. Each plot below shows the probability of one result (1-0, 0-1, 1/2-1/2) given an Elo and an opening.
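As a rough illustration of what "probability of a result given an Elo and an opening" means in practice, here is a small sketch that counts results per (Elo bin, opening) pair and normalizes. The record layout, the 200-point bins, and the sample games are assumptions for the example, not my actual pipeline.

```python
from collections import Counter, defaultdict

RESULTS = ("1-0", "0-1", "1/2-1/2")


def result_probabilities(games, width=200):
    """Estimate P(result | Elo bin, opening).

    `games` is an iterable of (elo, opening, result) records.
    Returns {(bin_start, opening): {result: probability}}.
    """
    counts = defaultdict(Counter)
    for elo, opening, result in games:
        if result not in RESULTS:
            continue  # skip unfinished or malformed games such as "*"
        key = ((elo // width) * width, opening)
        counts[key][result] += 1

    probs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        probs[key] = {r: counter[r] / total for r in RESULTS}
    return probs


# Made-up sample, all in the 1400-1599 bin.
sample = [
    (1450, "Alekhine", "0-1"), (1420, "Alekhine", "0-1"),
    (1480, "Alekhine", "1-0"), (1410, "Alekhine", "1/2-1/2"),
    (1430, "Sicilian", "1-0"), (1490, "Sicilian", "0-1"),
    (1460, "Sicilian", "*"),
]
probs = result_probabilities(sample)
```

With real data, bins with very few games give noisy estimates, so it is worth dropping or merging sparse bins before plotting.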
On the draw plots (last panel), we see the usual U-shape. Low-rated players don't know how to mate and often draw, while better players don't fall for tactics as often and reach drawn positions more often.
If we focus on the middle plot, which shows how often black manages to win, we see a general pattern. For players with Elo above 1400, the least frequent openings tend to outperform, and the most played 1. e4 e5 has the worst results. This was to be expected; if you get comfortable with a line your opponent sees less often, you will have a small competitive advantage.
The one that amuses me is the Alekhine's, which is my nemesis. I barely know what to do against it and often get crushed. It's mainly my fault. The opening is played so rarely that I couldn't be bothered to learn some concrete lines against it. But according to this plot, I am not alone.
We can see here that below 1200 Elo, where most people don't know their openings, or at least not well, the somewhat unprincipled Alekhine's defense leads to bad results. Most likely, 700 Elo people who play 1. e4 Nf6 don't know they are playing Alekhine's defense. They just like to get the horses out because they are comparatively good at tactics and want a complex game. But at a higher level, especially above 1400, where presumably the players have watched a few videos on their pet openings, this unexpected line leads to a very high winning rate for black.
The final plot is, as last time, the upset probability. On the x-axis, you have the Elo difference between white and black. I only look at games where black has a lower rating than white, and on the y-axis, I show the probability that black wins despite the rating disadvantage.
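Here is a minimal sketch of how such an upset probability can be computed: keep only games where black is lower rated, bin them by rating gap, and take the share of black wins in each bin. The tuple layout, the 50-point gap bins, and the sample games are assumptions for illustration.

```python
from collections import defaultdict


def upset_probabilities(games, width=50):
    """Estimate P(black wins | rating gap bin, opening).

    `games` is an iterable of (white_elo, black_elo, opening, result).
    Only games where black is rated strictly lower than white are kept.
    Returns {(gap_bin_start, opening): probability of a black win}.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for white_elo, black_elo, opening, result in games:
        gap = white_elo - black_elo
        if gap <= 0:
            continue  # black is not the underdog in this game
        key = ((gap // width) * width, opening)
        totals[key] += 1
        if result == "0-1":
            wins[key] += 1
    return {key: wins[key] / totals[key] for key in totals}


# Made-up sample: every kept game has a gap between 50 and 99 points.
sample = [
    (1600, 1520, "Sicilian", "0-1"), (1600, 1530, "Sicilian", "1-0"),
    (1500, 1440, "Sicilian", "1-0"), (1500, 1430, "Sicilian", "0-1"),
    (1700, 1640, "Alekhine", "1-0"), (1700, 1610, "Alekhine", "1/2-1/2"),
    (1700, 1630, "Alekhine", "1-0"), (1700, 1620, "Alekhine", "0-1"),
    (1400, 1500, "Sicilian", "0-1"),  # black is higher rated: ignored
]
upsets = upset_probabilities(sample)
```

Note that draws count as "no upset" here; counting a draw as half an upset would be an equally defensible choice.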
As you can see, there is no large difference between the openings. Perhaps unsurprisingly, the best one to outperform higher-rated opponents is the very sharp Sicilian. The second best seems to be the Caro, while the worst is the Alekhine. As someone who plays the Caro-Kann and hates the Alekhine, this made me smile.
1. d4
Let's look now at the same analysis with popular answers to 1. d4. First, the frequencies:
As you can see here, the probabilities don't sum to 1, so I must have missed some important ECO codes. Nonetheless, 1. d4 d5 is the most popular (unsurprisingly), while the King's Indian plays the role of the Sicilian: the sharp and fun line people learn at higher Elo, but that almost no beginners ever play.
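One way to track down the missing share is to measure how many games the opening mapping actually labels and list the most common codes it misses. The sketch below does that; the `ECO_RANGES` mapping is deliberately incomplete and purely illustrative (it is not the mapping I used), and the sample codes are made up.

```python
from collections import Counter

# Illustrative, deliberately incomplete mapping of labels to ECO code ranges.
ECO_RANGES = {
    "Dutch": ("A80", "A99"),
    "King's Indian": ("E60", "E99"),
}


def label_for(eco):
    """Return the opening label for an ECO code, or None if unmapped.

    ECO codes are one letter plus two digits, so plain string
    comparison orders them correctly within a range.
    """
    for name, (low, high) in ECO_RANGES.items():
        if low <= eco <= high:
            return name
    return None


def coverage_report(eco_codes, top=3):
    """Return (labelled share, most common unlabelled ECO codes)."""
    labelled = 0
    unlabelled = Counter()
    for eco in eco_codes:
        if label_for(eco) is None:
            unlabelled[eco] += 1
        else:
            labelled += 1
    total = labelled + sum(unlabelled.values())
    return labelled / total, unlabelled.most_common(top)


# Made-up sample of ECO codes from ten games.
sample = ["A85", "E70", "D35", "D35", "E97", "D02", "A45", "D35", "A81", "D02"]
covered, missing = coverage_report(sample)
```

If `covered` is well below 1, the codes at the top of `missing` are the first ones to add to the mapping.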
The plots below show the performance of these defenses:
As with e4, the popular openings tend to underperform slightly.
The upset probabilities also confirm this:
I showed only the numbers for the openings with enough observations in my sample. We see on this plot that sharp and rarer defenses (King's Indian, Dutch) are better for surprising and defeating a higher-rated player.
That's it for today! Not sure what I can conclude from all that, but it was fun to dig into the data and confirm some intuitions.
Until next time, happy learning.