
Who Plays What: a Statistical Analysis of Chess.com Data (part 3)
Hi all,
I am taking advantage of some vacation time to be a little bit more active, so I am back today with the third (and probably last) post following my little statistical experiment.
If you didn't read the first or second installment: I used the chess.com API to download 3.4 million rapid games from 1.6 million unique players, then did some simple statistical analysis. With this final post, I want to come back to some of my results, namely the popularity of openings at different levels.
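For readers curious about what the raw material looks like, here is a minimal sketch of how one game record could be reduced to the quantities used in the graphs. It is not the code I ran. The record layout (`white`/`black` with a `rating`, plus a `pgn` string) is assumed to follow the chess.com public API's game archives, and the helper names `first_moves` and `summarize_game` are made up for illustration.

```python
import re

RESULT_TOKENS = {"1-0", "0-1", "1/2-1/2", "*"}

def first_moves(pgn, n_plies=2):
    """Return the first n_plies moves (in SAN) found in a PGN string."""
    # Remove {...} comments; chess.com stores clock times there.
    text = re.sub(r"\{[^}]*\}", " ", pgn)
    # Drop header lines such as [Event "..."], keeping only the movetext.
    movetext = " ".join(
        line for line in text.splitlines() if not line.strip().startswith("[")
    )
    # Remove move numbers such as "1." or "1...".
    movetext = re.sub(r"\d+\.+", " ", movetext)
    moves = [tok for tok in movetext.split() if tok not in RESULT_TOKENS]
    return moves[:n_plies]

def summarize_game(game):
    """Reduce one game record to its average rating and opening moves."""
    avg_elo = (game["white"]["rating"] + game["black"]["rating"]) / 2
    moves = first_moves(game["pgn"])
    return {
        "avg_elo": avg_elo,
        "white_first": moves[0] if len(moves) > 0 else None,
        "black_reply": moves[1] if len(moves) > 1 else None,
    }

# Hand-written record in the assumed layout (not real data).
sample_game = {
    "time_class": "rapid",
    "white": {"username": "alice", "rating": 1700},
    "black": {"username": "bob", "rating": 1800},
    "pgn": (
        '[Event "Live Chess"]\n[Result "1-0"]\n\n'
        "1. e4 {[%clk 0:09:58]} 1... c5 {[%clk 0:09:57]} "
        "2. Nf3 {[%clk 0:09:50]} 1-0"
    ),
}

print(summarize_game(sample_game))
# {'avg_elo': 1750.0, 'white_first': 'e4', 'black_reply': 'c5'}
```

A regular expression is enough here because only the first two moves matter; a full PGN parser would be the safer choice for anything deeper.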
The (old) results
I already showed these graphs in my first post, but to put everyone on the same level, I'll post them again here.
On the horizontal axis, I show the average Elo of the two players in the game. On the vertical axis, I show the popularity of each opening as a fraction of games. So if the line "e4" has a frequency of 0.62 at an Elo of 1750, it means that, in my sample, 62% of the games with an average Elo of 1750 started with 1. e4.
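The computation behind each curve can be sketched as follows: group the games into rating bins by average Elo, then divide each opening's count by the number of games in the bin. This is an illustration with toy data, not my original script. The bin width of 100 points and the function name `opening_shares` are assumptions.

```python
from collections import Counter, defaultdict

def opening_shares(games, bin_width=100):
    """Map each rating bin to {opening: share of games in that bin}.

    games is an iterable of (average_elo, opening) pairs.
    """
    counts = defaultdict(Counter)
    for avg_elo, opening in games:
        # A game with average Elo 1750 lands in the bin starting at 1700.
        bin_start = int(avg_elo // bin_width) * bin_width
        counts[bin_start][opening] += 1

    shares = {}
    for bin_start in sorted(counts):
        total = sum(counts[bin_start].values())
        shares[bin_start] = {
            opening: n / total for opening, n in counts[bin_start].items()
        }
    return shares

# Toy data: (average Elo, White's first move).
toy_games = [
    (1710, "e4"), (1750, "e4"), (1790, "d4"), (1745, "e4"),
    (1220, "e4"), (1260, "d4"),
]

shares = opening_shares(toy_games)
print(shares[1700]["e4"])  # 0.75
print(shares[1200]["d4"])  # 0.5
```

Narrower bins give smoother-looking axes but fewer games per point, so the choice of width is a trade-off between resolution and noise.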
Here are the results for the first move:
Now for Black's reply to 1. e4:
And finally, for Black's reply to 1. d4:
Note that for the replies to 1. d4, the frequencies don't add up to 1. I must have missed some ECO codes, but the main results remain robust enough for our purpose.
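One way to make such a gap visible is to send every unmatched ECO code to an explicit "Other" bucket, so the shares sum to 1 by construction. The sketch below assumes a small, deliberately incomplete table of ECO ranges. Both the table `ECO_RANGES` and the function `classify` are illustrative and are not the mapping I actually used.

```python
from collections import Counter

# Illustrative, incomplete table: opening name -> (first code, last code).
ECO_RANGES = {
    "Dutch": ("A80", "A99"),
    "Slav": ("D10", "D19"),
    "King's Indian": ("E60", "E99"),
}

def classify(eco):
    """Return the opening name for an ECO code, or 'Other' if unmatched."""
    for name, (low, high) in ECO_RANGES.items():
        # ECO codes are one letter plus two digits, so plain string
        # comparison orders them correctly.
        if low <= eco <= high:
            return name
    return "Other"

# Toy list of ECO codes for games that started with 1. d4.
codes = ["A85", "D11", "E92", "E61", "D35", "A45"]

counts = Counter(classify(code) for code in codes)
shares = {name: n / len(codes) for name, n in counts.items()}

print(counts["Other"])  # 2
print(abs(sum(shares.values()) - 1.0) < 1e-9)  # True
```

The size of the "Other" share then shows directly how much of the sample the named openings fail to cover.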
The patterns
We need to remember that these graphs are only an approximation, especially at very high and very low Elo. Even with a sample of 3.4 million games, only 0.1% of rapid players are rated above 2100, which means that (roughly) 3.4 thousand games in my sample are above 2100 across all openings.
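To get a feel for how much the small samples at the extremes matter, here is a rough sketch of the margin of error on an observed frequency. It uses the textbook normal approximation for a proportion, which is my choice for illustration and not something I applied to the graphs. The sample sizes below are round, made-up numbers.

```python
import math

def margin_of_error(share, n_games, z=1.96):
    """Approximate 95% margin of error for a share observed in n_games.

    Uses the normal (Wald) approximation, which is only reasonable
    when n_games is not too small and share is not too close to 0 or 1.
    """
    return z * math.sqrt(share * (1 - share) / n_games)

# A share of 50% observed in a band with a few thousand games...
wide_band = margin_of_error(0.5, 3_500)
# ...versus the same share in a narrow rating bin with only 100 games.
narrow_bin = margin_of_error(0.5, 100)

print(f"{wide_band:.3f}")   # 0.017
print(f"{narrow_bin:.3f}")  # 0.098
```

So a curve point built from about 100 games can easily be off by ten percentage points, which is why the far ends of the graphs should be read with caution.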
Nonetheless, we can see some clear and interesting patterns. The obvious one is the popularity of e4 across all levels, but the ones I want to discuss today are the trends across Elo. More specifically, I want to focus on those openings that become more and more popular as the rating goes up.
The trend of the Sicilian is, of course, the most impressive, but it's not alone; look at the King's Indian Defense (KID) or the Benoni! Online and among amateurs, some openings are clearly played more often at high levels, which leads me to my questions.
The chicken or the egg?
Looking at these graphs, we can't say anything about causality, i.e., we can't say whether these openings are played more often because they help players play better or whether better players prefer them for some other reason.
Then again, above 2150, more than 50% of the games after 1. e4 become a Sicilian, versus only 10% below 1000. And the trend is very smooth. This means either that people who play the Sicilian improve faster or that players who improve tend to switch to the Sicilian at some point.
I guess it's the latter, but even if I am right, this raises more questions than it answers. Indeed, if players who improve change openings, why do they do so?
I tried to think of a few possible reasons to learn a whole new opening or set of openings:
- To find a more aggressive opening.
- To find a more quiet/positional opening.
- Just for the pleasure of changing.
- To learn to be comfortable in other styles (e.g., you are a positional player who wants to be better at sharp positions or vice versa).
- To leave behind something unsound like a dubious gambit.
- You realized you don't understand your opening, so learning a whole new one or actually learning yours correctly would take the same time.
- To try a few things before fully committing.
- Somebody told me to.
And I am pretty curious to know which of these reasons actually lead people to change their openings.
Sadly, my data can't help me answer this question, so I designed a little poll. If you ever changed your repertoire and are willing to spare a second, please select the reason why. And if I forgot to include your motivation in the list, let me know in the comments.
Personally, I changed openings way too many times and almost always for the wrong reason.
Early on, I switched from the Caro to the French because someone assured me it was a much better choice. Much later, I changed back to the Caro, mostly because I realized I did not understand the strategic themes of the French. Even now, when I encounter the French, I go for the Exchange because I don't know how to play the rest...
Against d4, I started with the Slav, then switched to the Dutch to spice things up... and went back to the Slav because I was too lazy to truly learn the Dutch. Finally, I switched to the KID a few months ago because I wanted to get out of my comfort zone. Now I love it and can't imagine switching back.
And with White, I played the London System for almost three years before switching to e4. I made this choice for two reasons: 1) I wanted to vary my games a bit, and 2) I wanted to learn to play open positions. Indeed, my natural inclination is positional play, and I saw that whenever the game opened up, I lost... So I took the bull by the horns and forced myself to go for a more aggressive and open style.
Anyway, I hope you found my little statistical analysis interesting, and if you took the time to fill in the poll, thank you very much!
Until next time, happy learning!