# Chess Analytics: The Long and the Short of It

Apr 23, 2016, 5:05 PM

It has been 6 years since I posted to this blog, and I thought my return should introduce something completely new and fascinating: Chess Analytics.  By that I mean the application of modern Data Science analysis to chess game data.  In this first post of what will be many, I will briefly discuss the distribution of game lengths among chess players with Elo ratings of at least 2,000, i.e. experts and above.

But first I will give some background.  My Chess Analytics posts will be based on a collection of almost 1.7 million games that I downloaded from the KingBase web site (http://www.kingbase-chess.net).  To prepare these games for analysis, I preprocessed them with a utility written for the R statistical analysis language by Joshua Kunst.  This utility is in the file 01_pgn_parser.R and is available for code geeks on GitHub (https://github.com/jbkunst/chess-db).  I then read that preprocessed game data into R and, using Kunst’s rchess package, wrote additional preprocessing code to compute and save several statistics for every game.  One of these is the length of each game in half moves, i.e. all of white’s moves plus all of black’s moves.
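For readers who want to see what "length in half moves" means in practice, here is a minimal Python sketch. It is not the R code described above: `count_half_moves` is a hypothetical helper that assumes plain PGN movetext and simply strips comments, variations, annotation glyphs, move numbers and the result token before counting what is left.

```python
import re

RESULT_TOKENS = {"1-0", "0-1", "1/2-1/2", "*"}

def count_half_moves(movetext: str) -> int:
    """Count half moves (plies) in PGN movetext. Hypothetical helper."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {brace comments}
    # Drop (variations), innermost first so nested ones are handled.
    while re.search(r"\([^()]*\)", text):
        text = re.sub(r"\([^()]*\)", " ", text)
    text = re.sub(r"\$\d+", " ", text)           # drop annotation glyphs such as $1
    text = re.sub(r"\d+\.+", " ", text)          # drop move numbers "1." and "1..."
    # Whatever remains is one token per half move, plus possibly the result.
    return sum(1 for tok in text.split() if tok not in RESULT_TOKENS)

print(count_half_moves("1.b4 a5 2.f4 f5 3.e3 a4 0-1"))
```

A regex pass like this does not validate that the moves are legal; a real chess library would be the safer choice for anything beyond counting.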

Here is a histogram that provides a visualization of these game lengths:

Along the x-axis is the length of each game from the shortest with only 6 half moves, to the longest, which was a marathon consisting of 475 half moves!

The y-axis shows us how many games were played at each game length.  You can see that there is a lengthy ‘tail’ to the right, indicating that the really long games, say those with more than about 160 half moves, very rarely occurred.  Among these extremely long games, there were 28 lengths that each occurred only once, ranging from one game of 331 half moves to that marathon monster with 475.  The most frequent game length was 81, which occurred 31,649 times!  Notice how far above the madding crowd that highest blue dot is!  Don’t you wonder why that game length results in such an enormous number of games?
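The counts behind a plot like this boil down to a frequency table. Here is a small Python sketch of that idea, using a toy list of lengths rather than the real data; `length_profile` is a made-up name for illustration.

```python
from collections import Counter

def length_profile(lengths):
    """Return per-length counts, the most frequent length, and one-off lengths."""
    counts = Counter(lengths)                       # length -> number of games
    mode_length, mode_count = counts.most_common(1)[0]
    singletons = sorted(n for n, c in counts.items() if c == 1)
    return counts, (mode_length, mode_count), singletons

# Toy data only; the real input would be one length per game.
sample = [6, 40, 81, 81, 81, 120, 120, 475]
counts, mode, singletons = length_profile(sample)
print(mode)        # most frequent length and how often it occurred
print(singletons)  # lengths that occurred exactly once
```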

The green vertical line is the location of the mean, or average, game length of 80.8 half moves, while the red vertical line at 78 is the median, which tells us that roughly half of the nearly 1.7 million games were shorter than 78 half moves and the other half were longer.  Similarly, the green horizontal line at 4,548 indicates the mean value of the blue dots along the y-axis, whereas the red horizontal line at 330 is again the median value.
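The two pairs of lines summarize two different things: the vertical pair describes the game lengths themselves, while the horizontal pair describes the per-length counts (the heights of the dots). A toy Python sketch of that distinction, with invented data:

```python
from collections import Counter
from statistics import mean, median

# Toy data only: one entry per game.
lengths = [6, 40, 81, 81, 81, 120, 120, 475]
counts = Counter(lengths)            # length -> number of games at that length
heights = list(counts.values())      # the y-values of the dots

# Vertical lines: statistics over games (x-axis).
x_mean, x_median = mean(lengths), median(lengths)

# Horizontal lines: statistics over the dots (y-axis).
y_mean, y_median = mean(heights), median(heights)

print(x_mean, x_median)
print(y_mean, y_median)
```

Note that the y-axis mean is just the total number of games divided by the number of distinct lengths that occur.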

An interesting oddity is that elevated blue dot at game length 120, which occurred 12,694 times, whereas its neighboring lengths of 119 and 121 occurred only 9,969 and 8,273 times, respectively.  Perhaps in a future post I will explore this 120 half-move over-achiever to see if we can understand why it stands out.  There is also a lesser, but still noticeable, bump at 160 half moves, which occurred 2,116 times vs. lengths 159 and 161 at 1,782 and 1,470 games, respectively.
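Bumps like these can be flagged mechanically by comparing each count with its two neighbors. The Python sketch below uses only the six counts quoted above; `local_peaks` and its 10% threshold are arbitrary choices for illustration, not a statistical test.

```python
def local_peaks(counts, ratio=1.1):
    """Return lengths whose count is at least `ratio` times both neighbours' counts.

    Lengths with a missing neighbour are skipped. The default ratio of 1.1
    is an arbitrary illustrative threshold.
    """
    peaks = []
    for n, c in sorted(counts.items()):
        left, right = counts.get(n - 1), counts.get(n + 1)
        if left is None or right is None:
            continue
        if c >= ratio * left and c >= ratio * right:
            peaks.append(n)
    return peaks

# Only the counts quoted in the post.
quoted = {119: 9969, 120: 12694, 121: 8273,
          159: 1782, 160: 2116, 161: 1470}
print(local_peaks(quoted))
```

On the downward-sloping tail the right-hand neighbor is usually smaller anyway, so the comparison with the left-hand neighbor is the one doing the real work.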

Of passing interest, though I place no significance upon it, out of the 1,696,607 games there are no games whatsoever with the following lengths: 339, 346, 350, 352, 357, 362, 364, 367, 371, 374, 375, 377–386, 389, 390, 392–397, 399, 401, 403–406, 408–419, 421–425, 427–438, 440–453, 455, 457–474.
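As a quick sanity check, this list of gaps can be tied back to the mean dot height of 4,548 mentioned earlier. The Python sketch below assumes the list is the complete set of unused lengths between 6 and 475; `expand` is a hypothetical helper, and the ranges are typed with plain hyphens.

```python
def expand(spec):
    """Expand a string like '339, 377-386' into a list of integers."""
    out = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            out.extend(range(lo, hi + 1))
        else:
            out.append(int(part))
    return out

# Gap list transcribed from the post, with ranges written using hyphens.
GAPS = ("339, 346, 350, 352, 357, 362, 364, 367, 371, 374, 375, 377-386, "
        "389, 390, 392-397, 399, 401, 403-406, 408-419, 421-425, 427-438, "
        "440-453, 455, 457-474")

missing = expand(GAPS)
possible = 475 - 6 + 1               # every length from 6 to 475 inclusive
observed = possible - len(missing)   # lengths that actually occur (assumption above)
mean_count = 1_696_607 / observed    # average number of games per observed length

print(len(missing), observed, round(mean_count, 1))
```

Under that assumption there are 97 unused lengths and 373 used ones, which gives about 4,548.5 games per length, in line with the green horizontal line.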

It is of interest, however, that there were 98 nano-games that only had 3 white and 3 black moves.  Somehow, 15 of those 98 games actually had a winner: 7 were won by black and 8 by white.

| Year | White | Black | Result | White Elo | Black Elo | PGN |
|------|-------|-------|--------|-----------|-----------|-----|
| 2013 | Grandelius | Kurayan | 0-1 | 2576 | 2398 | 1.b4 a5 2.f4 f5 3.e3 a4 |
| 2013 | Yilmaz | Steindorsson | 1-0 | 2531 | 2235 | 1.c4 Nf6 2.Nc3 e6 3.e4 c5 |
| 2011 | Pridorozhni | Malakhov | 0-1 | 2542 | 2714 | 1.f4 Nc6 2.b4 d5 3.a3 Bg4 |
| 2010 | Radjabov | Nakamura | 0-1 | 2744 | 2741 | 1.Nf3 f5 2.c4 Nf6 3.Nc3 d6 |
| 2012 | Vernay | Riff | 1-0 | 2441 | 2494 | 1.d4 Nf6 2.c4 e6 3.Qc2 Bb4+ |
| 2009 | Svetlov | Sanzhaev | 1-0 | 2328 | 2112 | 1.d4 Nf6 2.Nf3 e6 3.c4 b6 |
| 2000 | Dinstuhl | Dautov | 0-1 | 2412 | 2606 | 1.d4 Nf6 2.c4 e6 3.Nf3 b6 |
| 1999 | Akhmetov | Sveshnikov | 0-1 | 2438 | 2541 | 1.Nf3 d5 2.g3 Nf6 3.Bg2 c6 |
| 2013 | Baraeva | Navrotescu | 1-0 | 2211 | 2147 | 1.d4 Nf6 2.c4 g6 3.Nc3 Nd5 |
| 2010 | Eljanov | Caruana | 1-0 | 2742 | 2709 | 1.d4 Nf6 2.c4 c5 3.d5 b5 |
| 1997 | Lyrberg | Akesson | 0-1 | 2430 | 2520 | 1.d4 Nf6 2.Qd3 d5 3.Qxh7 Rxh7 |
| 2013 | Babarykin | Paveliev | 1-0 | 2306 | 2386 | 1.e4 d5 2.exd5 Nf6 3.d4 Nxd5 |
| 2013 | Repkova | Frischmann | 1-0 | 2374 | 2234 | 1.e4 c5 2.Nf3 e6 3.b3 a6 |
| 2015 | Yaniuk | Stupak | 0-1 | 2098 | 2568 | 1.e4 e6 2.d4 d5 3.Nc3 Bb4 |
| 2013 | Rakhmanov | Minero Pineda | 1-0 | 2595 | 2412 | 1.e4 e6 2.d4 d5 3.exd5 exd5 |
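The split between white and black wins among these 15 games is easy to re-tally. A short Python sketch, with the player names and results typed in by hand from the listing above:

```python
from collections import Counter

# (white, black, result) for the 15 decisive six-half-move games listed above.
decisive = [
    ("Grandelius", "Kurayan", "0-1"),
    ("Yilmaz", "Steindorsson", "1-0"),
    ("Pridorozhni", "Malakhov", "0-1"),
    ("Radjabov", "Nakamura", "0-1"),
    ("Vernay", "Riff", "1-0"),
    ("Svetlov", "Sanzhaev", "1-0"),
    ("Dinstuhl", "Dautov", "0-1"),
    ("Akhmetov", "Sveshnikov", "0-1"),
    ("Baraeva", "Navrotescu", "1-0"),
    ("Eljanov", "Caruana", "1-0"),
    ("Lyrberg", "Akesson", "0-1"),
    ("Babarykin", "Paveliev", "1-0"),
    ("Repkova", "Frischmann", "1-0"),
    ("Yaniuk", "Stupak", "0-1"),
    ("Rakhmanov", "Minero Pineda", "1-0"),
]

tally = Counter(result for _, _, result in decisive)
print(tally["1-0"], "white wins,", tally["0-1"], "black wins")
```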

And here is the behemoth 475 half-mover between Felber and Lapshun in 1998:

My next post (https://www.chess.com/blog/kurtgodden/chess-analytics-introduction-to-material-and-mobility) will be the first of several that discuss this large game dataset with respect to black and white material vs. mobility, and I will show you some extraordinary graphs.  In the meantime, please let me know in the comments section if there is some analysis that you would like to see in a future post.
