Chess Analytics: The Long and the Short of It

kurtgodden
Apr 23, 2016, 5:05 PM

It has been 6 years since I posted to this blog, and I thought my return should introduce something completely new and fascinating – Chess Analytics. By that I mean the application of modern Data Science analysis to chess game data. For this first post of what will be many, I will briefly discuss the distribution of game lengths among chess players with Elo ratings of at least 2,000, i.e. experts and above.

But first, some background. My Chess Analytics posts will be based on a collection of almost 1.7 million games that I downloaded from the KingBase web site (http://www.kingbase-chess.net). To prepare these games for analysis, I preprocessed them with a utility written for the R statistical analysis language by Joshua Kunst. This utility is in the file 01_pgn_parser.R and is available for code geeks at GitHub (https://github.com/jbkunst/chess-db). I then read the preprocessed game data into R and, using Kunst’s rchess package, wrote additional preprocessing code to compute and save several statistics for every game, one of which is the length of each game in half moves, i.e. all of white’s moves plus all of black’s moves.
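For readers who want to see what "length in half moves" means in practice, here is a minimal Python sketch. It is not the R/rchess code described above. The function name `count_half_moves` is my own, and it assumes simple PGN movetext with no nested variations or numeric annotation glyphs.

```python
import re

# Game-termination markers that can end PGN movetext.
RESULT_TOKENS = {"1-0", "0-1", "1/2-1/2", "*"}

def count_half_moves(movetext: str) -> int:
    """Count half moves (plies) in a simple PGN movetext string.

    Assumption: no nested variations in parentheses and no "$n" glyphs.
    """
    # Remove {brace comments}.
    text = re.sub(r"\{[^}]*\}", " ", movetext)
    # Remove move-number indications such as "1." or "3...".
    text = re.sub(r"\d+\.+", " ", text)
    # Whatever remains, apart from the result marker, is one token per half move.
    return len([tok for tok in text.split() if tok not in RESULT_TOKENS])

print(count_half_moves("1.b4 a5 2.f4 f5 3.e3 a4 0-1"))
```

Each white move and each black move counts once, so a game of 3 white and 3 black moves has a length of 6.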

Here is a histogram that provides a visualization of these game lengths:

 

The x-axis shows the length of each game, from the shortest at only 6 half moves to the longest, a marathon of 475 half moves!

The y-axis shows how many games were played at each game length. You can see a lengthy ‘tail’ to the right, indicating that the really long games, say those with more than about 160 half moves, occurred very rarely. Among these extremely long games, 28 lengths occurred exactly once each, ranging from one game of 331 half moves to that marathon monster with 475. The most frequent game length was 81 half moves, which occurred 31,649 times! Notice how far above the madding crowd that highest blue dot is! Don’t you wonder why that game length accounts for such an enormous number of games?

The green vertical line marks the mean, or average, game length of 80.8 half moves, while the red vertical line at 78 marks the median: about half of the nearly 1.7 million games were shorter than 78 half moves and about half were longer. Similarly, the green horizontal line at 4,548 marks the mean number of games per game length (the mean height of the blue dots along the y-axis), whereas the red horizontal line at 330 marks the median.
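The two pairs of lines summarize two different things: the vertical pair describes the game lengths themselves, and the horizontal pair describes how many games fall at each length. A small Python sketch makes the distinction concrete. The `toy_lengths` list is made up for illustration and is not taken from the KingBase data, and the `summarize` helper is a hypothetical name.

```python
from collections import Counter
from statistics import mean, median

def summarize(lengths):
    """Summarize game lengths (x-axis) and games per length (y-axis)."""
    games_per_length = Counter(lengths)       # length -> number of games
    heights = list(games_per_length.values())  # the "blue dot" heights
    return {
        # Vertical lines: statistics over every game's length.
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        # Tallest dot: the most frequent length.
        "most_frequent_length": games_per_length.most_common(1)[0][0],
        # Horizontal lines: statistics over the dot heights.
        "mean_games_per_length": mean(heights),
        "median_games_per_length": median(heights),
    }

# Illustrative data only: ten imaginary games.
toy_lengths = [6, 40, 40, 78, 81, 81, 81, 120, 120, 475]
summary = summarize(toy_lengths)
for name, value in summary.items():
    print(name, value)
```

Note that the horizontal statistics are taken over lengths that actually occur; a length with no games contributes no dot.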

An interesting oddity is the elevated blue dot at game length 120, which occurred 12,694 times, whereas its neighboring lengths of 119 and 121 occurred only 9,969 and 8,273 times, respectively. Perhaps in a future post I will explore this 120-half-move over-achiever to see if we can understand why it stands out. There is also a lesser, but still noticeable, bump at 160 half moves, which occurred 2,116 times vs. 1,782 and 1,470 games for lengths 159 and 161, respectively.
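One simple way to put a number on such a bump is to compare a length's game count with the average of its two neighbors. This is only one possible measure, and the helper name `bump_ratio` is hypothetical. The counts are the ones quoted above.

```python
def bump_ratio(count: int, left: int, right: int) -> float:
    """Ratio of a length's game count to the average count of its two neighbors."""
    return count / ((left + right) / 2)

# Counts quoted in the post for lengths 119/120/121 and 159/160/161.
ratio_120 = bump_ratio(12694, 9969, 8273)
ratio_160 = bump_ratio(2116, 1782, 1470)

print(round(ratio_120, 2), round(ratio_160, 2))
```

By this measure, length 120 sits roughly 39% above its neighbors' average and length 160 roughly 30% above.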

 

Of passing interest, though I place no significance upon it, out of the 1,696,607 games there are no games whatsoever with the following lengths: 339, 346, 350, 352, 357, 362, 364, 367, 371, 374, 375, 377–386, 389, 390, 392–397, 399, 401, 403–406, 408–419, 421–425, 427–438, 440–453, 455, 457–474.
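As a cross-check, the list of absent lengths can be expanded and counted. The sketch below copies the list from the paragraph above, with ranges written using plain hyphens. The helper name `expand_ranges` is my own, and the 6 and 475 bounds are the shortest and longest game lengths reported earlier.

```python
def expand_ranges(spec: str) -> list[int]:
    """Expand a string like "1, 3-5" into [1, 3, 4, 5]."""
    values = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            values.extend(range(int(low), int(high) + 1))
        else:
            values.append(int(part))
    return values

MISSING_SPEC = (
    "339, 346, 350, 352, 357, 362, 364, 367, 371, 374, 375, 377-386, "
    "389, 390, 392-397, 399, 401, 403-406, 408-419, 421-425, 427-438, "
    "440-453, 455, 457-474"
)

missing = expand_ranges(MISSING_SPEC)
possible = 475 - 6 + 1                 # every length from 6 to 475 inclusive
observed = possible - len(missing)     # lengths that occur at least once

print(len(missing), possible, observed)
```

If the list is complete, 97 of the 470 possible lengths never occur, leaving 373 that do. Dividing the 1,696,607 games by 373 gives about 4,548.5 games per observed length, which is in line with the mean of 4,548 shown by the green horizontal line.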

It is of interest, however, that there were 98 nano-games with only 3 white and 3 black moves. Somehow, 15 of those 98 games actually had a winner: 7 wins for black and 8 for white.

| Year | White | Black | Result | White Elo | Black Elo | PGN |
|------|-------|-------|--------|-----------|-----------|-----|
| 2013 | Grandelius | Kurayan | 0-1 | 2576 | 2398 | 1.b4 a5 2.f4 f5 3.e3 a4 |
| 2013 | Yilmaz | Steindorsson | 1-0 | 2531 | 2235 | 1.c4 Nf6 2.Nc3 e6 3.e4 c5 |
| 2011 | Pridorozhni | Malakhov | 0-1 | 2542 | 2714 | 1.f4 Nc6 2.b4 d5 3.a3 Bg4 |
| 2010 | Radjabov | Nakamura | 0-1 | 2744 | 2741 | 1.Nf3 f5 2.c4 Nf6 3.Nc3 d6 |
| 2012 | Vernay | Riff | 1-0 | 2441 | 2494 | 1.d4 Nf6 2.c4 e6 3.Qc2 Bb4+ |
| 2009 | Svetlov | Sanzhaev | 1-0 | 2328 | 2112 | 1.d4 Nf6 2.Nf3 e6 3.c4 b6 |
| 2000 | Dinstuhl | Dautov | 0-1 | 2412 | 2606 | 1.d4 Nf6 2.c4 e6 3.Nf3 b6 |
| 1999 | Akhmetov | Sveshnikov | 0-1 | 2438 | 2541 | 1.Nf3 d5 2.g3 Nf6 3.Bg2 c6 |
| 2013 | Baraeva | Navrotescu | 1-0 | 2211 | 2147 | 1.d4 Nf6 2.c4 g6 3.Nc3 Nd5 |
| 2010 | Eljanov | Caruana | 1-0 | 2742 | 2709 | 1.d4 Nf6 2.c4 c5 3.d5 b5 |
| 1997 | Lyrberg | Akesson | 0-1 | 2430 | 2520 | 1.d4 Nf6 2.Qd3 d5 3.Qxh7 Rxh7 |
| 2013 | Babarykin | Paveliev | 1-0 | 2306 | 2386 | 1.e4 d5 2.exd5 Nf6 3.d4 Nxd5 |
| 2013 | Repkova | Frischmann | 1-0 | 2374 | 2234 | 1.e4 c5 2.Nf3 e6 3.b3 a6 |
| 2015 | Yaniuk | Stupak | 0-1 | 2098 | 2568 | 1.e4 e6 2.d4 d5 3.Nc3 Bb4 |
| 2013 | Rakhmanov | Minero Pineda | 1-0 | 2595 | 2412 | 1.e4 e6 2.d4 d5 3.exd5 exd5 |
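As a quick check of the split between white and black wins, the results of the fifteen decisive nano-games listed above can be tallied with a few lines of Python. The `NANO_GAMES` list simply re-types the White, Black and Result columns, and the variable names are my own.

```python
from collections import Counter

# (White, Black, Result) for the fifteen decisive six-half-move games.
NANO_GAMES = [
    ("Grandelius", "Kurayan", "0-1"),
    ("Yilmaz", "Steindorsson", "1-0"),
    ("Pridorozhni", "Malakhov", "0-1"),
    ("Radjabov", "Nakamura", "0-1"),
    ("Vernay", "Riff", "1-0"),
    ("Svetlov", "Sanzhaev", "1-0"),
    ("Dinstuhl", "Dautov", "0-1"),
    ("Akhmetov", "Sveshnikov", "0-1"),
    ("Baraeva", "Navrotescu", "1-0"),
    ("Eljanov", "Caruana", "1-0"),
    ("Lyrberg", "Akesson", "0-1"),
    ("Babarykin", "Paveliev", "1-0"),
    ("Repkova", "Frischmann", "1-0"),
    ("Yaniuk", "Stupak", "0-1"),
    ("Rakhmanov", "Minero Pineda", "1-0"),
]

# Count how often each result string appears.
tally = Counter(result for _white, _black, result in NANO_GAMES)
white_wins = tally["1-0"]
black_wins = tally["0-1"]

print("white wins:", white_wins, "black wins:", black_wins)
```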

 

And here is the behemoth 475 half-mover between Felber and Lapshun in 1998:

In my next post (https://www.chess.com/blog/kurtgodden/chess-analytics-introduction-to-material-and-mobility), the first of several on the topic, I will discuss this large game dataset with respect to black and white material vs. mobility, and I will show you some extraordinary graphs. In the meantime, please let me know in the comments section if there is some analysis that you would like to see in a future post.