Chess Analytics: Introduction to Material and Mobility
In my previous post I discussed the distribution of game lengths across almost 1.7 million games played at expert level or higher. This post is the first of several on the concepts of material and mobility and their interplay, and I can promise you some fascinating insights as we delve deeper. But let me first give you some background.
In the 1,696,607 analyzed games, there are 138,761,123 individual moves (black + white). At each one of the resulting positions, I computed the material for both black and white. By material I mean simply the sum of the piece values for each player, and I used the ‘standard’ values taught to beginners: pawns are worth 1 point each, knights and bishops 3 points, rooks 5, and queens 9. I ignored the infinite values of the two majestic kings since they are always present on the board and thus cancel each other out. Thus, in the opening position both players have material values of 39 and as the game progresses these values generally go down, but also up as pawns are promoted.
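To make the material count concrete, here is a minimal Python sketch of the calculation described above. It is an illustration only, not the code behind this analysis: it assumes the position is available as a FEN string, and the names `PIECE_VALUES` and `material_from_fen` are made up for this example.

```python
# Beginner piece values from the text; kings are deliberately left out.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_from_fen(fen):
    """Return (white_material, black_material) for a FEN position string."""
    placement = fen.split()[0]  # first FEN field holds the piece placement
    white = black = 0
    for ch in placement:
        # Digits (empty squares), '/' separators and kings all score 0.
        value = PIECE_VALUES.get(ch.lower(), 0)
        if ch.isupper():  # uppercase letters are white pieces in FEN
            white += value
        else:
            black += value
    return white, black


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_from_fen(START_FEN))  # (39, 39)
```

A king-only position scores 0 for that side, and a side that promotes a pawn while keeping its original queen can exceed 39, which is how totals such as 46 or 47 become possible.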
This first graph is a histogram that shows the distributions of material for all 138,761,123 post-move board positions (plus the opening position for every game). Along the x-axis are the material values, chunked into equally sized groups automatically by the R code I used to create the graphs. The y-axis, labeled “Frequency”, gives you the number of positions that fall into each chunk. Since the numbers are large, they are represented in scientific notation, so the 2.5e+07 on the top left of the histogram for black refers to the number 25,000,000. The red vertical line is the mean (average) material value for each player over all board positions: 27.476888 for black and 27.50087 for white. The minimum material for both sides is 0, i.e. endgames in which one side has only its king, and the maximum that occurs among all the games is 46 points for black and 47 for white.
The next statistic is mobility, which is the number of possible moves for each player when it is their turn. As with material, I computed the mobility for every one of the 138,761,123 moves. In the game data, there are 69,770,248 actual white moves and 68,990,875 actual black moves. (Think about game endings, and you will understand why there are a lot more white moves than black.) But at each turn a player usually has far more possible moves than the single move actually played, and in this dataset the number of possible moves ranges from 0 to 77 for white and from 0 to 79 for black.
The next graph is also a histogram but with mobility along the x-axis, and once again the vertical red lines show the mean values over all of each player’s possible moves.
Black’s mean mobility is 29.97390 while white’s is 32.02944, regardless of the game outcomes. Notice that white has the advantage in both material and mobility, although it would appear on the surface that the material advantage white enjoys is minuscule, while the mobility advantage is ‘significantly’ larger.
But statistics can be as surprising as chess, and I have tested the statistical significance of both of white’s advantages. White’s ‘minuscule’ material advantage of only 0.02399 (less than 3 hundredths of a pawn) turns out to be extremely significant. For stat geeks, the p-value for the difference in a t-test is < 2.2e-16!
As for mobility, white’s advantage is also extremely significant. The p-value here is also < 2.2e-16, but the t-statistic is an extraordinary -1108.9, compared to the t-statistic for material of -18.354.
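If you wonder how a gap of a few hundredths of a pawn can be so significant, it helps to see how a t-statistic is built: the difference in means is divided by a standard error that shrinks as the number of observations grows, and here there are tens of millions of positions per side. The sketch below computes Welch's version of the statistic in plain Python. Treat it as an illustration under assumptions: the post does not say which variant of the t-test was run, the black-minus-white ordering is only inferred from the negative signs above, and the numbers are hypothetical toy values, not taken from the dataset.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for the difference in means (a minus b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference, allowing unequal variances.
    # Dividing each variance by its sample size is why huge samples
    # make even tiny differences in means statistically significant.
    std_err = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    return (mean(sample_a) - mean(sample_b)) / std_err


# Hypothetical toy mobility counts, NOT values from the real games.
black_mobility = [28, 30, 31, 29, 32, 27]
white_mobility = [31, 33, 32, 30, 35, 31]

t_stat = welch_t(black_mobility, white_mobility)
print(f"t = {t_stat:.3f}")  # negative, because black's mean is lower
```

Keep in mind that the size of a t-statistic reflects both the size of the difference and the amount of data, so it is a measure of evidence rather than a direct measure of how large an advantage is.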
It is well known that white enjoys an advantage in having the first move, but these results quantify that advantage and strongly suggest that mobility is a far stronger advantage than material. I will return to this idea in future blog posts.
If you think carefully, you might speculate that there may be a positive relationship between material and mobility. That is, since mobility is the sum of the possible moves of each player’s pieces, one might think that as material increases so does mobility, and as one loses pieces during the course of a game, so too does that player lose mobility. But is this true? It’s easy enough to find out by plotting a graph with material on one axis and mobility on the other, so let’s look at this relationship for black (the corresponding graph for white is not substantially different):
Indeed you can clearly see that there is such a positive relationship between material and mobility. As one goes up or down, the other tends to do the same. In this graph each blue dot shows the mean (average) material and mean mobility for black, one dot for each of the 1,696,607 games.
The red line represents what is called a least-squares linear regression of this data. This particular straight line is the visual representation of the following equation that predicts the value of black’s mobility from black’s material:
BlackMobility = (0.4999 * BlackMaterial) + 16.11
The line is chosen so that the sum of the squared vertical distances from each blue dot to the red regression line is smaller than for any other straight line that could be drawn on this graph, and there are infinitely many such lines.
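For readers who like to see the mechanics, the following Python sketch fits a least-squares line with the standard closed-form formulas and then evaluates the equation above. It is only an illustration: the graphs in this post were made with R, the function names are invented for this example, and the small data set is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity that least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def predict_black_mobility(material):
    """Evaluate the fitted line quoted in the text."""
    return 0.4999 * material + 16.11


# Hypothetical (material, mobility) pairs, NOT taken from the real games.
toy_material = [10, 15, 20, 25, 30, 35]
toy_mobility = [20, 24, 27, 28, 31, 33]

slope, intercept = fit_line(toy_material, toy_mobility)
best = sum_squared_residuals(toy_material, toy_mobility, slope, intercept)
# Nudging the slope away from the fitted value can only make things worse.
worse = sum_squared_residuals(toy_material, toy_mobility, slope + 0.1, intercept)
print(best < worse)
print(predict_black_mobility(39))
```

With the full opening material of 39, the equation from the text predicts a mobility of roughly 35.6 for black.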
However, you can also see that a straight line is a deficient summary of the data because on the left side the blue dots are all below the line. Further, there is a slight curvature to the collection of blue dots that is missed by a straight line. Thus, we can modify the equation that we are attempting to fit to this data in a couple of ways to obtain a better regression line.
First, we add an additional term to the equation, the square of the material value, in order to capture the curvature of the raw data. Second, if you look again at the graph, it is clear that the further to the right that we go, the more variation there is in the blue dot distances from the red line. This increasing variance is referred to as heteroscedasticity, which is a cool term to memorize and impress your parents and friends, not to mention its potential value in winning bar bets. And while it’s a fun word, it’s not regarded as a good thing when trying to model data. One of the ways to shrink that variance more on the right than on the left is to take the logarithm of the y-values, the mobility, which reduces larger numbers more than smaller numbers. These two modifications lead to the following equation, which fits the actual data better than the linear regression above (the notation “BlackMaterial ^2” means the square of black’s material):
log(BlackMobility) = (0.0885 * BlackMaterial) - (0.001262 * BlackMaterial^2) + 1.947
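To turn this equation back into a mobility value you have to undo the logarithm. The short Python sketch below does that, assuming "log" means the natural logarithm (which is what R's `log` computes by default, though the post does not state the base explicitly). The function name is made up for this example.

```python
from math import exp


def predicted_black_mobility(material):
    """Predict black's mobility from the log-quadratic equation in the text.

    Assumes the logarithm in the equation is the natural log.
    """
    log_mobility = 0.0885 * material - 0.001262 * material ** 2 + 1.947
    return exp(log_mobility)  # undo the log to get back to a move count


for material in (0, 10, 20, 30, 39):
    print(material, round(predicted_black_mobility(material), 2))
```

Under that assumption, the equation predicts roughly 7 possible moves for a lone king (material 0) and roughly 32.4 at the full opening material of 39.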
The corresponding plot follows, and you can see that the curvature of the new red regression line gives a better fit to the raw data. Also, if you look carefully you will notice that there is a bit less variance on the right-hand side of the graph, which is due to the changed scale of the y-axis that now uses the log of the mobility values:
For the stat nerds in the crowd, the p-values for all these coefficients, as well as for the F-statistic, are less than 2e-16. For you non-stat-nerds (that just means that you are nerdy about things other than statistics), the numbers in the equation are very significant. The adjusted R-squared value is 0.513, which isn’t bad, considering that we are modeling only material’s influence on mobility, whereas there are lots of other considerations that influence mobility, e.g. pawn structure.
In my next post, I will begin to discuss the difference between black and white material and mobility and how we can use these concepts to predict the outcomes of games (win-lose-draw) based only on the numeric values of material and mobility.
Please let me know in the comments section if you would like to see other analyses of game data, and I may (or may not) try to accommodate your requests. (It’s extremely time consuming, and computing the material and mobility values for every move actually took several weeks of computing time.) In the meantime, choose your move carefully, in life as in chess.