Tablebase Madness, Part I: How To Make a Tablebase
Nice table!

Tablebase Madness, Part I: How To Make a Tablebase


"Tablebase: A database of positions in which the outcome is perfectly known, and can guarantee perfect play. Currently tablebases cover all positions up to 7 pieces, counting both kings. Can require huge amounts of disk space to store." (from my Glossary)

Carlsen-Caruana, WCC game 3. Those with their eyes on Sesse saw a drawn endgame suddenly turn, for one ply, into a forced mate in 36 moves for black.

But here's the thing: even Sesse, with all its power, couldn't have calculated a +M36 in real-time. I've seen a 70-ply analysis from the starting position... but that took half a year to complete.

You're probably aware that Sesse was using tablebases to help it. Sesse even says so on its analysis page (emphasis mine): 

"Chess analysis by Stockfish 180419-asn 64 BMI2 (main analysis: 20x2.3GHz Haswell-EP, multi-PV search: 18x2.2GHz Broadwell-EP). Moves provided by Chess24. Hosting and additional analysis hardware by Studentersamfundet i Trondhjem and Berge Schwebs Bjørlo. JavaScript chessboard powered by chessboard.js and chess.js. Ding sound by Aiwha (CC-BY-3.0). 7-man Lomonosov tablebase lookup by ChessOK."

                       Sesse's Fine Print (Update your Stockfish, Sesse!)

Unfortunately, Sesse proved that the understanding of tablebases among the chess community was less than ideal. How are tablebases made? What is a Lomonosov tablebase? Where can I get my own tablebase? And what does "7-man" mean? In this series I'll attempt to clear up some of the misunderstanding around tablebases.

First question: How did we get these things called tablebases? By retrograde analysis and computation. Lots and lots of computation.

Let's say we want to generate a tablebase for all KRvK positions. We first generate, by permutation, every position that can be created with just those three pieces. Both of the following positions are generated, but the second position is illegal and so gets discarded.

You might have guessed that, even with just the three pieces, there are a lot of legal positions. Since the chessboard has a high degree of rotational symmetry when pawns are excluded, we can cut down on the number of positions we actually need to calculate. The next two positions are functionally identical:

Even with our tricks, we end up with something like 40,000 legal positions. Which is great! The next step is to decide which of those legal positions is actually checkmate. Luckily for us there are much fewer of these and they've already been generated. We mark each of those positions with a '0', signifying that white wins in 0 moves.

White must have made the last move in each '0' position, so we then move backwards to all the positions white could have had 1 ply earlier. These are all of white's legal moves from the '0' position, excluding positions leaving black in check (which would be impossible), pawn moves, and captures. Examples below:

Since white can give checkmate on his next move, we mark all those positions with '1'!

Now it must have been black's move. His king could have been on c8 or e8, but in only one of those cases is he forced to go to d8 (and be checkmated):

So the position with Ke8 gets labelled with a '2', meaning it is a forced checkmate in 2 ply. The position with Kc8 is ignored. On and on we go like this, labeling position after position that lead to checkmate. These next three positions get assigned a '3':

The specific tree we are analyzing- in which we end up in a specific '0' position- ends here. Wherever black's king came from, he cannot have been forced into a '3' position in this tree. He always has another square to go to (this may or may not be obvious). That doesn't mean he won't be checkmated eventually; it just means that the final position will be different.

We do the same process for every position on our checkmate list, stopping when we hit '100' (for the 50-move rule), or reach a position that has already been marked (in which case we use whichever marking is lower). Eventually, after starting from every checkmate position, we  reach a situation where each position in our large, legal-position set is either marked 0-100 or unmarked. The marked positions are wins for white with forced mates indicated by their marking number. The unmarked positions are draws because mate cannot be forced or is prevented by the 50-move rule in a cursed win.

Obviously, most KRvK positions are wins (95% are, in fact). We never even use labels greater than 64, since the KRvK longest mate is 32 moves. The only non-wins are stalemates and positions where black can immediately capture white's rook:

I called this a retrograde analysis, and hopefully it's clear why: instead of working forward to try to give checkmate, we started with checkmate and worked backwards to every other position. With KRvK this is a fairly simple process, and doesn't take that much computing power. But things get exponentially more difficult when we add extra pieces and pawns into the mix. The next two positions aren't identical, so they have different tablebase entries:

With more than three pieces (or a pawn), we can't even stop marking positions at 100; the 50-move rule could reset in the sequence. If a capture is made, we can simply check the resulting position on a smaller tablebase. But if a pawn push is made, we might have more than 100 ply to finish the game. Black is mated in 256 ply:

That's all for now! The next part in this series will cover types of tablebases and the practical challenges involved in generating, storing, and accessing them. Stay tuned!