How many different chess positions are there?

Sort:
watcha

To establish the number of legal chess positions is a very tough problem due to the large number of possible positions and the difficulty of verifying algorithmically the legality of random positions.

I think I have said everything that my limited capability allowed me to say about this problem.

However I want to make one last comment with respect to a small subset of the problem which can be addressed on a more solid basis.

For 7 men endgames we have the full solution in the form of the Lomonosov 7 men endgame tablebase. I could not get a handle on how many positions this tablebase contains, the best I could find ( in the referenced Wikipedia article ) is that it is of the size 100 TB ( 100 terabyte = 100 * 10^12 byte ). However if we assume that a position can be stored on cca. 150 bits we can get a rough estimate of the number of positions in this tablebase:

( 100 TB * 8 bits / 150 bits ) which is 5.33 * 10^12.

It is also possible to estimate the number of random positions if there are only 7 pieces on the board:

which gives 7.88 * 10^14 ( this formula assumes that all non king pieces are different which is not true in general, just think of an all pawn endgame ).

Both of these estimates are rough but it is apparent that they are in the same ballpark. The number of legal positions is only two orders of magnitude less than the number of random positions.

Ghostqiuyu

I....I..I don't know

The_Ghostess_Lola

Love your name Ghostqiuyu !....Smile....

Let me take in all you've said watcha on #163. Keeping up with you is....well, let's just say I'm a few steps behind you.

TheGrobe

Remember that a tablebase is a graph, not just an enumerated list of positions (the nodes) but also the legal transitions between them (the edges), so the figure above would be the absolute upper bound.

watcha
TheGrobe wrote:

Remember that a tablebase is a graph, not just an enumerated list of positions (the nodes) but also the legal transitions between them (the edges), so the figure above would be the absolute upper bound.

I don't know the actual implementation of the Lomonosov tablebase, so I'm only guessing here. But it general I can say ( having written a program which is capable of generating a tablebase myself ) that you only need to store the evaluation of positions rather than moves. Why? Because you can always generate the legal moves in a position on the fly ( with bitboard representation this can be done extremely fast as today's engines prove it - they generate positions on the order of some meganodes / sec which assumes very quick move generation ). Now all the legal moves ( the edges of the graph as you call them ) lead to a 'node' ( position ) that - having a full solution - must be also in the tablebase. So what you do is enumerate all the legal moves on the fly and then look up their evaluation in your tablebase ( by making the moves one by one and looking up the evaluation of the resulting positions ). The evaluation of a position is a single signed integer: the position is either a mate in +/- N ( sign according to which is side is being mated ) or a draw ( 0 ).

USArmyParatrooper

On this note, it's quite possible black is already in a mating net after 1. e4, with a very very very large number of possible continuations that all lead to checkmate 1-0

ebillgo

The usual guesstimate of 10^120 possible games,based on 30 choices for each of the average 40 moves,  is not very realistic: In most stages of the game, there are simply not so many choices. You have to attend to some immdiate threats and take care of harmonious development all the time. Sensible choices available are more often than not  a single-digit number. 10^120 therefore doesn't agree well with real game situations.

watcha

Semi-Off ( optimality of storing positions, hashing ):

This with add little to the number of legal positions, it is just a non trivial sideline regarding the practical implementation of a tablebase ( or opening book for that matter ) so those looking for the number please press PgDn at this point.

What I said about the Lomonosov tablebase ( 150 bits per position - the order of bits needed to store a general position ) actually puts a lower bound on the number of stored positions rather than an upper bound. First of all if you have only seven pieces you can store their place in exactly 6 bits ( for 64 squares ) and 4 bits is more than enough to store the piece ( on 4 bits you can store 16 possibilities but you need only 12 of them ). This means that you can store a 7 men position comfortably on 7 * ( 6 + 4 ) = 70 bits. But this is not the only and main source of optimalization. In programming storing something in array means that you can always directly know where an element of the array of certain index will be in the memory: you only have to multiply the size on which you store an element with the index of the element and you have the location where it is stored. Using 150 ( or 70 ) bits per position assumes that you are storing positions in an array. You can look up a position in an array by looking at all positions one by one and comparing to the position you are searching for. Now you have the index of the position and you can look up the evaluation of it in the array of evaluations under the same index. However for looking up positions in the order of 10^12 you have to make 10^12 / 2 comparisons on average - this is extremely time consuming. What if just based on the representation of the position you could somehow find out the index under which its evaluation is stored so that you don't need to make all those comparisons and so that you don't have to store the representation of the position at all? This is what are hashes for. If you know that you need to store 10 ^ 12 positions you can generate a digest ( a signature ) of the position in the range  0 - 10^12 and use this as a hash key. This produces a deterministic ( meaning that the same position will always have the same signature ) but essentially random number ( meaning that similar positions will not have similar digests ) based on your 150 ( or 70 ) bit representation. It is not guaranteed that different positions will have different signatures ( if they have the same signature this is called a collision ) but the better the hashing algorythm is the better are the chances for that. If some positions have the same signature then you have to store their representation in an array and go one by one to look up your particular position ( however this array will only have a few elements and where this array will be in memory is known exactly ). With a good hashing algorythm however there will be few collisions and you have to store the actual representation of positions only in a small fraction of cases. In most of the cases the index will be unique and you can look up the evaluation directly in the array of evaluations. ( I used xxhash for this purpose which is a very effective hashing algorythm working at ram speed. ) In the ideal case it is enough to store one integer per position ( say on 16 bits ) - the evaluation. The array of evaluations will store this integer for positions with no collision and a pointer to a small array of colliding positions where there is a collision.

waffllemaster
watcha wrote:

on 4 bits you can store 16 possibilities but you need only 12 of them

Out of curiosity, would it be true you actually need 13 of them?  6 pieces for each player is 12, but then 1 more for an empty square right?

The_Ghostess_Lola

Wouldn't it be mind-blowing if the answer was exactly 5 Avagardo's Constant's (or 30.11x10^115 moves) ? Ok, I'll stop being silly now....

watcha
waffllemaster wrote:
watcha wrote:

on 4 bits you can store 16 possibilities but you need only 12 of them

Out of curiosity, would it be true you actually need 13 of them?  6 pieces for each player is 12, but then 1 more for an empty square right?

In this particular case not since you store the squares of the pieces directly on 6 bits. All other squares are assumed to be empty by default ( when there are only a few pieces this is more economical, storing empty squares on even a single bit would make up for 57 wasted bits in the 7 men case ). Storing the empty squares in some coded form comes into play when you store the position in a serialized way ( in a conventional array or by Huffman encoding which latter was brought up earlier in this thread ).

 


 

If I were to use Huffman codes, I would do it this way:

 

0 - empty

1c10 - pawn

1c11 - king

1c000 - knight

1c001 - bishop

1c010 - rook

1c011 - queen

 

where 'c' is the color of the piece, 1 for white, 0 for black. Pieces start with 1, 4 bit codes start with 1c1, 5 bit codes start with 1c0. There is no need for 6 bit codes.

macer75

Another interesting (and more easily answerable) question would be, what is the number of unique positions that have occurred in all games played on chess.com?

TheGrobe
watcha wrote:
TheGrobe wrote:

Remember that a tablebase is a graph, not just an enumerated list of positions (the nodes) but also the legal transitions between them (the edges), so the figure above would be the absolute upper bound.

I don't know the actual implementation of the Lomonosov tablebase, so I'm only guessing here. But it general I can say ( having written a program which is capable of generating a tablebase myself ) that you only need to store the evaluation of positions rather than moves. Why? Because you can always generate the legal moves in a position on the fly ( with bitboard representation this can be done extremely fast as today's engines prove it - they generate positions on the order of some meganodes / sec which assumes very quick move generation ). Now all the legal moves ( the edges of the graph as you call them ) lead to a 'node' ( position ) that - having a full solution - must be also in the tablebase. So what you do is enumerate all the legal moves on the fly and then look up their evaluation in your tablebase ( by making the moves one by one and looking up the evaluation of the resulting positions ). The evaluation of a position is a single signed integer: the position is either a mate in +/- N ( sign according to which is side is being mated ) or a draw ( 0 ).

Yes, a good point, but don't you need to indicate the evaluation that corresponds to each legal target position?  Barring that, at a minimum, surely you need to reference the target position that the optimal evaluation corresponds with?

waffllemaster
watcha wrote:
waffllemaster wrote:
watcha wrote:

on 4 bits you can store 16 possibilities but you need only 12 of them

Out of curiosity, would it be true you actually need 13 of them?  6 pieces for each player is 12, but then 1 more for an empty square right?

In this particular case not since you store the squares of the pieces directly on 6 bits. All other squares are assumed to be empty by default ( when there are only a few pieces this is more economical, storing empty squares on even a single bit would make up for 57 wasted bits in the 7 men case ). Storing the empty squares in some coded form comes into play when you store the position in a serialized way ( in a conventional array or by Huffman encoding which latter was brought up earlier in this thread ).

 


 

If I were to use Huffman codes, I would do it this way:

 

0 - empty

1c10 - pawn

1c11 - king

1c000 - knight

1c001 - bishop

1c010 - rook

1c011 - queen

 

where 'c' is the color of the piece, 1 for white, 0 for black. Pieces start with 1, 4 bit codes start with 1c1, 5 bit codes start with 1c0. There is no need for 6 bit codes.

Ok, thanks.

The_Ghostess_Lola

M75....I think you're onto something. Using chess.com as a practical database could be a brilliant idea for iterative purposes....

(but I still wanna see that exact number (and proven)....it's out there - we just have to go get it somehow !....Smile....)

VULPES_VULPES

tons

TheGrobe

Finally, a well thought through answer.

eltodesukane

OEIS (On-Line Encyclopedia of Integer Sequences)
A083276
Number of distinct chess positions after n plies including differences due to availability and possibility of castling and en passant captures.
1, 20, 400, 5362, 72078, 822518, 9417681, 96400068, 988187354, 9183421888, 85375278064, ...

http://oeis.org/search?q=chess+position&language=english&go=Search

watcha
TheGrobe wrote:

Yes, a good point, but don't you need to indicate the evaluation that corresponds to each legal target position?  Barring that, at a minimum, surely you need to reference the target position that the optimal evaluation corresponds with?

Honestly: I don't really understand this question.

1) Given a position you generate all legal moves ( on the fly ).

2) You make the first move, look up the evaluation of the resulting position, write it down beside the move, then reset the position.

3) You make the second move, look up the evaluation of the resulting position, write it down beside the move, then reset the position.

---

4) Keep doing this with all the moves.
 
Now you have a list of legal moves with values. You can order the list in descending order according to the values. The move with the highest value will be on the top of the list - this is the optimal move.
 
If you mean the single evaluation of a position, it is always the optimal one. If there is a 'mate in 5' move and a 'mate in 3' move in a position for white and all other moves are draw or losing ( white being mated ) then the single evaluation will be 'mate in 3' for white.
TheGrobe

OK, I've got you.  That makes sense.