Think You Know Algebraic Notation?
[Note added on Jan 15, 2007: Max Wootton (mxdplay4 here on chess.com)
has discovered some errors in my calculations. We corresponded on this topic, but never quite determined the true result. If I feel ambitious, I will resume the work to satisfy my curiosity. -Kurt ]
For a time during the past year I was doing some data mining on a large collection of chess games, using some software to look for interesting patterns of moves. The first thing that surprised me was the large number of distinct symbols used to represent moves in standard algebraic notation. There were over 3,000.
In a recent blog entry I calculated the number of moves in The Longest Possible Chess Game, and so I decided to try and calculate the number of distinct algebraic symbols that could be used to notate a game. By a symbol, I mean a string of characters used to record a single move, e.g. ‘Nf3’. That is one symbol and is distinct from ‘Ngf3’, which is another symbol used when ‘Nf3’ would be ambiguous. So how many distinct symbols are there?
I will give you the answer to this question below, and the number might surprise you. It certainly did me. But before we arrive at that punch line, let me supply you with a few basic facts as well as the outline of how I performed the calculations.
There are 64 squares on the board, which form the targets of most move symbols. And there are 5 piece designators, ‘R’, ‘N’, ‘B’, ‘Q’ and ‘K’, to designate the rooks, knights, bishops, queens and kings, the pawns being represented by the absence of a piece designator. So as a first, gross approximation we might think of the 64 basic pawn moves such as ‘e4’ and so forth, to which we add the 5 pieces moving to any square for another 5 * 64 = 320 symbols, resulting in 64 + 320 = 384 basic move symbols like ‘Qe2’ or ‘Rf7’.
This, however, is a gargantuan underestimate of the number of legal move notations.
The algebraic system is a language, and thus, we first need to establish the base vocabulary of this language. For this, we turn to Appendix E “Algebraic notation” from FIDÉ, the International Federation of Chess, which outlines the use of the following core vocabulary. In the ensuing discussion, I restrict myself to representation in the English speaking countries, some other countries and language groups using other symbols.
I have already referred to some of this core vocabulary, e.g. the letters that indicate the pieces, such as ‘Q’. And I have indirectly referred to the names of the files and ranks, the lower case letters ‘a’ through ‘e’ and the Arabic numerals ‘1’ through ‘8’. To this we must add the zero, the hyphen, the period, and the lower case ‘p’ in order to accommodate expressions such as ‘0-0’ and ‘exd6 e.p.’ Notice that we also have a blank after the ‘6’ in that symbol, which is itself a character. Thus, we also have the blank character in our base vocabulary.
We also need ‘x’ for captures and ‘+’ for checks. In the case of mates, FIDÉ permits either ‘++’ or ‘#’. I will therefore add ‘#’ as a member of our base vocabulary, and we will use it, rather than the former, to compute our answer. Interestingly, while FIDÉ refers to use of ‘=’ to represent draws, they make no mention of how to notate wins, so I will choose to ignore all game outcomes in the remainder of this discussion. Although some use the ‘=’ character when a pawn is promoted, the FIDÉ notation for pawn promotion is to append the piece letter immediately after the move, e.g. ‘e8Q’.
Thus, for our base vocabulary we recognize the use of these 29 atomic symbols:
The 9 letters ‘a’ through ‘h’, and also ‘p’.
The 5 letters ‘K’, ‘Q’, ‘R’, ‘B’, and ‘N’.
The 9 numerals ‘0’ through ‘8’.
The 6 additional characters '-', ‘x’, ‘.’, ‘+’, ‘#’, and the blank character, ' '.
These 29 atomic symbols can be combined in various ways to represent moves. The longest string of atomic symbols that can be formed into a legal move is 10, e.g. ‘exd6+ e.p.’. The lower limit is 2 (‘e4’), and there are also 3, 4, 5, 6, 7, and 9 character symbols. There are no symbols with 8 characters.
Given these facts, then as an upper limit from a vocabulary of 29 symbols, there are
(29^2) + (29^3) + (29^4) + (29^5) + (29^6) + (29^7) + (29^9) + (29^10)
= 841+ 24,389 + 707,281 + 20,511,149 + 594,823,321 + 17,249,876,309 + 14,507,145,975,869 + 420,707,233,300,201
= 435,232,245,219,360 potential move symbols. However, the vast majority of these are ungrammatical, e.g. ‘012.+e’.
While it is only somewhat more tedious to determine the complete set of grammatical move symbols than it is to read about the computation, it can be done. And I have done so for your edification.
So if you’d like to become edified, continue reading. If you simply prefer to know the answer to impress your friends at the local chess club or win beers at the bar, you may skip to the bottom.
I have 10 full pages of calculations, which is far too much to include here, but I can give you an idea of the type of activity that was involved. I attacked the problem top-down, much like a good programmer, by breaking it up into smaller sub-calculations which were then summed to arrive at the final result. The simplest of these sub-problems was, as you might suspect, the enumeration of castling. But, as you might not realize at first, is that you need more than the two symbols ‘0-0’and ‘0-0-0’. Consider that, in the act of castling, the moved rook may deliver check or mate. Thus, instead of the two symbols we have 6 via the wonders of multiplication: 0-0, 0-0+, 0-0#, 0-0-0, 0-0-0+, 0-0-0#.
Let us consider something a step up in complication. Since a bishop can move to any of the 64 squares, we add the 64 symbols ‘Ba1’ … ‘Bh8’, but we are only getting started. Each of those 64 moves may involve a capture, giving us another 64 ‘Bxa1’ … ‘Bxh8’. And again all of these preceding moves may deliver check or even mate (if only through discovery). This means that we have 6 groups of the 64 basic Bishop moves, or 6 * 64 = 384 bishop notations. Here is a representative member from each of these 6 groups: Bc4, Bxc4, Bc4+, Bc4#, Bxc4+, Bxc4#.
If you are surprised by the large number of basic bishop notations, gird your loins for further surprise because we are not yet done, even with the bishops. Consider that pawns may promote to pieces, including bishops. This means that you may have a game where there are multiple bishops of the same color squares for either side, requiring additional notation for disambiguation. Thus, if you have two bishops on the dark squares at d2 and f2, they are both attacking the e1 square and we now require two new notations to disambiguate, ‘Bde1’ and ‘Bfe1’. Such disambiguation notations add a tremendous number of new symbols to our language.
But before we continue, let us show a table of move notations that does not include disambiguation, even for rooks or knights which require it even before the possibility of pawn promotions.
|Category|| Number of Symbols
|Pawns||984||e4, exd6 e.p., exf8Q#|
|Kings||384||Ke2# (A discovered mate.)|
Table 1: Basic Move Notations, Not Including Disambiguation for Knights or Rooks
The large number of possible pawn notations is of greatest interest here. You begin to realize how this number can be so large when you consider that pawns can move straight (e4), with capture (exf5), en passant (exd6 e.p.), straight with promotion (c8Q), capture with promotion (dxc8Q), and all of these with check or mate.
Compare Table 1 with the numbers in Table 2, which include the necessary disambiguation notations required for knights and rooks, even before additional disambiguation is required due to pawn promotions.
|Category||Number of Symbols||Examples|
|Rooks||6,528||Rae1 , R2d6|
Table 2: Basic Move Notations, Not Including Disambiguation from Pawn Promotions
Are you amazed at how many additional symbols are added for disambiguation of knights and rooks? Let us look at what FIDÉ says regarding disambiguation.
"If two identical pieces can move to the same square, the piece that is moved is indicated as follows:
1. If both pieces are on the same rank: by (a) the first letter of the name of the piece, (b) the file of the square of departure, and (c) the square of arrival.
2. If both pieces are on the same file: by (a) the first letter of the name of the piece, (b) the rank of the square of departure, and (c) the square of arrival.
3. If the pieces are on different ranks and files, method (1) is preferred."
Note that clause (3) does not require the use of method (1), only that it is preferred. Thus, for my calculations I have chosen to permit notations such as R1a1 in addition Rea1, and this greatly magnifies the number of possibilities. If you disagree with this interpretation, you can take the matter up with the committee responsible for such rules at FIDÉ, or with me. You will have equal influence in either case.
In any case, for Rooks we can disambiguate a move to any of the 64 squares either through the 8 files or the 8 ranks, i.e. with 64 * 16 = 1,024 notations. And since each of these notations can involve a capture, a check, a mate, a capture with check, or a capture with mate, there are 1,024 * 6 = 6,144 new notations. When these are added to the first 384 notations not involving disambiguation, we arrive at the final 6,528 figure for Rooks. A similar process can be used to determine that there are 4,800 knight moves, including disambiguation.
As Table 2 indicates, we are already at a total of 13,470 symbols that can be used to notate a chess game. If we can determine the additional symbols needed for the required disambiguation due to pawn promotions, we will be done.
The additional number is again astonishingly large.
These final calculations required the better part of an otherwise free Saturday for me, and involved much more than figuring out a few basic moves and then multiplying. The complications arise from the fact that some notations are impossible, e.g. Nha1. While this one may be easy to exclude, it requires patience, perseverance and much coffee to discover all of them and perform all of the required calculations. The board itself must be partitioned into sections, with the corner squares treated as one group, the “2nd corner” squares (b2, b7, g2, and g7) treated as another group, and other groups of squares forming yet other equivalence classes. Compute the moves to one representative square of each class for a given piece, and you can multiply by the number of squares in that class to arrive at an intermediate number for the running total. My younger daughter, Stephanie, referred to me numerous times as a dork as I sat for hours on end slaving over such minutiae, and informing her periodically of my ever-increasing intermediate subtotals. (But this is the same daughter who has memorized pi to 2,300 digits, and ‘dork’ is used between us as a term of endearment.)
Table 3 presents these grand total results. The simplicity of the table belies the extreme tedium of effort to produce it.
|Category||Number of Symbols||Examples
|Pawns||984||e4, exd6 e.p., exf8Q#|
|Rooks||6,528||Rae1 , R2d6|
|Queens||9,360||Qe4xg6+ (With other Queens at e6 and g4.)|
|Kings||384||Ke2# (A discovered mate.)|
Table 3: All Possible Algebraic Move Notations
While this knowledge may not win you very many games, it nonetheless adds to your increasing body of chess learning. I must add one caveat, however. It is possible that I have made an error in these calculations, which would be unfortunate indeed. If any brave and dorkful soul wishes to check my work, send me an email to firstname.lastname@example.org and ask for the 10 page document that details my calculations. If you find and fix any errors, I will give you proper credit in a future blog. But as you verify my calculations, be careful. The complications are many and you must choose your move carefully, in chess as in life.