Creating a chess engine (part 3: Opening Book)
Ok, I have not done much work on the evaluation code yet, so that will wait for another part. This part will instead focus how an engine can understand opening theory and how it chooses which openings to play. Several issues arise when creating an opening book (it's basically a database):
1. What lines will you have in the opening book and which games are they selected from?
2. How do you choose which lines from the book to play - you do not want your engine to play crap openings all the time or become too predictable by always playing the same variation in the sicilian dragon.
3. How do your organize your opening book so its easy to find the next move for the computer to play?
1. What lines will be in your opening book
This really goes to the heart of what exactly is opening "theory" - what variations do we accept as established theory in chess openings? Well, there is no correct answer really. Most people regard the opening theory as the openings and variations defined as the lines and moves played by very strong players at GM level. You would probably not trust the analysis of opening variations done by a 800 ELO player. You would have more faith in variations analyzed by GMs. Indeed most opening books for computers are created from a large sample of games by 2400+ players. Why not just choose 2700+ players? Well, then you would not have any lines in your book covering openings such as b4, b3 and f4 and some people will play those openings against your computer, so you need to have them in the book. Therefore you cannot simply just compile the opening book from GM games - you need to have a broad book to cover all openings. Right now I have compiled an opening book with about 6000 different variations and 250.000 moves (plies) in total. It is a fairly decent sized opening book - it should be good enough to make sure my engine plays decently in the start phase of the game as both black and white. Commercial engines have huge and highly optimized opening books - but they are made by opening specialists. I based my openings on the games played at the European Team Championship in 2009 and the European Individual Championship in 2010. There are about 2500 high level games played at those events combined. I then added a basic book containing all the ECO variations, so even if a variation like 1. f4 was not played in any single of the 2500 games my book still includes variations with the f4 move.
2. Which variations from the opening book will your engine play.
So once you have managed to compile a comprehensive book, then the next questions is what lines from that book should the engine play. Should it just choose random variations from the book? One very common method is to stastitical analysis on a large number of games played and record the end result of the game in the variations in your book. Then you assign a weight (a percentage) to each of the variations based on this analysis. Of course whether a game is won or lost does not neccesarily depend on the variation itself, but given a large enough amount of games, then the statistical analysis begins to be useful and more accurate than just choosing a random move from the book. Below you can see what i mean by the percentage point (screenshot taken from the fritz opening book - version 8) Notice that. 95.1 and 4.8 only adds to 99.9 %. The last 0.1 % are distributed among all the moves where it says 0 % (they are close to zero - so they are rounded down to 0 when displayed to the user). Usually no move in the opening book will ever have a 0 % chance of being played - it that was the case then you could just as well remove the move from the book since it will never be played. My engine uses this technique and it tends to play much better lines than just choosing a random move from the book.
3. How do you organise your opening book efficiently?
Most opening books are keept in a binary or a text file and then read by the engine when it starts playing. An engine needs a efficient way to store the opening books so it is easy and fast to look up which move to play next. One very efficient way is to organize the book as a special kind of tree, called a prefix tree or a trie (sorry, computer science jargon here). Below you can see what I mean. The root called "start" is the start position. Then for each move from that position in your book you make a branch (here there are d4, e4 and c4 branches). For each move after d4 you store the next moves - here Nf6 and d5. So when the engine is playing a game it will always know exactly where it is in the tree at any point. At each branch the engine will choose which path to take based on the percentage probabilities described above. If at any point the opponent makes a move for which there is no branch to go down - well, then the engine is out of the book and have to start thinking for itself. It is pretty easy to get an engine out the book - just play h4 and it will be out of the book fast. Although this will probably not be to your advantage!
I am kind of wondering how computers would play chess if they had no opening book at all. Maybe they would play in a completely different way than opening theory as it is constructed by humans. Of course you can simply turn the opening book off in your program, but then they still play according to the principles which humans thinks are best - after all chess engines are made by humans. It could turn out that for instance a4 or a3 is much better than e4 or d4. We simply do not know for certain at this point and we will not know for sure until (if ever) we get the 32-men tablebase (also known as God's algorithm in chess) and that is it not exactly around the corner! Chess will remain an enigma for a long long time.