Chess From Code: How Do Engines Evaluate Positions?
Not all images make sense. (Modified; Original by the European Southern Observatory; CC BY 4.0).


the_real_greco

When sitting at a chessboard, what are players thinking about? Much of their time is spent answering a simple question: "Is this position good or bad?" They might have incredible vision, they might never miss a forcing line... but if at the end of the line they can't answer this question, they will be hopeless as players. They need some way of evaluating positions. They need an evaluation function.

Let's say you give your engine a position, and it gives you a number like +1.50. This number is the output of the engine's evaluation function (with help from the search algorithm), and it means the engine believes white is much better (but has not yet reached a winning position). Engines traditionally report this number in pawns or centipawns (hundredths of a pawn), so +1.50 pawns is the same as 150 centipawns. But where does that number come from? I'm glad you asked...
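To make the units concrete, here is a tiny C sketch that turns a centipawn score into the familiar signed display. The name format_eval is made up for this post and isn't taken from any engine.

```c
#include <stdio.h>

/* Hypothetical helper (not from any engine): write a centipawn score
 * into buf as a signed number of pawns with two decimals. */
static void format_eval(int centipawns, char *buf, size_t size) {
    snprintf(buf, size, "%+.2f", centipawns / 100.0);
}
```

Called with 150, it fills the buffer with "+1.50"; called with -25, it gives "-0.25".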


Ethereal, by Andrew Grant.

Normally I'd use Stockfish as my example, but this is a good time to highlight Ethereal, an open-source engine by Andrew Grant. Like Stockfish, Ethereal is an AB (alpha-beta) engine with a handwritten evaluation. Unlike Stockfish, its source code is neat and well-annotated (at least that's what I've been told by people who would know).

Although Ethereal is written in C, I'm not expecting you to know anything about programming. I don't actually know C myself! But this post will make much more sense if you're willing to look at the linked source code files.

Take a look at Ethereal's evaluation header file, evaluate.h (don't worry about why it's a 'header' file). There's a lot in there, but starting on line 38 it should give you an idea of what things Ethereal looks at in its evaluation: 

  • Material on the board (PawnValue, KnightValue, etc.) (lines 39-44)
  • Where those pieces stand and how good each square is (PawnPSQT, KnightPSQT, etc.) (lines 45-50)
  • Which pawns might become passers (PawnCandidatePasser) (line 51)
  • Knights on outposts (KnightOutpost) (line 56)
  • Many, many more

Did you also notice lines 32-35, where adjustments are set for opposite-colored bishop positions? I could talk about evaluate.h all day, but we should move to a new file, where the evaluation function truly lives...


After evaluate.h has defined a number of variables (PawnValue, KnightOutpost, etc.), those variables get used in evaluate.c. This file is 1000 lines long, so I'll just look at parts of it.

35 /* Material Value Evaluation Terms */
36
37 const int PawnValue = S( 105, 118);
38 const int KnightValue = S( 450, 405);
39 const int BishopValue = S( 473, 423);
40 const int RookValue = S( 669, 695);
41 const int QueenValue = S(1295,1380);
42 const int KingValue = S( 0, 0);
   

Above are the base values of Ethereal's pieces (line 35). They're all integers (int), but there's no reason to keep them small like the human 1, 3, 3, 5, 9 system. Furthermore, Ethereal has two values for each piece, written as S(beginning, endgame): one for the beginning of the game and one for the end. Knights begin the game worth 450 centipawns; by the end they are worth 405 centipawns. These numbers are likely determined empirically, by testing how different values affect performance.
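The post doesn't show how Ethereal combines the two numbers, so what follows is only a sketch of one common approach (often called tapered evaluation): blend the two values linearly according to how far the game has progressed. The Score struct, the 0-256 phase scale, and the taper function are my own illustrative names and assumptions, not Ethereal's code.

```c
/* A (beginning, endgame) pair, mirroring the two numbers inside S(...). */
typedef struct {
    int mg;  /* value at the beginning of the game */
    int eg;  /* value at the end of the game */
} Score;

/* Assumed phase scale: 256 = start of the game, 0 = bare endgame. */
enum { MAX_PHASE = 256 };

/* Blend the two values linearly by game phase (an assumption for this sketch). */
static int taper(Score s, int phase) {
    return (s.mg * phase + s.eg * (MAX_PHASE - phase)) / MAX_PHASE;
}

/* The knight values quoted above. */
static const Score KnightValue = { 450, 405 };
```

Under these assumptions, taper(KnightValue, 256) gives 450, taper(KnightValue, 0) gives 405, and a halfway phase of 128 gives 427 (integer division rounds 427.5 down).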

As I said above, Ethereal gives each piece a value adjustment based on location (line 44). Because the board is treated as left-right symmetric, it only needs to store values for 32 squares (rather than all 64); each square again gets separate (beginning, endgame) adjustments. These next lines are for pawns. They can't be on the 1st or 8th ranks, so those squares are 0 for simplicity. The other values seem strange: why is a rook pawn on the 6th rank worse than both one on the 5th and one on the 7th rank? But these values are also likely found empirically, so I suppose it's true:

44 /* Piece Square Evaluation Terms */
46 const int PawnPSQT32[32] = {
47 S( 0, 0), S( 0, 0), S( 0, 0), S( 0, 0),
48 S( -19, 9), S( 6, 4), S( -11, 7), S( -6, -1),
49 S( -21, 4), S( -11, 3), S( -8, -5), S( -2, -13),
50 S( -16, 12), S( -10, 11), S( 14, -13), S( 12, -24),
51 S( -4, 16), S( 4, 11), S( 0, -2), S( 14, -21),
52 S( -4, 32), S( 1, 30), S( 10, 19), S( 38, -8),
53 S( -17, -40), S( -65, -9), S( 3, -23), S( 40, -37),
54 S( 0, 0), S( 0, 0), S( 0, 0), S( 0, 0),
55 };
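Here is a rough sketch of how a 32-entry table can serve all 64 squares. It assumes the first column of each row belongs to the rook files (a and h) and the last to the central files (d and e), and it stays silent on which table row corresponds to which rank. The array holds only the beginning-of-game halves of the entries quoted above; psqt32_index is a hypothetical helper, not Ethereal's function.

```c
/* Beginning-of-game halves of the PawnPSQT32 entries quoted above,
 * in the same order as the listing (4 columns per row, 8 rows). */
static const int PawnPSQT32_MG[32] = {
      0,   0,   0,   0,
    -19,   6, -11,  -6,
    -21, -11,  -8,  -2,
    -16, -10,  14,  12,
     -4,   4,   0,  14,
     -4,   1,  10,  38,
    -17, -65,   3,  40,
      0,   0,   0,   0,
};

/* Hypothetical helper: map a table row (0-7, counted from the top of the
 * listing) and a file (0 = a-file ... 7 = h-file) to an index 0-31.
 * Files e-h reuse the columns of files d-a, which is the mirroring. */
static int psqt32_index(int row, int file) {
    int column = file < 4 ? file : 7 - file;
    return row * 4 + column;
}
```

With this mapping, a d-pawn and an e-pawn on the same row share an entry: psqt32_index(5, 3) and psqt32_index(5, 4) both return 23, which holds the value 38.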

There are a number of further calculations one needs for pawns; the "evaluatePawns" function runs about 70 lines (363-434). But its // comments should all be comprehensible:

413 // Apply a penalty if the pawn is isolated, and there is not an
414 // immediate pawn capture to potentially remedy the isolation
... ...
420 // Apply a penalty if the pawn is stacked
... ...
426 // Apply a penalty if the pawn is backward
   

Similar analyses are done for all other pieces.
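To give a flavor of how two of those pawn checks could be coded, here is a minimal sketch. It assumes a common board representation called a bitboard: a 64-bit integer with one bit per square (bit 0 = a1, bit 63 = h8, set if one of our pawns stands there). The penalty sizes are placeholders I invented, and the sketch does not reproduce Ethereal's exception for an available pawn capture or its backward-pawn test.

```c
#include <stdbool.h>
#include <stdint.h>

#define FILE_A 0x0101010101010101ULL  /* all eight squares of the a-file */

/* Placeholder penalty sizes, invented for this sketch. */
enum { ISOLATED_PENALTY = 10, STACKED_PENALTY = 15 };

static uint64_t file_mask(int file) { return FILE_A << file; }

/* Squares on the files directly left and right of the given file. */
static uint64_t adjacent_files_mask(int file) {
    uint64_t mask = 0;
    if (file > 0) mask |= file_mask(file - 1);
    if (file < 7) mask |= file_mask(file + 1);
    return mask;
}

/* Isolated: no friendly pawn on either neighboring file. */
static bool is_isolated(uint64_t my_pawns, int sq) {
    return (my_pawns & adjacent_files_mask(sq % 8)) == 0;
}

/* Stacked: more than one friendly pawn on the same file. */
static bool is_stacked(uint64_t my_pawns, int sq) {
    uint64_t same_file = my_pawns & file_mask(sq % 8);
    return (same_file & (same_file - 1)) != 0;  /* true if 2+ bits are set */
}

/* Sum the (negative) penalties over every pawn in the bitboard. */
static int pawn_penalties(uint64_t my_pawns) {
    int total = 0;
    for (int sq = 0; sq < 64; sq++) {
        if (!(my_pawns & (1ULL << sq))) continue;
        if (is_isolated(my_pawns, sq)) total -= ISOLATED_PENALTY;
        if (is_stacked(my_pawns, sq)) total -= STACKED_PENALTY;
    }
    return total;
}
```

For pawns on a2, a3, c2 and d4, this returns -50: both a-pawns are isolated and stacked, while the c- and d-pawns sit on neighboring files and are neither.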


In addition to piece valuation, Ethereal also evaluates threats and weaknesses in its position (starting on line 819). Some examples:

857 // Penalty for each of our poorly supported pawns
... ...
872 // Penalty for all major threats against poorly supported minors
... ...
892 // Bonus for giving threats by safe pawn pushes
   

You might have noticed that all the terms I've looked at are fairly simple: no huge amount of chess knowledge is required to understand them. That's one hallmark of AB engines; the evaluation doesn't capture the intricacies of any position very well.

You might ask: why can't someone just write a better evaluation function? The answer is that people are trying, but it's difficult. Andrew Grant told me that he tried to add a penalty for tripled pawns (because they're obviously terrible!), but for whatever reason the engine became weaker with that penalty. In practice, code for positional subtleties rarely adds strength.


The last point I want to make about evaluation functions is that one needs to be careful about what their outputs signify.

By Zermelo's Theorem (Ernst Zermelo, 1913), every chess position is a win for white, win for black, or a draw. Therefore, a perfect evaluation function would report only forced mates (+M or -M) or a draw (0.00). The evaluation terms I've highlighted don't help you accomplish this.

The engine evaluation therefore has nothing to do with the 'true' value of the position. It's simply a way for the engine to indicate the favorability (in human terms) of a position. The only 'true' evaluation function is a tablebase, which reports only mate or draw scores, and tablebases exist only for positions with a handful of pieces left on the board.

That's it for today. Hopefully this all made some sense, and made engine evaluations less mysterious. Please comment and let me know what you think!