AlphaZero vs the Drawn Evaluation

By novacek

It has been clear for a while that AlphaZero is a chess program unlike any other. Armed only with the rules of the game, it played "millions of games against itself via a process of trial and error called reinforcement learning. At first, it plays completely randomly, but over time the system learns from wins, losses, and draws to adjust the parameters of the neural network, making it more likely to choose advantageous moves in the future." (Deepmind.com)

In 2017, AlphaZero played a 100-game match against Stockfish, the highest-rated engine in the world. AlphaZero won 28 games and lost none, with the remaining 72 finishing as draws. This shook the chess world, with many hailing it as a defining moment for the future of chess.

In 2018, as if to prove that the first victory was no fluke, AlphaZero and Stockfish played another match. This time the 1,000-game contest ended in a crushing +155 -6 =839 in AlphaZero's favour.
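
To put the two results on a common scale, here is a small sketch that turns a win/loss/draw record into a percentage score and a rough rating gap. The conversion uses the standard logistic Elo formula; it is my own back-of-the-envelope addition, not a figure reported by DeepMind, and the function names are made up for illustration. A gap derived from a single match under particular playing conditions should be read as a rough indication only.

```python
import math


def match_score(wins: int, losses: int, draws: int) -> float:
    """Fraction of available points scored (win = 1, draw = 0.5, loss = 0)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def implied_elo_gap(score: float) -> float:
    """Rating gap implied by a score under the logistic Elo model.

    Assumption: the match score is treated as the expected score,
    which is only a rough estimate for a single match.
    """
    return 400 * math.log10(score / (1 - score))


# Records quoted above, from AlphaZero's point of view: (wins, losses, draws)
matches = {
    "2017, 100 games": (28, 0, 72),
    "2018, 1,000 games": (155, 6, 839),
}

for label, (wins, losses, draws) in matches.items():
    score = match_score(wins, losses, draws)
    print(f"{label}: {score:.2%} of the points, about {implied_elo_gap(score):+.0f} Elo")
```

Seen this way, the large number of draws matters: even a match without a single loss translates into a moderate rating gap rather than an enormous one.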

What makes its success all the more interesting is that it is not restricted by chess principles - or dogmas - so its games are original and often very attacking. In fact, because it was never taught the values that we attribute to the pieces, there are many games where AlphaZero gives up several pawns to open up lines. The following game is an extreme example of this:

Impressive, right? It quickly became clear that White was having a lot of fun in that game. However, at move 38 Stockfish evaluates the position at +0.2, in other words completely drawn. Five moves later it was totally lost.

This point is picked up on in Game Changer by Matthew Sadler and Natasha Regan:


In fact, the curious issue of an incorrect drawn evaluation is analysed closely in the book. I suggest you get a copy and take a look yourself; it's a fantastic modern book.

Anyway, here are my thoughts on the topic. I'll present them in a similar way to the '0.00' section of Chapter 2 in Game Changer, albeit with different positions:

Game 1- 

So far the game has not particularly been in the spirit of a traditional Evans Gambit, has it? Right now Stockfish, at a depth of 23/23, assesses this position with Black to move as dead drawn: 0.0. Here are its top 3 suggested lines:
Stockfish 10+ WASMX Top 3 lines at Depth 23/23
1 41...Qxf5 42.Rce2 Kh8 43.Re4 Qf6 44.Kg1 Nb7 0.0
2 41...Bb8 42.Bd3 Rc8 43.Kh1 c4 44.Be2 Kh8 +0.6
3 41...Nb7 42.Ba6 Na5 43.Bd3 Bb8 44.Nxe5 dxe5 +0.6

Now let's try to assess this from a more human point of view. Here were some of my initial thoughts when I saw the position:

1. Black's Bishop is arguably the worst piece on the board, with all but one of its 'friendly' pawns on the same coloured squares.

2. White has the Bishop pair. This isn't such a big deal right now since the position is closed but may be useful if diagonals become available.

3. White's pawn structure is fractured. This has its strengths and weaknesses.

4. Black's Knights are comfortable. The one on a5 looks less useful but does control some squares on the Queenside, while the one on e5 is nicely centralised.

5. White has a strong hold over the e6 square. If he can plant a Knight there it would be very powerful.

6. There aren't any obvious pawn breaks for either side.

So both sides have their positives and negatives. Some maneuvering will be required to prove anything but I'd assess it as slightly better for White. A plan of Ng5-e6 looks natural and does eventually take place in the game.

So who is right? Is it really a dead draw or does one side have an edge? Let's see how the game concluded.

Aha! So Stockfish was wrong after all. Let's give it another chance with one more game.

Game 2-

So what does Stockfish think about AlphaZero's chances here? I let it run for a few minutes to get a feeling for what it thought:

Stockfish 10+ WASMX Top 3 lines at Depth 22/22

1 24...Ng5 25.Nab6 Nh3+ 26.Kf1 Ra7 27.Nxd5 Bxd5 +0.4
2 24...Nb4 25.Qb1 dxc5 26.dxc5 Qg6 27.a3 Nd5 +0.9
3 24...Qf7 25.Bf1 Nb4 26.Qb1 dxc5 27.dxc5 Bxc4 +1.0

So you'd be led to believe that Stockfish, playing White, has a slender advantage and a comfortable position on the whole, with chances for Black to go wrong.

Let's quickly break it down so we can understand what's actually going on for ourselves:

1. Black's hopes for an attack on the Kingside seem to have been cramped by White's last move. 

2. If Black doesn't get through he'll be left to stew with no plan until White takes action.

3. White's pawn chain is impressive. It keeps Black at bay to some extent and gives White some space.

4. White's dark-squared Bishop looks bad, although it may have some defensive duties.

So White will probably be rather safe if he can weather any storm which, with 24.h4, he seems to have done successfully. Black's only hope is to try and break up the Kingside pawn formation. How can he do that, though? The answer soon becomes clear.

In the time it took me to write this analysis Stockfish had changed its mind. Suddenly, Black is doing well, very well in fact. Stockfish says the situation is now -1.0 in Black's favour! And it would be right...

This is the second time that AlphaZero has proven Stockfish wrong about its evaluation, and there are plenty more examples. Again, I suggest you have a look at Game Changer if you are interested. But why would this be? After all, Stockfish isn't weak. Far, far from it.

The book suggests that the hand-crafted Stockfish is too bogged down by general rules to be flexible enough to change its mind on certain things. If its evaluation is largely based on a certain feature of a position, it will be "dulled to other concessions it is making".

Therefore, AlphaZero's power lies in the fact that it has a totally unique view on how chess works. The whole concept of AlphaZero is that it had zero previous knowledge of chess besides the rules when it started learning. This means it is free from dogmas, in a sense.

In other words, AlphaZero's greatest strength is that, to us at least, it's as if it doesn't know anything at all.

(Thanks a lot for reading. I'm aware that the heavy engine-evaluation content is not to everyone's taste but I hope that the sprinkling of human reasoning, no matter how erroneous my attempts at analysis may be, provided relief from the numbers.)