Objectively Speaking, Is Magnus a Patzer Compared to StockFish and AlphaZero?

Lyudmil_Tsvetkov

And that is why I don't like the approach: it is too simplistic. The primary code seems to be very simple. What is complex is the tuning network, but that is just hardware.

Chess is much more complex than that, theoretically, and that is why Alpha will not make big progress in the future.

Elroch
Lyudmil_Tsvetkov wrote:

 The team includes at least 3 chess programmers. Matthew Lai, the author of Giraffe and Talkchess member, is one of them. It is maybe for a reason that Giraffe, following the very same approach as Alpha, is rated only around 2400 on single core.

So, what you are saying is that AlphaZero is 3600 because they have someone on their team who has created an engine that reached 2400?

Likewise, the team has no chess player at above amateur level. 

However, the reason Matthew Lai is on the team is that he had tried to produce a chess AI, just one that was 1200 points weaker. 1200 points is not a difference that is achievable by speeding up hardware, even a lot. From articles on this, he was using a much smaller neural network, which even on slower hardware was able to look at about 10% as many nodes as a conventional engine.  (see this article)

However, I would agree that using modest computational resources would have been a huge barrier to the development of AlphaZero. The most demanding phase is the self-learning, and this would have taken months without the fast hardware, rather than 4 hours.

The reason AlphaZero benefits from more computation when playing is simply that its search tree gets bigger. But this search tree had 1000 times fewer nodes than that of Stockfish on the actual hardware each used!

It is the huge hardware that made the difference and not the approach.

Self-learning, self-learning, what do you mean self-learning and AI.

You admit you know nothing about the techniques that AlphaZero used to generate its strength: model-based reinforcement learning, termed "deep reinforcement learning" because the model used is a deep neural network.

I have studied this subject (including watching David Silver's excellent lecture series), and use Sutton's book on the subject.

It is true that AlphaZero uses a lot of processing power to achieve its highest strength in head to head play. However, with 30 times less power it would remain the highest rated engine according to AlphaZero's testing. While restricting AlphaZero's computational power would make the match closer, increasing the computational resource for both AlphaZero and a conventional engine like Stockfish would greatly advantage AlphaZero.

The key reason AlphaZero increases in strength more rapidly with computational resource appears to be that the branching factor of its search tree is smaller, to an extent which compensates enormously for it looking at far, far fewer positions. With the full power of 4 TPUs, AlphaZero was still looking at 1000 times fewer nodes than Stockfish! If its time was reduced by a factor of 30, it would be looking at 30,000 times fewer, and still be stronger!

As a result, when AlphaZero gets more time, its horizon must expand significantly faster than that of Stockfish. This is why it is not only stronger now; it also indicates this technology has a permanent edge. 
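To make that concrete, here is a back-of-the-envelope sketch in Python. The node rates and effective branching factors below are illustrative guesses of roughly the right order, not figures from the paper; the point is only that depth grows like log(nodes)/log(branching factor), so a lower branching factor means the horizon grows faster with extra time.

```python
import math

def effective_depth(nodes: float, branching_factor: float) -> float:
    # depth of a tree with this many nodes and this branching factor
    return math.log(nodes) / math.log(branching_factor)

# Illustrative numbers only: a Stockfish-like searcher examining tens of
# millions of nodes per second with an effective branching factor near 2,
# versus an AlphaZero-like searcher examining tens of thousands with a
# much smaller effective branching factor.
for name, nps, b in [("Stockfish-like", 70_000_000, 2.0),
                     ("AlphaZero-like", 80_000, 1.5)]:
    for seconds in (1, 60):
        print(f"{name}: {seconds:>3}s -> ~{effective_depth(nps * seconds, b):.1f} plies")
```

With these made-up numbers, sixty times more thinking time buys the low-branching-factor searcher noticeably more extra plies than it buys the other, which is the "horizon expands faster" claim above.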

 

You made a good point that I should emphasise: the role of those with knowledge of conventional chess engines in the design of the tree search algorithm of AlphaZero, which has some commonality with all chess engines. While I am no expert on chess engines, a key strength of AlphaZero is that it is better at allocating resources to different branches, and this is achieved by the quality of the neural network's estimates of the probability that each move is best, which come entirely from self-learning.

coldgoat

stockfish does not have to wear glasses to crush you

SmyslovFan

I do not believe that AlphaZero is able to perform better than 3600 strength. I believe that because that is, in my opinion, a close approximation of perfect chess. Stockfish was severely handicapped, so AlphaZero's performance rating can't be calculated. We don't know how strong Stockfish was during the match. My guess, and it's only a guess, is that it was around 3000 strength. It was still very strong, stronger than any human, but it was beatable. And it would have lost a match to a fully fit Stockfish fairly handily too.

Elroch

I am surprised you would make such a wild claim! Stockfish was running on the most powerful computer I have ever heard of it running on - 64 cores - far more than are used in most computer competitions. Given that the hash table was a reasonable size (exactly how near to optimal it was is not clear), that the effect of a suboptimal hash table is the same as a fairly modest shift in CPU speed, and that Stockfish's rating changes very little with CPU speed at that level, it is unlikely that this cost many tens of Elo points.

As a result, we can be pretty sure the absolute rating of Stockfish's play was as high as its usual rating.

There is the issue of the opening book, but human opening theory has become less and less useful to computers that are more than 600 points stronger than us, so this too is an efficiency issue. Modern computer opening books are basically the product of computing time in previous computer games, or of self-play.

I would point out that selecting an opening book is a different game to playing chess, and it is a somewhat dull one, since it is about generating a static book and following it by rote. AlphaZero devoted no time to this. However, when playing Stockfish across the full range of best openings on an equal playing field (both being assisted or hindered equally), it was much stronger than its adversary.

admkoz
Elroch wrote:
admkoz wrote:
Elroch wrote:
admkoz wrote:
Elroch wrote:
admkoz wrote:

What I am curious about is whether it "figures out" things like "don't give up a free queen", or does it really just have to figure that out again every time such an option presents itself?  

 

From there its experience improves these networks and after a while it would learn that positions where there was a queen missing tended not to have such a good expected result. Well, actually it would get a general idea that more material is better[...]

I have put this crudely, but basically a big neural network learns to encapsulate concepts that can be very sophisticated[...]

So you're saying it DOES figure out that "more material is better" meaning that it can evaluate positions it has never seen before on that basis.  

 

You and I can glance at a board, see that there are no immediate threats, see that Black is up a rook, and figure Black has it in the bag, even if an actual mate is 30+ moves away.  We'll be right 999,999 times out of a million.  Can AlphaZero do that?  

We would not be right that often.

But yes, based on my understanding of the technology, its positional evaluation network would be so good that without any explicit analysis at all it would play quite good chess. I am not sure how good it would be in this mode, but I do know it needs to do analysis to play at better than 2900 Elo (as it achieved near this level using about 1/30 of a second per move and got better as the time increased).

So what percentage of the time DO you think being up a rook in an otherwise normal position, in a game between > 1500 players, is a win?   That is just a quibble. 

 

OK, so AZ would do pretty well even if it was not allowed to do any further analysis.  That implies that AZ can evaluate any position, and it learned to do this solely by playing (initially) random games. 

 

I guess it may be that this is the kind of question that can't be answered in a blog post, but what I am trying to figure out is the form of that evaluation method and how it gets built.

The nature of the evaluation method is quite simple. It has some sort of representation of the position as an array of numbers which are the inputs to the neural network - the neural network doesn't know what they mean; it has to work this out from the way they relate to the results of games and to their values on other moves - and a large deep neural network with thousands (not sure how many thousands) of nodes in many layers which takes the representation of the board and outputs a number, the expected score from the position. [I hope I haven't missed some published detail].
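If it helps, here is a minimal Python sketch of that idea. It is my own toy simplification, not AlphaZero's actual architecture (the real thing is a deep residual network): a position becomes an array of numbers, those numbers pass through layers of weighted sums and nonlinearities, and a single number comes out as the expected score.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_position(board_planes: np.ndarray) -> np.ndarray:
    # Flatten whatever board representation is used into one input vector;
    # the network itself has no idea what the numbers "mean".
    return board_planes.reshape(-1).astype(np.float32)

class TinyValueNet:
    def __init__(self, n_inputs: int, n_hidden: int = 64):
        self.w1 = rng.normal(0, 0.1, (n_inputs, n_hidden))
        self.w2 = rng.normal(0, 0.1, (n_hidden, 1))

    def value(self, x: np.ndarray) -> float:
        h = np.tanh(x @ self.w1)           # hidden layer
        out = np.tanh(h @ self.w2)         # expected score in [-1, 1]
        return out.item()

# Usage: e.g. 12 piece planes on an 8x8 board -> 768 inputs.
net = TinyValueNet(n_inputs=12 * 8 * 8)
fake_position = rng.integers(0, 2, size=(12, 8, 8))
print(net.value(encode_position(fake_position)))
```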

How the evaluation method gets built comprises two parts (if my understanding is correct - I am supplementing what is published with general ideas about deep reinforcement learning). The obvious one is when a game ends: the exact value of the position is available, and that can be used to adjust the neural network to improve its evaluations of earlier positions in the direction of the right result. The second one is that when it evaluates a position, if this evaluation is a surprise compared to the evaluation of previous positions, the network is tweaked to make the evaluations of previous positions a bit more in agreement with the later evaluation.

The first form of feedback is basically making the evaluation compatible with the absolute value of clear positions. The second form of feedback is basically making the evaluation compatible with the legal continuations in a position: the reason is that the perfect evaluation of a position is the same as that of a later position reached by perfect play.
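As a toy illustration of those two feedback signals (again my own simplification, with a lookup table standing in for the network): the first update pulls evaluations towards the game result, the second pulls each evaluation towards the evaluation of the position that followed it.

```python
ALPHA = 0.1  # learning rate

def nudge(values: dict, position, target: float) -> None:
    # Move the stored evaluation of `position` a little towards `target`.
    v = values.get(position, 0.0)
    values[position] = v + ALPHA * (target - v)

def learn_from_game(values: dict, positions: list, result: float) -> None:
    # 1) terminal feedback: the final result is an exact value.
    for p in positions:
        nudge(values, p, result)
    # 2) bootstrapping: make earlier evaluations agree more with later ones,
    #    since under perfect play consecutive evaluations would be equal.
    for earlier, later in zip(positions, positions[1:]):
        nudge(values, earlier, values.get(later, 0.0))

values = {}
learn_from_game(values, ["pos_a", "pos_b", "pos_c"], result=1.0)
print(values)
```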

 

So, does that mean that there exists a function F(position) = Value which does not depend on 20-deep evaluations of the possible moves from the position?  If so, it would be kind of awesome if Google was to publish what that function was at the current state of AZ.  I am sure it would not be human readable - and just as sure that humans would be able to pick key things out of it.  

SmyslovFan
Elroch wrote:

I am surprised you would make such a wild claim! Stockfish was running on the most powerful computer I have ever heard of it running on - 64 cores - far more than are used in most computer competitions. Given that the hash table was a reasonable size (exactly how near to optimal it was is not clear), that the effect of a suboptimal hash table is the same as a fairly modest shift in CPU speed, and that Stockfish's rating changes very little with CPU speed at that level, it is unlikely that this cost many tens of Elo points.

As a result, we can be pretty sure the absolute rating of Stockfish's play was as high as its usual rating.

There is the issue of the opening book, but human opening theory has become less and less useful to computers that are more than 600 points stronger than us, so this too is an efficiency issue. Modern computer opening books are basically the product of computing time in previous computer games, or of self-play.

I would point out that selecting an opening book is a different game to playing chess, and it is a somewhat dull one, since it is about generating a static book and following it by rote. AlphaZero devoted no time to this. However, when playing Stockfish across the full range of best openings on an equal playing field (both being assisted or hindered equally), it was much stronger than its adversary.

I defer to your understanding of computers, but not to your understanding of chess. An opening database is of tremendous assistance to computers. Humans could still beat engines that didn't use an opening database about a decade ago. The reports I read also suggest Stockfish didn't have access to an endgame tablebase either. 

You may know quite a bit about computers, but you are wrong to argue that opening databases and endgame tablebases don't materially help traditional engines such as Stockfish to play better chess.

Elroch

I don't disagree with you, and I am not bad at turn-based chess (although not as good as my ranking of #95 on chess.com suggests), where I understand the usefulness of a database.

As I pointed out, the advantage of an opening database is essentially a time saving: moves in an opening database are the result of previous computation (by people originally, but computers have become more important). You can use better players to play your moves in the openings, but if you are the best player in the world (like Stockfish or AlphaZero), the only advantage is that you save yourself some computation.

As such it is a bit of a cheat when comparing engines. The best opening book (so huge it is impractical) would allow a lousy engine to play well!

You need to separate the two functions of a computer: finding good moves in a general position, and reading good moves out of a database. The former is more what DeepMind were interested in.
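To put the distinction in code terms (a toy sketch, with placeholder position keys rather than real theory): the book is nothing but a precomputed lookup, and only when it runs out does the engine have to do the interesting work of finding a move itself.

```python
# Placeholder book; real books map millions of positions to moves.
OPENING_BOOK = {
    "startpos": "e2e4",
    "startpos e2e4": "c7c5",
}

def choose_move(position_key: str, search_fn):
    book_move = OPENING_BOOK.get(position_key)
    if book_move is not None:
        return book_move            # no computation spent at all
    return search_fn(position_key)  # the engine's own search takes over

print(choose_move("startpos", search_fn=lambda p: "<search result>"))
```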

Elroch
admkoz wrote:
 

So, does that mean that there exists a function F(position) = Value which does not depend on 20-deep evaluations of the possible moves from the position?  If so, it would be kind of awesome if Google was to publish what that function was at the current state of AZ.  I am sure it would not be human readable - and just as sure that humans would be able to pick key things out of it.  

There is, and it is a number that you get out of the AlphaZero neural network when you provide a position as inputs. My guess is that by just doing the evaluation for the position after each legal move it may be capable of master-level play (the data implies only that it is below 2800 without doing analysis). Unfortunately, this network consists of literally millions of parameters used in calculations at considerable depth, so it is not easy for humans to unravel. Our brains, at up to 10^15 connections, are even worse, though.
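A sketch of what that "evaluation only" mode of play would look like (hypothetical code, using the python-chess library just to enumerate legal moves; `evaluate` stands in for the value network and is assumed to score the position for the side that just moved):

```python
import chess  # python-chess

def pick_move_one_ply(board: chess.Board, evaluate) -> chess.Move:
    # No search at all: evaluate the position after each legal move
    # and play the one the evaluation likes best.
    best_move, best_score = None, float("-inf")
    for move in board.legal_moves:
        board.push(move)
        score = evaluate(board)
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Usage with a deliberately crude placeholder evaluation (material only):
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material_eval(board: chess.Board) -> float:
    # Score from the point of view of the side that just moved.
    score = 0
    for piece in board.piece_map().values():
        sign = 1 if piece.color != board.turn else -1
        score += sign * PIECE_VALUES[piece.piece_type]
    return score

print(pick_move_one_ply(chess.Board(), material_eval))
```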

SmyslovFan

Again, you are making a critical mistake from a chess player's perspective. 

The opening database doesn't just give tactical short cuts; it provides positions that are playable. All of the main lines have been analyzed out far more than 40 ply in critical lines, and the resulting positions have been played countless times, by humans and machines. 

Opening databases don't just provide a "short cut", they provide a platform for reaching playable positions. The requirements of the opening are different from the requirements of a general middle game, and vastly different from those of an endgame. 

 

Gerberk8

I don't like Magnus Carlsen at all... He is a conceited twat who does not know anything but chess... Compared to Kasparov he is an idiot all the way...

Lyudmil_Tsvetkov
Elroch wrote:
Lyudmil_Tsvetkov wrote:

 The team includes at least 3 chess programmers. Matthew Lai, the author of Giraffe and Talkchess member, is one of them. It is maybe for a reason that Giraffe, following the very same approach as Alpha, is rated only around 2400 on single core.

So, what you are saying is that AlphaZero is 3600 because they have someone on their team who has created an engine that reached 2400?

Likewise, the team has no chess player at above amateur level. 

However, the reason Matthew Lai is on the team is that he had tried to produce a chess AI, just one that was 1200 points weaker. 1200 points is not a difference that is achievable by speeding up hardware, even a lot. From articles on this, he was using a much smaller neural network, which even on slower hardware was able to look at about 10% as many nodes as a conventional engine.  (see this article)

However, I would agree that using modest computational resources would have been a huge barrier to the development of AlphaZero. The most demanding phase is the self-learning, and this would have taken months without the fast hardware, rather than 4 hours.

The reason AlphaZero benefits from more computation when playing is simply that its search tree gets bigger. But this search tree had 1000 times fewer nodes than that of Stockfish on the actual hardware each used!

It is the huge hardware that made the difference and not the approach.

Self-learning, self-learning, what do you mean self-learning and AI.

You admit you know nothing about the techniques that AlphaZero used to generate its strength: model-based reinforcement learning, termed "deep reinforcement learning" because the model used is a deep neural network.

I have studied this subject (including watching David Silver's excellent lecture series), and use Sutton's book on the subject.

It is true that AlphaZero uses a lot of processing power to achieve its highest strength in head to head play. However, with 30 times less power it would remain the highest rated engine according to AlphaZero's testing. While restricting AlphaZero's computational power would make the match closer, increasing the computational resource for both AlphaZero and a conventional engine like Stockfish would greatly advantage AlphaZero.

The key reason AlphaZero increases in strength more rapidly with computational resource appears to be that the branching factor of its search tree is smaller, to an extent which compensates enormously for it looking at far, far fewer positions. With the full power of 4 TPUs, AlphaZero was still looking at 1000 times fewer nodes than Stockfish! If its time was reduced by a factor of 30, it would be looking at 30,000 times fewer, and still be stronger!

As a result, when AlphaZero gets more time, its horizon must expand significantly faster than that of Stockfish. This is why it is not only stronger now; it also indicates this technology has a permanent edge. 

 

You made a good point that I should emphasise: the role of those with knowledge of conventional chess engines in the design of the tree search algorithm of AlphaZero, which has some commonality with all chess engines. While I am no expert on chess engines, a key strength of AlphaZero is that it is better at allocating resources to different branches, and this is achieved by the quality of the neural network's estimates of the probability that each move is best, which come entirely from self-learning.

Those are all citations from the paper.

Actually, we don't know why its nps is smaller, what the precise reason is, or how they count nps; those are just guesses until they publish the code.

Nor do we know what Alpha's depth is, or how it relates to nps and evaluation.

I would hate to discuss this further when we simply lack sufficient insider knowledge. But two things are crystal clear:

1) Alpha is some 300-400 Elo weaker than SF on a single core, and the very same estimate would probably be reproduced if SF had access to hardware similar to that of Alpha

2) almost a week after the paper and the results were published, that is, 168 hours later, there is still no communiqué from Google that they have solved chess, or even that Alpha has improved by a meagre 10 Elo.

Why would one think hardware is relevant to Alpha but irrelevant to SF?

This is obviously not true.

So, basically, they built a 2800 Elo engine; I acknowledge that readily. The only thing I cannot understand is why they should have made such publicity out of that.

We saw a good match, that is all, a match we had never witnessed before. But we could easily witness an even better one, if we matched the latest dev SF (+40 Elo) on 32 cores vs the latest dev on 2000 cores. That would be a significantly better match, with higher quality games, but only Google can find 2000 cores for SF.

So, again, a good hardware achievement. Google is not known to be lacking in funds.

Elroch
Lyudmil_Tsvetkov wrote: 

1) Alpha is some 300-400 Elo weaker than SF on a single core, and the very same estimate would probably be reproduced if SF had access to hardware similar to that of Alpha

This is neither accurate based on the data, nor especially interesting, any more than running the 100m with one leg tied up would be a particularly interesting athletics event.

2) almost a week after the paper and the results were published, that is, 168 hours later, there is still no communiqué from Google that they have solved chess,

That really is a very foolish comment.

or even that Alpha has improved by a meagre 10 Elo.

Why would it need to? It has shown itself to be the best. Fischer did not increase his rating at all after the world championship in 1972, but that did not stop him being the champion (until 1975). Not that AlphaZero-Stockfish was a world championship - rather it was a test of the world's strongest chess software against the world champion program.

Why would one think hardware is relevant to Alpha but irrelevant to SF?

One would not think that, but one would infer that Stockfish was improving a LOT more slowly with time per move (exactly equivalent to CPU speed) over the entire range of testing. Check the paper. This is why the AlphaZero approach will tend to dominate in the end, as massively parallel hardware becomes as ubiquitous as GPUs are now. Google's TPUs are much like GPUs (which are also used in a big way in deep learning to speed up computation), but being designed especially for deep neural networks, they are better suited to the purpose than GPUs, which exist mainly to run video games fast!

So, basically, they built a 2800 Elo engine; I acknowledge that readily. The only thing I cannot understand is why they should have made such publicity out of that.

A rating at chess is how well you can play. Not how well you can play if handicapped in the way some guy demands that you should be handicapped. AlphaZero would be shown not to be the best chess player, not when it is sufficiently handicapped, but when another entity, using whatever hardware it likes, is good enough to beat it. You can postulate that such an entity exists, but it has not been demonstrated.

We saw a good match, that is all, a match we had never witnessed before. But we could easily witness an even better one, if we matched the latest dev SF (+40 Elo) on 32 cores vs the latest dev on 2000 cores. That would be a significantly better match, with higher quality games, but only Google can find 2000 cores for SF.

There was reported to be a spinoff of Stockfish running on a large cluster (thousands of nodes) several years back. But I am not sure of the status or accuracy of the company's claims. The website still exists, but seems to be inactive. http://www.chesscluster.com/magneto.html

I don't know how strong Stockfish would get with 30 times more cores (or whatever) but I am not sure it could cope with AlphaZero. But I would be interested to see!

So, again, a good hardware achievement. Google is not known to be lacking in funds.

I need to remind you that you have admitted you have no idea how the software works.

 Can I just check your personal feelings about some things relevant to your views?

  • google (the company)
  • conventional chess engines
  • Stockfish in particular
  • Some program you can never get a copy of
Elroch
SmyslovFan wrote:

Again, you are making a critical mistake from a chess player's perspective. 

The opening database doesn't just give tactical short cuts,

I certainly did NOT claim that it did. This is what is known as a strawman.

What an opening database does is provide choices of move that are probably good in an absolute sense.

it provides positions that are playable. All of the main lines have been analyzed out far more than 40 ply in critical lines, and the resulting positions have been played countless times, by humans and machines.

They have not been played "countless times". They have been played some finite number of times. Once you get a modest distance into an opening, you may find a few thousand games, a few hundred games and so on. Once it gets any lower, the statistical information becomes so noisy as to be highly questionable. The results of these games provide useful evidence as to the quality of a choice, but this is really just some precomputed spindly branches in the tree of possibilities. If it is a hundred games, it may contain quite a lot of computation - say enough for 10,000 move choices - but this computation is too narrow to be ideal. It is worth a lot as a random sample of the results of imperfect play, but has the potential to be revised with one insight (the number and quality of the games are key to how likely this is).
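To put a number on how noisy those samples are, here is a quick calculation (normal approximation, illustrative figures): with a hundred games in a line, the margin of error on the average score is already close to a tenth of a point.

```python
import math

def score_and_margin(wins: int, draws: int, losses: int):
    # Average score and a rough 95% margin of error for a line's results.
    n = wins + draws + losses
    scores = [1.0] * wins + [0.5] * draws + [0.0] * losses
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    return mean, 1.96 * math.sqrt(var / n)

# e.g. 100 games in some line
print(score_and_margin(35, 40, 25))   # roughly 0.55 +/- 0.08
```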

Opening databases don't just provide a "short cut", they provide a platform for reaching playable positions. The requirements of the opening are different from the requirements of a general middle game, and vastly different from those of an endgame. 

An entity that can see 10 moves further with equal reliability (as is likely to be the case with AlphaZero at full power) can do without 10 moves of opening theory.

There is an interesting point here. There are in a sense two games in chess. One game is where you play against another player in the normal way. The other "game" is where you find, solo, what lines are good in the openings. Playing the latter game is a useful way of giving yourself a bit of an advantage in the first game.

In turn-based and correspondence chess, the two games are interlaced in real time; for OTB and live chess, the two are interlaced from one game to another. As you say, the generation of an opening book is important to chess engines, as it is to human professional chess players (and keen amateurs). However, most chess players would agree that the core skill of a strong chessplayer is finding good moves, and it is a far more interesting one. Opening knowledge is a way to manage with less of that skill.

Note that in 960 chess, this factor virtually vanishes for human players. There is only one game, with no opening theory. For computers, opening theory would be spread 960 times more thinly, which dramatically reduces its significance and usefulness (you can get an impression of how much by going to your chess database and imagining it held roughly 1000 times fewer games).

admkoz
Elroch wrote:

 

In turn-based and correspondence chess, the two games are interlaced in real time; for OTB and live chess, the two are interlaced from one game to another. As you say, the generation of an opening book is important to chess engines, as it is to human professional chess players (and keen amateurs). However, most chess players would agree that the core skill of a strong chessplayer is finding good moves, and it is a far more interesting one. Opening knowledge is a way to manage with less of that skill.

Note that in 960 chess, this factor virtually vanishes for human players. There is only one game, with no opening theory. For computers, opening theory would be spread 960 times more thinly, which dramatically reduces its significance and usefulness (you can get an impression of how much by going to your chess database and imagining it held roughly 1000 times fewer games).

So can AZ beat stockfish in 960? 

Elroch

That is a great question!

IMO, it would win as convincingly if it was retrained for 960 in exactly the same way, and the same patterns would occur (especially AlphaZero improving faster with computational resource). How well it would play if it had to play 960 having been trained entirely for conventional chess is interesting! There would be a fundamental problem with changing the rule for castling: this might be awkward to do while trying to retain the same trained neural network. But other than that, 960 positions are very like chess positions where some of the pieces are in odd places, so it should do well.

hangejj
chesster3145 wrote:

I think you’re misunderstanding. Computers can’t even be compared to humans as they don’t play in human rating pools, and they play a fundamentally different type of chess than we do. The idea that Magnus is a patzer I think is also a misunderstanding. He’s extremely strong, given that he has to sit at a board for 6 hours and is a very fallible human and computers don’t and aren’t.

This sums up my opinion.  At this point, comparing humans and computers in chess seems pointless.

Sergeant-Peppers
Somehow, they probably will not be talking about Carlsen's opening play against Adams or his Qc6? howler against Nepo at the London werewolves in the same vein as Fischer's Bxh2 against Spassky.
Elroch

Computers can certainly be compared to humans at chess: earlier computers got realistic estimated ratings by being allowed to play in human competitions.  Less formal games against masters provided good evidence of their ratings. Now, it is not a competition any more. Current computer chess ratings are not far off compatibility with the human scale, because they were started with that intention. 

Computers are no longer competitors to humans because they have too much advantage, so they are relegated to a separate category of superhuman player.

Godeka

@SmyslovFan:
The paper contains the results of 1200 games in the 12 most common human openings. Note that these openings can limit the strength, because they force positions which don't fit the engine's playing style and can be weaker than the lines the engine itself would prefer.

AlphaZero has the most losses in the B40 Sicilian Defence: as white 17 wins, 31 draws and 2 losses; as black 3 wins, 40 draws and 7 losses.

 

@Lyudmil_Tsvetkov
> Actually, we don't know why its nps is smaller [...]

Sure we know: Monte Carlo Tree Search without simulations + network calculations. Nothing special there.
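For anyone curious, the shape of that search is roughly the following (a loose sketch after the published AlphaGo Zero / AlphaZero descriptions; the names, the exploration constant and the `network(position)` interface are mine): each new leaf costs one network call instead of a random playout, which is exactly why the node count per second is so much lower.

```python
import math

class Node:
    def __init__(self, prior: float):
        self.prior = prior        # network's probability that this move is best
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # move -> Node

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node: Node, c_puct: float = 1.5):
    # Pick the child maximising value plus a prior-weighted exploration bonus.
    total = sum(ch.visits for ch in node.children.values())
    def score(ch: Node) -> float:
        return ch.q() + c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def expand_and_evaluate(node: Node, position, network):
    # One network call per new leaf: it returns move priors and a value,
    # so no random playout is needed.
    priors, value = network(position)
    for move, p in priors.items():
        node.children[move] = Node(prior=p)
    return value
```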

 

> on the 32 cores vs latest dev on 2000 core

Isn't the supported maximum 128 threads?