Alpha-zero-stockfish (number of moves per second vs hardware debate)

Avatar of i-am-greek

and who won the entire thing?

Avatar of pfren
Elroch wrote:
pfren wrote:

Fact no.1:

Stockfish was intentionally crippled.

This is nonsense. Stockfish has rarely been run on a faster machine. There is a valid debate about whether Stockfish was hindered by the 1 GB hash table, but my research on this finds that there is no solid evidence it would even have been stronger with a larger hash table (just unsubstantiated assertions, whereas it is a fact that increasing the hash table size is not always beneficial), and less than none that it could have accounted for the difference in strength.

AlphaZero was running on very powerful hardware tailored especially for running deep neural networks (and entirely unsuited to running Stockfish), as is now available on Google's cloud computing platform for the same purpose. There is no law limiting the hardware you can use when doing AI research.

How long do you think we will have to wait before anyone with any hardware and any engine can achieve 64% against the same version of Stockfish running on the same hardware and configuration used against AlphaZero?

Fact no.2:

Google does not like to provide details about the match, other than their AI application "crushing" Stockfish. Why have only ten games out of 100 been made public?

Actually you have all 100 results of the games - 28 wins and 72 draws - not just an adjective. The results with black and white are also available. In addition there is a much larger set of results starting from a broad range of mainline opening positions (with Stockfish getting some wins, but doing only a few percent better than with both engines given free rein). 

I agree it would be nice to see all the games. Why not drop them an e-mail request and put your case? I would guess that some of the games consist of the sort of incomprehensibly complex positions that are not so interesting for humans as the rather beautiful wins we have seen published.

 

NO NONSENSE: Limited fixed time control.

NO NONSENSE: No opening book vs a huge amount of processed game data.

NO NONSENSE: Very small hash memory.

NONSENSE: Everything in red text. See above (in your head is also fine).
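For readers checking the numbers in this exchange: the 64% figure and the result of 28 wins and 72 draws are the same thing (a win counts 1 point, a draw 0.5), and under the standard logistic Elo model a 64% score corresponds to a rating gap of roughly 100 points. A minimal Python sketch of that conversion; the function names are illustrative and not taken from any chess library:

```python
import math


def match_score(wins: int, draws: int, losses: int) -> float:
    """Fraction of available points scored (win = 1, draw = 0.5, loss = 0)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


score = match_score(wins=28, draws=72, losses=0)
print(f"score = {score:.0%}")                    # score = 64%
print(f"Elo gap ~ {elo_difference(score):.0f}")  # Elo gap ~ 100
```

A single 100-game match still carries some statistical uncertainty, so the ~100 figure is best read as a point estimate.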

Avatar of Chesserroo2

The extra processing power used by AlphaZero could have been a big factor, but Google does not care. Their goal was just to make an AI that could figure things out on its own. They succeeded at that and are done with chess. Their next problem will likely be physics-related.

Avatar of DiogenesDue

It's just as likely that the other 90 games have not been published because showing how Stockfish played in those games would make it crystal clear how far from optimally the engine was playing under those settings.

Avatar of SilentKnighte5
btickler wrote:

It's just as likely that the other 90 games have not been published because showing how Stockfish played in those games would make it crystal clear how far from optimally the engine was playing under those settings.

No it's not "just as likely".


The 10 games were selected in the same way every GM has ever published a book of their own games.  They chose the 10 best/most interesting/most instructive games from the set.  No one publishes "My 60 Memorable Games and 500 Inconsequential Draws".

 

Avatar of DiogenesDue
SilentKnighte5 wrote:
btickler wrote:

It's just as likely that the other 90 games have not been published because showing how Stockfish played in those games would make it crystal clear how far from optimally the engine was playing under those settings.

No it's not "just as likely".


The 10 games were selected in the same way every GM has ever published a book of their own games.  They chose the 10 best/most interesting/most instructive games from the set.  No one publishes "My 60 Memorable Games and 500 Inconsequential Draws".

 

This is not some GM's memoirs, it's an experiment in machine learning.  The private testing without anybody but the DeepMind team present, along with the admission of the "questionable" settings, makes this whole episode into a kind of Schrödinger's Cat of actual engine performance.

Running the tests with, at best, ignorance of the best settings, or at worst, willful deceit for free publicity, makes this not just bad sportsmanship, but bad science.  If someone published something in the New England Journal of Medicine under these kinds of secretive conditions, they would be laughed at..."Just trust us, the other 90 patients also suffered no ill effects...and even though none of the patients' families had been informed and we never had their medical histories, we are sure we handled them optimally during treatment."

Avatar of Elroch
pfren wrote:
Elroch wrote:
pfren wrote:

Fact no.1:

Stockfish was intentionally crippled.

This is nonsense. Stockfish has rarely been run on a faster machine. There is a valid debate about whether Stockfish was hindered by the 1 GB hash table, but my research on this finds that there is no solid evidence it would even have been stronger with a larger hash table (just unsubstantiated assertions, whereas it is a fact that increasing the hash table size is not always beneficial), and less than none that it could have accounted for the difference in strength.

AlphaZero was running on very powerful hardware tailored especially for running deep neural networks (and entirely unsuited to running Stockfish), as is now available on Google's cloud computing platform for the same purpose. There is no law limiting the hardware you can use when doing AI research.

How long do you think we will have to wait before anyone with any hardware and any engine can achieve 64% against the same version of Stockfish running on the same hardware and configuration used against AlphaZero?

Fact no.2:

Google does not like to provide details about the match, other than their AI application "crushing" Stockfish. Why have only ten games out of 100 been made public?

Actually you have all 100 results of the games - 28 wins and 72 draws - not just an adjective. The results with black and white are also available. In addition there is a much larger set of results starting from a broad range of mainline opening positions (with Stockfish getting some wins, but doing only a few percent better than with both engines given free rein). 

I agree it would be nice to see all the games. Why not drop them an e-mail request and put your case? I would guess that some of the games consist of the sort of incomprehensibly complex positions that are not so interesting for humans as the rather beautiful wins we have seen published.

 

NO NONSENSE: Limited fixed time control.

NO NONSENSE: No opening book vs a huge amount of processed game data.

NO NONSENSE: Very small hash memory.

NONSENSE: Everything in red text. See above (in your head is also fine).

Firstly, it is likely that Stockfish could be configured to play a little stronger, but you don't seem to understand how little this would matter to the research that DeepMind did. This research showed that superb performance could be achieved at three different, difficult classical board games entirely by self-learning, with no input of game-specific human expertise.

There was no prize being fought over. However, it was necessary to have a very strong opponent, which Stockfish with 64 thread-minutes per move surely is.

It would be a legitimate hypothesis that Stockfish could win a match against AlphaZero without being rewritten, but I would bet heavily against it. Would you take the other side of that bet, given the chance?

Avatar of funindsun

What was the reason for limiting Stockfish to just 1 minute per move?

Avatar of Elroch

Just a simple choice suitable for a large match, I would say. It's quite a lot for an engine running on 64 threads (equivalent to 8 minutes a move on 4 threads, say). I have not actually seen a single piece of analysis that suggests Stockfish would have changed the result by having more time: the benefits are quite limited at that level.
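The equivalence above is a rough one (note the "say"). Taken literally, 64 threads for 1 minute is 64 thread-minutes, which would be 16 minutes a move on 4 threads if search scaled perfectly with thread count. One way to arrive at the 8-minute figure is to assume the 64 threads are only about half as effective as their raw total. A small sketch of that arithmetic; the `efficiency` factor is a hypothetical parameter, not a measured property of Stockfish:

```python
def thread_minutes(threads: int, minutes_per_move: float) -> float:
    """Total thread-time spent on one move."""
    return threads * minutes_per_move


def equivalent_minutes(threads: int, minutes_per_move: float,
                       fewer_threads: int, efficiency: float = 1.0) -> float:
    """Minutes per move on `fewer_threads` that match the larger machine's useful work.

    `efficiency` is a hypothetical parallel-scaling factor in (0, 1]; 1.0 means the
    large machine's threads are worth exactly that many single threads. The smaller
    machine is treated as scaling perfectly, which is a simplification.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return thread_minutes(threads, minutes_per_move) * efficiency / fewer_threads


print(equivalent_minutes(64, 1.0, 4))                  # 16.0 (perfect scaling)
print(equivalent_minutes(64, 1.0, 4, efficiency=0.5))  # 8.0
```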

Avatar of Spider_hip

I was very excited when I first heard about the results of this competition. But when I looked into it a little more, I lost my excitement and felt a little disappointed too. I mean, artificial intelligence may be the next era for humanity. But those hardware differences cast a shadow on the competition. They say the TPUs that DeepMind used were far more powerful than 64 standard CPUs. Software is one thing, hardware is another. If you really want to show off your software or AI, you should prepare the same or a similar hardware environment. I'm OK with the 1-minute limit if both programs obeyed the same rules. You could count a 1-minute tournament as bullet, then run the same competition at 5 minutes and call that one blitz, and reveal the results. Then I would respect DeepMind, and there would be nothing for me to complain about. The most important thing is similar hardware, then different time controls.

Avatar of Elroch

There is no chess engine that has achieved that sort of performance on any hardware. So, while AlphaZero requires powerful TPUs to perform at its best, it is not that it was "just the hardware". It is best to think of AlphaZero as a different class of player from chess engines and humans, an intelligent entity that has learnt to play chess its own way from the rules, and it is that which is the basis of its exceptional strength.

Avatar of funindsun

A quick search last night found two items that appear sincere and knowledgeable: an open letter that has had no official response (I will post it next), and a video that analyzes the games, pointing out key "mistakes" made by the fish and claiming they might have been avoided if it had been given the environment it was designed for.

https://www.youtube.com/watch?v=ZGypfNUXM2U

@Elroch, interesting! Now what about the opening book, the hash and the use of an older SF version? I mean, why did they run the fish at less than optimal performance? Did the Alpha team ever explain that?

Avatar of funindsun

here is that letter....

 

"Dear Sirs,

Please let me congratulate you on your amazing achievement in developing AlphaZero chess! As someone who completed a thesis in neural networks back in the 90's, I could not be more amazed at how far you have been able to advance the field.

I can't speak for the entire Stockfish team so I simply speak as one of its open source contributors. If you read other posts on this forum or talkchess.com however you may find that what I'm about to point out may mirror the sentiment of others in the computer chess community as well.

AlphaZero won the 100-game match against Stockfish very impressively, with a total of 28 wins, 72 draws and 0 losses. This translates to an Elo difference of 100. However, the details of the match described in your paper show that this match might have been much closer and more interesting had it not been for some, IMO, rather unfair conditions. These might not be immediately obvious even to those who use chess engines regularly.

1) In the match, version 8 of Stockfish was used, which is now over a year old. The latest version of Stockfish is over 40 Elo stronger in fast self-play.
http://tests.stockfishchess.org/tests/view/5a23e7c10ebc590ccbb8b6d8
When consulted, the Stockfish team always enters the latest version into serious competitions such as TCEC.

2) The 1 GB of memory used for the hash table on a 64-core machine with 1 minute per move is sorely inadequate. Stockfish displays the percentage of hash used, so anyone can see how quickly it fills up. A reasonable amount of memory would likely have been around 16 times more, at 16 GB. This is especially critical with many threads because Stockfish uses the hash as the main mechanism through which all threads communicate (aka Lazy SMP). It is almost certain that this resulted in another significant Elo reduction for Stockfish.

3) Much effort has been put into making Stockfish understand which positions are critical and which are not. Based on this, Stockfish manages its clock very carefully, spending significantly more time on some positions during a game and very little on others. Disabling this feature and forcing Stockfish to spend the same fixed 1 minute on every move results in yet another large Elo reduction.

Since the Stockfish team wasn't contacted prior to the match I believe the issues outlined above were simply a result of unfamiliarity with the Stockfish engine. With the above issues corrected the 100 Elo gap should change quite significantly. I believe you are interested in a fair match more than winning and it is therefore my hope that a second proper rematch can be played for the benefit of both scientific research as well as the chess community. I wish to thank you for the tremendous contribution you have made to computer chess with a completely novel approach and hope that Stockfish has been a useful competitor for your testing. Please don't hesitate to contact myself or the Stockfish team in the future. We are your fans.

Sincerely,

(removed for privacy)"
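Points 2 and 3 of the letter correspond to ordinary UCI engine settings: the `Hash` option (size in MB), the `Threads` option, and whether the engine is told to search a fixed time per move (`go movetime`) or is given clock times (`go wtime ... btime ...`) and left to budget them itself. The sketch below only builds the command strings. The exact commands DeepMind sent are not given in this thread, "1 GB" is assumed to mean 1024 MB, and the clock values in the second setup are hypothetical:

```python
def uci_setup(threads: int, hash_mb: int) -> list[str]:
    """Commands sent to a UCI engine before the game starts."""
    return [
        "uci",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",  # UCI Hash size is given in MB
        "ucinewgame",
        "isready",
    ]


def go_fixed_movetime(ms: int) -> str:
    """Search exactly `ms` milliseconds; the engine's own time management is bypassed."""
    return f"go movetime {ms}"


def go_with_clock(wtime_ms: int, btime_ms: int, winc_ms: int = 0, binc_ms: int = 0) -> str:
    """Report the remaining clock times and let the engine decide how long to think."""
    return f"go wtime {wtime_ms} btime {btime_ms} winc {winc_ms} binc {binc_ms}"


# Settings described in this thread: 64 threads, 1 GB hash, 1 minute per move.
match_setup = uci_setup(threads=64, hash_mb=1024) + [go_fixed_movetime(60_000)]

# The letter's suggestion: 16 GB hash and engine-managed time.
# Hypothetical clock: 100 minutes per side, no increment.
suggested_setup = uci_setup(threads=64, hash_mb=16 * 1024) + [go_with_clock(6_000_000, 6_000_000)]

# A real session would also send "position ..." before each "go".
for line in match_setup:
    print(line)
```

Both setups spend a comparable overall budget on a game; the difference is whether the engine or the operator decides how that time is split across moves.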

Avatar of pfren
funindsun wrote:

 

@Elroch, interesting! Now what about the opening book, the hash and the use of an older SF version? I mean, why did they run the fish at less than optimal performance? Did the Alpha team ever explain that?

These 100 games (of which only ten were published) are not essentially different from the ones where Tsvetkov "beats" Stockfish.

Avatar of prusswan

Alpha Zero is only a baseline of the previous work from Alpha Go, put together at short notice, so it has much more potential than Stockfish. If an unoptimized Stockfish can be defeated so easily by a 'starting point' Alpha Zero, it will have no hope against an optimized one, even after the small human tweaks to its evals and book (which should never be part of a true chess-playing engine). Alpha Zero has served its purpose as a proof of concept... and as the future of what chess engines should be. The middlegame engines will soon become history.

Avatar of Elroch
funindsun wrote:

A quick search last night found two items that appear sincere and knowledgeable: an open letter that has had no official response (I will post it next), and a video that analyzes the games, pointing out key "mistakes" made by the fish and claiming they might have been avoided if it had been given the environment it was designed for.

https://www.youtube.com/watch?v=ZGypfNUXM2U

@Elroch, interesting! Now what about the opening book, the hash and the use of an older SF version? I mean, why did they run the fish at less than optimal performance? Did the Alpha team ever explain that?

Opening books are really a way for chess engines to avoid working: all openings can only result from the play of other agents in previous games, so it is the consensus of those agents that is "playing" in an opening book.

The interesting thing is how well an engine can find good moves, not how good a book it can read a move out of. Because DeepMind is interested in artificial intelligence rather than the much simpler task of reading moves out of a book, they focused on this. They did examine play from a wide range of mainline openings: this was more to test whether AlphaZero could play well in all openings rather than just those it preferred. The answer was "yes".

Those who say DeepMind should have used a more recent version of Stockfish are not taking into account the fact that research takes time: they simply used the version that was current rather than beta. It is certainly true that achieving the same against Stockfish 9 would be more challenging, but it looks very much as if it would not be necessary to upgrade AlphaZero, due to the large margin of victory.

All discussion of hash tables has been speculative. No-one can be bothered to find out whether a bigger hash table would have even been better, never mind how much better. The only empirical evidence I can find is that it would make an uncertain, quite small difference to Elo.
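A back-of-envelope estimate can frame the hash debate without settling it. The sketch below assumes a Stockfish transposition-table layout of 32-byte clusters holding three entries (treat this as an assumption about the engine of that era) and uses an illustrative search speed; neither number comes from this thread. It only estimates how quickly the table would be written over once, not how much Elo that costs, since a replacement policy that favours more valuable entries can soften the effect of a full table.

```python
BYTES_PER_MB = 1024 * 1024


def tt_entries(hash_mb: int, cluster_bytes: int = 32, entries_per_cluster: int = 3) -> int:
    """Number of entry slots in a table of `hash_mb` megabytes (layout is an assumption)."""
    clusters = hash_mb * BYTES_PER_MB // cluster_bytes
    return clusters * entries_per_cluster


def seconds_to_fill(hash_mb: int, nodes_per_second: float, store_fraction: float = 1.0) -> float:
    """Seconds until the number of stores equals the number of slots.

    `store_fraction` is a hypothetical share of searched nodes that write an entry;
    the default of 1.0 (every node stores) overstates how fast the table fills.
    """
    return tt_entries(hash_mb) / (nodes_per_second * store_fraction)


NPS = 70_000_000  # illustrative search speed for a 64-thread machine, not a measured value

for hash_mb in (1024, 16 * 1024):
    print(f"{hash_mb:>6} MB: {tt_entries(hash_mb):,} slots, "
          f"~{seconds_to_fill(hash_mb, NPS):.1f} s to write over once")
```

Under these assumptions, 1 GB is written over in about 1.4 seconds and 16 GB in about 23 seconds, both well inside a 60-second move; whether that translates into a meaningful Elo loss is exactly the empirical question discussed here.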

Avatar of funindsun

@IM Pfren, what is your take on IM Erik Kislik analysis of the games?

@pruss, I have no dog in this and couldn't care less about "who's better"
Consider the ingenuity of machine learning, just the idea that it can learn to play strong chess on its own is fascinating to me.

With that said, my question remains the same. WHY? Why did the Google team decide to cripple the fish (I'm aware the extent is arguable)?

I have to add that I am clueless about how the match came about... Was it set up as a real match to show superiority, or was it just another day at the office that somehow got out of control, or, or, or...

@Elroch, I always appreciate your input, but did you even read the letter I posted, or look into the video analysis?

Avatar of prusswan

AlphaZero is also handicapped with the limited training time it had, with more computing resources and time it should just as easily beat an optimized Stockfish, which cannot scale as well. Just not worth the money though.

AlphaZero's advantage over Stockfish is both in methodology and scalability, although the two are related.

Avatar of Elroch
funindsun wrote:

 

@Elroch, I always appreciate your input, but did you even read the letter I posted, or look into the video analysis?

Thanks. I read the letter and responded to some points above, I watched part of the video. It cannot show for sure that every game could be drawn - although if you believe chess is a draw, that is true of all games! - merely that Stockfish could have gone into other lines. In these lines, it would be unwise to rely on an opinion that AlphaZero could not find some other resource. That being said, if you examine any decisive game after the fact, you can identify somewhere that a player appears to have gone wrong and suggest an alternative that seems reasonable. This may not always be right: you might need to go back further, perhaps a lot further.

So I am not entirely sure what your point is. E.g., in the first game, it is possible that Qxc7 was a blunder: there is so much play that one can't really be sure. When I analyse different lines with Stockfish 9 here, eventually it likes Kg2. Does this mean that Kg2 is good enough to have drawn? I can't be sure. The big problem is that after Kg2 we don't have AlphaZero to find some brilliant continuation a few moves down the line. When analysing this game I found that assessments change a lot over time: it is very difficult to get it right. Without giving it a massive amount of time per move, I don't trust any conclusions. And I am not sure even then.