Alpha-zero-stockfish (number of moves per second vs hardware debate)

usmansk

Interestingly, everyone is debating hardware issues. Obviously, alpha-zero is an advanced system and requires complex hardware to work. As I understand it, instead of looking at hardware, we should look at how many moves each engine evaluates per second.


Alpha-zero 80k moves per second

Stockfish 70 million moves per second

Stockfish is evaluating a higher number of moves than alpha-zero and has a great advantage over alpha-zero. Still, alpha-zero evaluates the position better and predicts a better move than stockfish. In four hours, alpha-zero has created better position evaluation criteria than human-knowledge-based position evaluation criteria in stockfish.

  What do you think about this?

Martin_Stahl

I agree that Stockfish should probably have access to opening books at least, and maybe some better hardware. I also think, based on the limited information, that the evaluation AZ gained by training its neural network is likely a match for Stockfish's, at the very least.

 

Though, it is also possible that the more it plays Stockfish, the better it will get against the engine, and it might show weaknesses in the Stockfish evaluation and pruning functions. That is, it should be able to learn from the games against Stockfish and play a little better.

cy1wtoqzd
The "zero" in Alpha Zero's name stands for the amount of external knowledge. The AI only knows the rules of chess and its past experience. The aim is to have an AI capable of improving its ability to play chess. If you train the AI using human or Stockfish input, it will become better at exactly that: beating humans, or Stockfish. This doesn't necessarily mean it will play better chess. We must resist the temptation to indoctrinate the AI with the narrow dogmatic views of chess inherent in both human play and the evaluation functions of chess engines.
vickalan
usmansk wrote:

...Still, alpha-zero evaluates the position better and predicts a better move than stockfish... 

What do you think about this?

 

If that's true, then it's time for a fair tournament between AZ and Stockfish:

1) Both engines should have equal hardware support. If AlphaZero can make equal evaluations at 80,000 mps when Stockfish is evaluating 70,000,000 mps (a factor of 875), then reconfiguring AlphaZero to play on standard processors should not have a crippling effect.

2) Use a standard time control, such as 40 moves per 100 minutes. (The time control of "one minute thinking time per move" in a recent test was highly unorthodox, and not used in normal high-level games).

3) The tournament should be adjudicated by a neutral third party (such as chess.com), similar to chess-com-computer-championship.


In my opinion, without fulfilling these criteria, any claims about AlphaZero being a superior engine remain highly dubious.

Elroch

Sorry, but the only interesting change is to upgrade Stockfish's hardware as much as desired. The point of AlphaZero was to be the best chess player, not the best chess player with an arbitrary handicap.

Note also that it could well be that there is some type of equitable restriction that makes another engine better than AlphaZero. That does not make that engine the best player: that makes that engine a better player only when its opponent is handicapped to play more weakly: the games played would not be of such high quality, and thus not so interesting.

This does not matter so much when comparing conventional engines, because (extrapolating enthusiastically from a little data, backed up with some theoretical facts) they improve at a similar rate with computational resources (the reason this is to be expected is that they come from a similar heritage of alpha-beta search and pseudo-materialistic positional evaluation).

What AlphaZero probably does is play the best chess moves that have ever been seen. Amazingly, some of its games remain quite meaningful to us limited mortals.

vickalan

@Elroch, I agree. Giving more computer resources to Stockfish is another option. One possible tournament design is to allow each engine the option of "best available hardware". In this case, the developers of Stockfish should choose their hardware and the settings.
If the developers of AlphaZero choose the hardware and settings for Stockfish, then they are running a test, but it's certainly not a fair tournament.

After all, in the 2017 World Series, the Astros did not supply the equipment for their opponent. The Dodgers were allowed to use their own bats and mitts.

usmansk

Even if Google does not open up its Alpha-zero, we can still evaluate the strength of Alpha Zero.

There is some interesting code for reinforcement-learning chess available on GitHub at

https://github.com/Zeta36/chess-alpha-zero/tree/master/src/chess_zero

 

I have tested it out, and the code works well. It is a slightly older style, like AlphaGo, where policy and value used to have separate neural networks, unlike Google's Alpha Zero, where both are combined. On a PC, self-learning is too slow. I hope chess.com has enough resources to test this code on GPUs, let it train for at least 2 days, and then let it play against Stockfish to evaluate its strength.

I am also interested to see, if we let it play (train) against Stockfish in self-learning mode, how long it will take to reach the strength of Google's Alpha Zero. Unfortunately, I don't have enough resources to get the answer.

-------------------------------------------------------
One more thing: Stockfish was running on a 32-core machine and was evaluating 875 positions (or moves) for every 1 position Alpha Zero evaluated (70 million / 80K). If you say Stockfish was running on a low-end machine and needs an even stronger one, you are giving more power to Stockfish, and the ratio will increase even more. The difference will become as large as human vs. machine. If you want to see the real strength of Alpha Zero, both should be allowed to evaluate an equal number of positions. ;P
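
A quick sanity check of that ratio, as a minimal Python sketch. The two speeds are simply the figures quoted in this thread (not independently verified), the one-minute-per-move budget is the time control mentioned earlier in the discussion, and the variable names are illustrative.

```python
# Search speeds as quoted in this thread (assumed, not independently verified).
ALPHAZERO_POSITIONS_PER_SEC = 80_000
STOCKFISH_POSITIONS_PER_SEC = 70_000_000

# How many positions Stockfish evaluates for each one AlphaZero evaluates.
ratio = STOCKFISH_POSITIONS_PER_SEC / ALPHAZERO_POSITIONS_PER_SEC
print(f"Stockfish evaluates {ratio:.0f} positions per AlphaZero position")

# Positions searched during one minute of thinking time per move.
SECONDS_PER_MOVE = 60
az_per_move = ALPHAZERO_POSITIONS_PER_SEC * SECONDS_PER_MOVE
sf_per_move = STOCKFISH_POSITIONS_PER_SEC * SECONDS_PER_MOVE
print(f"Per move: AlphaZero {az_per_move:,} vs Stockfish {sf_per_move:,}")
```

The ratio depends only on the two quoted speeds, so giving Stockfish faster hardware would raise it further, which is the point being made above.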

 

usmansk
vickalan wrote:
usmansk wrote:

...Still, alpha-zero evaluates the position better and predicts a better move than stockfish... 

What do you think about this?

 

If that's true, then it's time for a fair tournament between AZ and Stockfish:

1) Both engines should have equal hardware support. If AlphaZero can make equal evaluations at 80,000 mps when Stockfish is evaluating 70,000,000 mps (a factor of 875), then reconfiguring AlphaZero to play on standard processors should not have a crippling effect.

2) Use a standard time control, such as 40 moves per 100 minutes. (The time control of "one minute thinking time per move" in a recent test was highly unorthodox, and not used in normal high-level games).

3) The tournament should be adjudicated by a neutral third party (such as chess.com), similar to chess-com-computer-championship.


In my opinion, without fulfilling these criteria, any claims about AlphaZero being a superior engine remain highly dubious.

Would it be fair to match a human and a machine on the same time control? The neurons in neural networks mimic the neurons in the human brain. A machine using a neural-network-based evaluation technique will always remain slower than a brute-force machine like Stockfish. The only interesting question is: "What will happen if Alpha Zero starts evaluating the same number of moves as Stockfish? How much will its strength increase?"

Elroch

vickalan was using very odd reasoning when he argued that, because AlphaZero could only look at 1/875 as many positions on the hardware used, it would not matter if it were crippled by running on hardware much less suited to its large neural networks!

In fact, we know the opposite: reducing AlphaZero's computing time has a larger effect on its performance than on Stockfish's. This is one of the impressive things about AlphaZero: a much more rapid improvement in strength as it is given more computing time.

It's true that AlphaZero simply is not another engine that works well on a PC. It is in a different class: an AI that shines when provided with the deep learning-specific hardware for which it was designed.

This is not unique to chess or games: many of the most impressive achievements of deep learning for AI in recent years have required very powerful hardware to run in a reasonable time. This is actually the reason deep learning has only become a big thing in  the last decade: prior to that, the hardware was a major obstacle to progress in most cases.

vickalan

The 1/875 ratio was in relation to this comment:

"Alpha-zero 80k moves per second
Stockfish 70 million moves per second
Stockfish is evaluating a higher number of moves than alpha-zero and has a great advantage over alpha-zero."

I'm not sure if that info is correct or not, but if the AZ team feels it is the best chess-machine, then it's time for a fair match. The discussion about theoretical performance is interesting, but I'm most interested in seeing which engine will win in a fair tournament.

It's like deciding which baseball team is the world champion: it's not by comparing batting averages - it's decided by playing baseball. I think it would be cool to see AlphaZero play chess in a tournament adjudicated by a neutral tournament director (such as this).

universityofpawns
usmansk wrote:

Interestingly, everyone is debating hardware issues. Obviously, alpha-zero is an advanced system and requires complex hardware to work. As I understand it, instead of looking at hardware, we should look at how many moves each engine evaluates per second.


Alpha-zero 80k moves per second

Stockfish 70 million moves per second

Stockfish is evaluating a higher number of moves than alpha-zero and has a great advantage over alpha-zero. Still, alpha-zero evaluates the position better and predicts a better move than stockfish. In four hours, alpha-zero has created better position evaluation criteria than human-knowledge-based position evaluation criteria in stockfish.

  What do you think about this?

Given this, would not the logical conclusion be that the way humans think and evaluate has inherent flaws???

usmansk
universityofpawns wrote:
usmansk wrote:

Interestingly, everyone is debating hardware issues. Obviously, alpha-zero is an advanced system and requires complex hardware to work. As I understand it, instead of looking at hardware, we should look at how many moves each engine evaluates per second.


Alpha-zero 80k moves per second

Stockfish 70 million moves per second

Stockfish is evaluating a higher number of moves than alpha-zero and has a great advantage over alpha-zero. Still, alpha-zero evaluates the position better and predicts a better move than stockfish. In four hours, alpha-zero has created better position evaluation criteria than human-knowledge-based position evaluation criteria in stockfish.

  What do you think about this?

Given this, would not the logical conclusion be that the way humans think and evaluate has inherent flaws???

Flaws, not as such. The main problem with humans is that we cannot calculate like machines. Second, we taught chess engines to play any kind of position, so we gave them many different goals that sometimes conflict with each other. Alpha Zero learns by playing whole games. We don't know whether it can be handed an arbitrary position and play it as well as Stockfish and other engines can.

I am not a good enough player to judge whether the human evaluation of chess positions is altogether wrong. In the videos shared by chess.com, it was clear that Alpha Zero was exploiting Stockfish's greediness by giving up two pawns, and sometimes even a piece, to keep Stockfish's pieces stuck or to gain a long-term positional advantage. This was the Tal-like and Karpov-like human mentality that Stockfish had earlier refuted through brute-force calculation. Humans tend to lose hope when they go down a piece or two pawns against a chess engine, but it now appears that, with good calculation, that mentality overcomes Stockfish's greediness. It appears that piece activity is more important than material equality. However, a good player can judge this better than I can.

usmansk
Elroch wrote:

vickalan was using very odd reasoning when he argued that, because AlphaZero could only look at 1/875 as many positions on the hardware used, it would not matter if it were crippled by running on hardware much less suited to its large neural networks!

In fact, we know the opposite: reducing AlphaZero's computing time has a larger effect on its performance than on Stockfish's. This is one of the impressive things about AlphaZero: a much more rapid improvement in strength as it is given more computing time.

It's true that AlphaZero simply is not another engine that works well on a PC. It is in a different class: an AI that shines when provided with the deep learning-specific hardware for which it was designed.

This is not unique to chess or games: many of the most impressive achievements of deep learning for AI in recent years have required very powerful hardware to run in a reasonable time. This is actually the reason deep learning has only become a big thing in  the last decade: prior to that, the hardware was a major obstacle to progress in most cases.

 

True.

 

CristianGarcia1

Didn't read all the comments, BUT people are missing the point, probably because they are not familiar with deep learning: Alpha Zero runs the neural network inference on a TPU (a special accelerator chip built by Google) and the Monte Carlo search on the CPU, while Stockfish only uses the CPU. So it is unclear what people mean when they say "give them the same hardware". There are 2 possibilities: 1) force Alpha Zero to run on a CPU only, but that would make it really slow, since convolutional neural networks are meant to be run on GPUs/TPUs; 2) pointlessly give Stockfish a TPU, which it just won't use.

If you have questions, please ask. I do A.I. for a living and love chess.

Elroch
vickalan wrote:

The 1/875 ratio was in relation to this comment:

"Alpha-zero 80k moves per second
Stockfish 70 million moves per second
Stockfish is evaluating a higher number of moves than alpha-zero and has a great advantage over alpha-zero."

I'm not sure if that info is correct or not, but if the AZ team feels it is the best chess-machine, then it's time for a fair match. The discussion about theoretical performance is interesting, but I'm most interested in seeing which engine will win in a fair tournament.

It's like deciding which baseball team is the world champion: it's not by comparing batting averages - it's decided by playing baseball. I think it would be cool to see AlphaZero play chess in a tournament adjudicated by a neutral tournament director (such as this).

It is fair to think of AlphaZero as in the "superheavyweight" class of chess players. It simply cannot compete in the middleweight class defined by TCEC (because it can't run on that hardware), but could feasibly compete in WCCC (where hardware is unlimited in the main competition) if DeepMind had any interest in it. But there is no sign of that: their motivation was not to play competition chess.

While it would be interesting to see AlphaZero play against an engine like Komodo running on a cluster (which allowed it to win the WCCC in 2017) rather than Stockfish on a massively multicore machine like the one used in the DeepMind research, given the data on Komodo, Stockfish and Houdini, this would not be expected to produce a different result.

madhacker

It wasn't a fair contest and didn't prove a great deal about the pure strength of AZ, but you've got to admit that just purely from a chess point of view, some of the games were really beautiful. Especially the one with the early piece sac with the initiative lasting the whole game.

Martin_Stahl
pfren wrote:

Fact no.1:

Stockfish was intentionally crippled.

 

Fact no.2:

Google does not like to provide details about the match, other than their AI application "crushing" Stockfish. Why only ten games, out of 100, have been made public?

 

While they haven't yet provided the games, in addition to the first 100 there were 1,000 more games from set openings, where Stockfish got some wins. AlphaZero still did very well, but the lack of an opening book was less of a factor in those cases.

Elroch
pfren wrote:

Fact no.1:

Stockfish was intentionally crippled.

This is nonsense. Stockfish has rarely been run on a faster machine. There is a valid debate about whether Stockfish was hindered by the 1 GB hash table, but my research finds no solid evidence that it would even have been stronger with a larger hash table (just unsubstantiated assertions, and it is a fact that increasing hash table size is not always beneficial), and less than none that it could have accounted for the difference in strength.

AlphaZero was running on very powerful hardware tailored especially for running deep neural networks (and entirely unsuited to running Stockfish), as now available on Google's computing cloud for the same purpose. There is no law limiting the hardware you can use when doing AI research.

How long do you think we will have to wait before anyone with any hardware and any engine can achieve 64% against the same version of Stockfish running on the same hardware and configuration used against AlphaZero?

Fact no.2:

Google does not like to provide details about the match, other than their AI application "crushing" Stockfish. Why only ten games, out of 100, have been made public?

Actually you have all 100 results of the games - 28 wins and 72 draws - not just an adjective. The results with black and white are also available. In addition there is a much larger set of results starting from a broad range of mainline opening positions (with Stockfish getting some wins, but doing only a few percent better than with both engines given free rein). 
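
To put those numbers in perspective, here is a rough Python sketch that turns the 28 wins and 72 draws into a match score and an approximate rating gap. It assumes the standard logistic Elo expectation formula, which is an assumption made here for illustration and not something stated in the match report; the function names are likewise illustrative.

```python
import math

def match_score(wins: int, draws: int, losses: int) -> float:
    """Fraction of points scored: a win counts 1, a draw counts 0.5."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games

def elo_gap(score: float) -> float:
    """Rating difference implied by an expected score under the logistic Elo model."""
    return -400 * math.log10(1 / score - 1)

# Result of the 100-game match as quoted above (no Stockfish wins).
score = match_score(wins=28, draws=72, losses=0)
print(f"score = {score:.0%}, implied gap = {elo_gap(score):.0f} Elo")
```

With these inputs the score is 64% and the implied gap is roughly 100 Elo points, which only describes this particular match-up and configuration.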

I agree it would be nice to see all the games. Why not drop them an e-mail request and put your case? I would guess that some of the games consist of the sort of incomprehensibly complex positions that are not so interesting for humans as the rather beautiful wins we have seen published.

 

i-am-greek

Who won the MVL vs. Carlsen match in speed chess pro?

 

i-am-greek

and who won the entire thing?