Objectively Speaking, Is Magnus a Patzer Compared to Stockfish and AlphaZero?

Lyudmil_Tsvetkov

Congratulations all!

Now, almost 4 weeks after this seminal event, and 4*150 training hours later, AlphaZero is certainly somewhere around 15,000-20,000 Elo.

 

Congratulations!

SmyslovFan

That is one of the great myths that amateurs cling to: if only I had more time to think, I could beat X. X is usually the current world champion, but here, it's a computer.

 

As the creators of AlphaZero pointed out, it's not how many variations one looks at, but how well the player understands what they're seeing.

 

The anecdote that Mickey Adams won a game without seeing any of the combinations his opponent was trying to play really does strike at the heart of what it means to be a GM.

 

Magnus once said that he usually knows instantly which move to play, but spends all his time verifying it.  

 

Give Magnus 3 days per move and he won't play significantly better than he already does. Give an amateur, who isn't quite as adept at calculating, 3 days per move and they will improve dramatically but still miss basic concepts that a GM sees instantly.

 

Magnus playing at correspondence time controls would be a monster, but he would still not bridge the gap between humans and engines.

Godeka

@mcris:
> But why no further increase of strength for the next 40k steps?

You cannot increase the strength indefinitely. It seems that AlphaZero with the current NN setup is at or near the maximum, so more training time does not have any effect.

mcris

Current NN setup? Like the number of neurons? Or of TPUs? If so, SF would also be stronger with more CPUs and more RAM for the hash table. It all comes down to hardware.

Godeka

Yes, for example the number of neurons, or any other hyperparameter (learning rate, number of hidden layers, activation functions, etc.). Having more TPUs is the same as having more training time.
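
To make "NN setup" concrete, here is a minimal sketch of the kind of hyperparameters being discussed. The names and values are purely illustrative assumptions, not AlphaZero's actual configuration:

```python
# Hypothetical hyperparameters; illustrative only, not AlphaZero's real setup.
nn_setup = {
    "num_hidden_layers": 20,   # depth of the network
    "neurons_per_layer": 256,  # width of each layer
    "activation": "relu",      # the fixed non-linear transformation
    "learning_rate": 0.01,     # step size for parameter updates
    "batch_size": 4096,        # positions per training step
}
# These choices set a ceiling on attainable strength; once the network
# has converged, more training time alone cannot push past it.
```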

 

Of course AlphaZero will play stronger with more TPUs: deeper search, more variations. In fact it scales more effectively, but this is because of MCTS and has nothing to do with the NN. The same holds for alpha-beta pruning: more time means deeper search, but the strength gain does not depend on the evaluation function, which is fixed and does not change.
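
A toy illustration of that last point (alpha-beta pruning omitted for brevity, and the game tree is made up): in a fixed-evaluation search, extra thinking time buys only depth, while the evaluation function itself never improves.

```python
# Minimal fixed-depth minimax over a toy game tree. The point: more
# time buys DEPTH, while the handcrafted evaluate() never changes.

def evaluate(state):
    # Fixed, handcrafted evaluation (here just a stored number).
    return state["score"]

def minimax(state, depth, maximizing):
    children = state.get("children", [])
    if depth == 0 or not children:
        return evaluate(state)
    values = [minimax(c, depth - 1, not maximizing) for c in children]
    return max(values) if maximizing else min(values)

# Toy tree, scores from the root player's point of view; the tempting
# +5 branch hides a refutation that only a deeper search can see.
tree = {"score": 0, "children": [
    {"score": +5, "children": [{"score": -9}]},
    {"score": +1, "children": [{"score": +1}]},
]}
print(minimax(tree, 1, True))  # depth 1: 5 (falls for the trap)
print(minimax(tree, 2, True))  # depth 2: 1 (sees the refutation)
```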

 

 

Elroch
mcris wrote:

Current NN setup? Like the number of neurons? Or of TPUs? If so, SF would also be stronger with more CPUs and more RAM for the hash table. It all comes down to hardware.

I came to a similar conclusion to Godeka's: he is referring to the architecture of the neural net, i.e. the number of layers, the numbers of nodes in the layers, and the choice of various design parameters such as the activation function and regularisation. There is little doubt that an even bigger network, combined with a larger training set, could be more powerful than AlphaZero, and benefit even more from additional computational power. A bigger network requires more computation for each evaluation, so it could be weaker when given inadequate computational resources (it would be unable to sample enough lines), but once trained with adequate data, it would surpass a smaller network at some level of computational resource and continue to widen the gap above that.

I believe both AlphaZero and a conventional engine would approach perfect play exponentially slowly with a vast increase in computing time: remember that the network in AlphaZero is roughly a replacement for the evaluation function, and more brute force makes up for an imperfect evaluation function, but brute force is a very expensive way to approach perfect chess. What can be achieved only at exponential cost is not so interesting in computer science.

AlphaZero improves more rapidly with computational resources than Stockfish because its network allows better selective search. This means each extra ply of effective depth costs a smaller multiplicative increase in computational resources. Those who claim this is due to something wrong with the experimental setup need to remember that Stockfish was analysing roughly 875 times as many nodes as AlphaZero. The cost per node is very high with AlphaZero, but this is a constant factor, and computer scientists love constant factors because when you scale the computation they don't increase!
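
Some back-of-the-envelope arithmetic on those figures (the per-move node counts are the ones quoted later in this thread; the effective branching factors below are assumed purely for illustration):

```python
import math

# Per-move node counts quoted later in this thread.
sf_nodes, az_nodes = 4_200_000_000, 4_800_000
print(f"Stockfish searches {sf_nodes / az_nodes:.0f}x more nodes per move")  # ~875x

# Assumed effective branching factors, purely for illustration: each
# extra ply of effective depth multiplies the node count by this factor.
for name, ebf, nodes in [("selective search", 2.0, az_nodes),
                         ("brute-force search", 4.0, sf_nodes)]:
    depth = math.log(nodes) / math.log(ebf)
    print(f"{name}: ~{depth:.0f} plies from {nodes:,} nodes")
# With these assumed factors the selective searcher reaches ~22 plies
# from 875x fewer nodes, versus ~16 plies for the brute-force searcher.
```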

santiagomagno15

Well, Magnus can have a performance of 2900 or 3000, and Stockfish is around 3400. I don't know AlphaZero's rating, but compared to those titans, yes, he is. Then again, it's like comparing Usain Bolt, the fastest human, with a Ferrari: not fair 

Lyudmil_Tsvetkov

If one game is played per day, with a rest day for Magnus after each, he would at least draw.

Humans simply get tired at some point.

Elroch

Why don't you believe that chess players can be quite accurately compared using the Elo system? Is it the same reason you claimed that you could produce an engine with a 4600 rating using your ideas?

ponz111
Lyudmil_Tsvetkov wrote:

If one game is played per day, with a rest day for Magnus after each, he would at least draw.

Humans simply get tired at some point.

Magnus would probably lose. He might obtain a draw every once in a while, but he would mostly lose.

Magnus is a human and cannot compute nearly as fast as Stockfish or AlphaZero. There are other reasons why Magnus would lose as well... 

Lyudmil_Tsvetkov
Elroch wrote:

Why don't you believe that chess players can be quite accurately compared using the Elo system? Is it the same reason you claimed that you could produce an engine with a 4600 rating using your ideas?

What do you mean by 'chess player'?

There is a psychological side to it, a physical side to it, etc.

Is a computer a player? Which computer? One with 1, 16, 128 or 1024 cores? There is a distinction, is there not?

So Carlsen has every right to take a break.

Lyudmil_Tsvetkov
ponz111 wrote:
Lyudmil_Tsvetkov wrote:

If one game is played per day, with a rest day for Magnus after each, he would at least draw.

Humans simply get tired at some point.

Magnus would probably lose. He might obtain a draw every once in a while, but he would mostly lose.

Magnus is a human and cannot compute nearly as fast as Stockfish or AlphaZero. There are other reasons why Magnus would lose as well... 

But his positional understanding is much better. That counts a lot.

I guess most human losses to computers are due to the fact that the humans play unprepared.

Look at Nakamura: with rare exceptions (some KIDs he played), he rushes to open the game against the engine. That is precisely the way one should not handle a computer.

I guess Carlsen has a very good chance against the top engines, if he is prepared and takes a sufficient break after each game played.

Current top engines play at around 2500 positionally, so there is a lot of leeway to beat them.

Elroch
Lyudmil_Tsvetkov wrote:
Elroch wrote:

Why don't you believe that chess players can be quite accurately compared using the Elo system? Is it the same reason you claimed that you could produce an engine with a 4600 rating using your ideas?

What do you mean by 'chess player'?

An agent that plays chess. Humans and computers are the two main classes of examples.

There is a psychological side to it, a physical side to it, etc.

There is no such side to chess performance. Performance is determined by results against the opposition, which depend on the moves made and the quality of the process used to select those moves. Rather, such factors AFFECT the quality of decisions and the moves made, and these in turn affect the rating.

Is a computer a player? Which computer? One with 1, 16, 128 or 1024 cores? There is a distinction, is there not?

Of course: these are two different players.

So Carlsen has every right to take a break.

He might reasonably insist on conditions that allow him to play at his best. There is no law about this.

The rating of a chess player is a measure of their results (and by implication their quality of play) against other players. The rating is a measure of actual performance, has no psychological aspect, and covers all agents that play chess. [Note that even among humans there are different categories of players - e.g. Petrosian's type of player plays differently to Alekhine's type - but this is no obstacle to a single rating.]

Carlsen might well be able to play a tad stronger in Elo terms if he was rested for every game, but very few would suggest it could be more than, say, 50 points (generously).
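
For scale, the standard Elo expected-score formula shows what a 50-point boost would actually buy (the ratings below are illustrative):

```python
# Standard Elo expected score: E = 1 / (1 + 10**((Rb - Ra) / 400)).
def expected_score(ra, rb):
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

# Illustrative ratings: a 50-point boost against an equal opponent...
print(expected_score(2900, 2850))  # ~0.57 instead of 0.50
# ...barely moves the needle against an engine rated ~3400:
print(expected_score(2850, 3400))  # ~0.04
print(expected_score(2900, 3400))  # ~0.05
```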

There is no such thing as positional Elo, because there is no game called positional chess. There is merely Elo, taking into account your ability to steer the game into positions you like (and your opponent's ability to do the opposite). Humans do well to steer games toward positions that may be difficult for computers, but that is no longer enough.

Elroch
admkoz wrote:
Elroch wrote:
admkoz wrote:
Elroch wrote:

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts.)

You said it dropped way off with less than 1/30th sec think time, which is 33ms, about equal to the 40ms it had during training.  That's the coincidence. 

 

As far as the part about neurons, I think that 'neuron' is a metaphor here and for me to discuss it usefully I would need to know what it actually represents in terms of software and how its state is driven by the state of the board.  Probably beyond the scope of a blog discussion

Actually, neurons are easy to describe. They have a large number of real-valued inputs, combine these inputs with a linear function (determined by a parameter for each input) and then apply a fixed non-linear transformation (the activation function) before outputting a single real-valued output. The way to think of them is as combining lots of pieces of information to estimate some other piece of information. Other than the final outputs of a network, the neurons are not told by the designer what information to produce; they are just the inputs to other neurons.
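
A minimal sketch of one such neuron, with ReLU assumed as the activation function (the actual choice is a design parameter):

```python
import numpy as np

# One artificial neuron: a linear combination of the inputs (one learnable
# weight per input, plus a bias -- the "+1" parameter mentioned below),
# followed by a fixed non-linear activation. ReLU is assumed here.
def neuron(inputs, weights, bias):
    pre_activation = np.dot(weights, inputs) + bias  # linear combination
    return max(0.0, pre_activation)                  # ReLU activation

x = np.array([1.0, 0.0, -2.0])   # real-valued inputs
w = np.array([0.5, -1.0, 0.25])  # learnable parameters
print(neuron(x, w, bias=0.1))    # 0.5 - 0.5 + 0.1 = 0.1
```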

That is fine, but not enough info for me to figure out how to get from A (randomness) to B (clobbering Stockfish). Does that mean that a neuron, in terms of software, is basically a thread running some function?

No. It is a simple function with many inputs, as many learnable parameters (+1, actually) and one output.

What are the large number of real valued inputs and how are those inputs derived from the actual board position?

See the paper: it is a very simple representation of the position, nothing more.

What is the initial linear function/non-linear transformation?  How is it updated based on what happens in a random game?

Studying neural networks and reinforcement learning will answer this, but roughly speaking: for each sample of positions, it evaluates its inaccuracy and corrects the parameters of the neural network in the direction that would reduce that inaccuracy for that particular sample, in the belief that this will generalise to other positions when repeated many times.
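
That update rule in miniature (a linear model and squared error stand in for the real network and loss, and the numbers are made up):

```python
import numpy as np

# One training step: measure the inaccuracy on a sample, then nudge the
# parameters in the direction that reduces it (gradient descent).
w = np.array([0.0, 0.0])                 # parameters, initially arbitrary
x, target = np.array([1.0, 2.0]), 1.0    # one made-up training sample
learning_rate = 0.1

for step in range(3):
    prediction = w @ x
    error = prediction - target          # the inaccuracy on this sample
    gradient = 2 * error * x             # d(error**2)/dw for a linear model
    w = w - learning_rate * gradient     # correct toward lower error
    print(step, w, error ** 2)
# Repeated over millions of sampled positions, this is (roughly) how the
# evaluation improves from random initial play.
```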

How much can we meaningfully discuss this before getting into things that only AZ's programmers know?  

Well, it would help to first of all study the things that are in the public domain. Not a huge amount is secret (the notable exception being the detailed architecture of the neural network, probably due to IPR issues).

2.9 billion positions, the number you threw out above, barely scratches the surface.

Stockfish examined 4,200,000,000 nodes (positions, which may be revisited) per move. AlphaZero examined only 4,800,000!

6 pieces alone can already produce 50 billion positions. Somebody said it used "800 random playouts" to evaluate a position initially. It is hard to believe that 800 random playouts would achieve even one mate with K and Q vs K, let alone K and R. Each one would also take forever.

The playouts are not random in the sense that all legal moves are chosen with equal probability. Rather they are chosen with probabilities related to estimates of how likely they are to be the best move. This gives the necessary compromise between precision and breadth.
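
A sketch of that move selection, with made-up priors (in AlphaZero they come from the policy network):

```python
import random

# Guided playout step: instead of choosing uniformly among legal moves,
# sample in proportion to a prior estimate of how likely each move is to
# be best. The moves and priors below are hypothetical.
def pick_move(legal_moves, priors):
    return random.choices(legal_moves, weights=priors, k=1)[0]

moves  = ["Qd8#", "Qh5+", "a3"]
priors = [0.90, 0.08, 0.02]      # hypothetical policy output
print(pick_move(moves, priors))  # almost always the mating move
```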

One major question I have is what actual breakthrough AZ's programmers made.  Any, or did they just throw more hardware at it than anybody previously has?

No-one has previously done what DeepMind did, even badly!  

Have you not read an article about the subject? AlphaZero was not programmed to play go, chess and shogi except for the rules. It was programmed to LEARN how to play go, chess and shogi by playing games against itself. Amazing as it seems, that is how it became the best player in the world at all three games (chess being by the smallest margin).

The techniques it used to achieve this are general reinforcement learning using refinements of learning algorithms developed by the DeepMind team, coupled with Monte Carlo Tree Search, which involves exploring representative continuations from a position all the way to termination with a result. The knowledge the program acquired was stored in a deep neural network.
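
The "continuations to termination" idea, stripped to its bones: the sketch below is flat Monte Carlo over a hypothetical toy game, not AlphaZero's full prior-guided tree search, but it shows how averaging the results of playouts yields an evaluation.

```python
import random

# Flat Monte Carlo evaluation of a made-up toy game: play continuations
# to termination and average the results. States are integers; the game
# ends when the state drifts to +/-3 (win/loss).
def playout(state):
    for _ in range(10):
        state += random.choice([-1, 1])
        if abs(state) >= 3:
            return 1 if state > 0 else -1  # terminal result
    return 0                               # call it a draw

def monte_carlo_value(state, n_playouts=800):
    # 800 playouts per evaluation, the figure quoted in this thread.
    return sum(playout(state) for _ in range(n_playouts)) / n_playouts

print(monte_carlo_value(0))  # ~0.0: balanced state
print(monte_carlo_value(2))  # well above 0: nearly winning state
```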

Axis-and_Allies-Fan
Elroch wrote:
admkoz wrote:
Elroch wrote:
admkoz wrote:
Elroch wrote:

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts.)

You said it dropped way off with less than 1/30th sec think time, which is 33ms, about equal to the 40ms it had during training.  That's the coincidence. 

 

As far as the part about neurons, I think that 'neuron' is a metaphor here and for me to discuss it usefully I would need to know what it actually represents in terms of software and how its state is driven by the state of the board.  Probably beyond the scope of a blog discussion

Actually, neurons are easy to describe. They have a large number of real-valued inputs, combine these inputs with a linear function (determined by a parameter for each input) and then apply a fixed non-linear transformation (the activation function) before outputting a single real-valued output. The way to think of them is as combining lots of pieces of information to estimate some other piece of information. Other than the final outputs of a network, the neurons are not told by the designer what information to produce; they are just the inputs to other neurons.

That is fine, but not enough info for me to figure out how to get from A (randomness) to B (clobbering Stockfish). Does that mean that a neuron, in terms of software, is basically a thread running some function? What are the large number of real-valued inputs and how are those inputs derived from the actual board position? What is the initial linear function/non-linear transformation? How is it updated based on what happens in a random game? How much can we meaningfully discuss this before getting into things that only AZ's programmers know?

 

2.9 billion positions, the number you threw out above, barely scratches the surface. 6 pieces alone can already produce 50 billion positions. Somebody said it used "800 random playouts" to evaluate a position initially. It is hard to believe that 800 random playouts would achieve even one mate with K and Q vs K, let alone K and R. Each one would also take forever.

The playouts are not random in the sense that all legal moves are chosen with equal probability. Rather they are chosen with probabilities related to estimates of how likely they are to be the best move. This gives the necessary compromise between precision and breadth.

One major question I have is what actual breakthrough AZ's programmers made.  Any, or did they just throw more hardware at it than anybody previously has?

No-one has previously done what DeepMind did, even badly!  

Have you not even read an article about the subject? AlphaZero was not programmed to play go, chess and shogi except for the rules. It was programmed to LEARN how to play go, chess and shogi by playing games against itself. Amazing as it seems, that is how it became the best player in the world at all three games (chess being by the smallest margin).

The techniques it used to achieve this are general reinforcement learning using refinements of learning algorithms developed by the DeepMind team, coupled with Monte Carlo Tree Search, which involves exploring representative continuations from a position all the way to termination with a result. The knowledge the program acquired was stored in a deep neural network.

Very interesting. 

Godeka

@Elroch:

> The playouts are not random in the sense that all legal moves are chosen with equal probability. Rather they are chosen with probabilities related to estimates of how likely they are to be the best move. This gives the necessary compromise between precision and breadth.

 

Are you sure? To have heavy playouts means that there must be patterns or rules which determine the probabilities. Where should those patterns or rules come from? Or do you mean the selection of paths down to the leaves? Those are selected according to the MCTS algorithm, of course (node expansion).

 

Or did I get something wrong?

Elroch

Contrary to an earlier temporary misunderstanding of mine (based on missing information and my own guess about the best design), the paper clearly indicates that the single network outputs probabilities for every legal move (which was actually what I had guessed earlier still!). This seems rather cumbersome because it requires a lot of outputs (many of which are filtered out on legality grounds), but it clearly works rather well. A big advantage is that one evaluation produces information on all possible moves, rather than doing one evaluation for each position that can be legally reached in one move.
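
A sketch of that single-pass output with legality filtering (a 5-move space stands in for the real encoding, which has thousands of outputs):

```python
import numpy as np

# One network evaluation yields a score for EVERY move in a fixed move
# encoding; illegal moves are masked out and the rest renormalised.
policy_logits = np.array([2.0, 0.5, -1.0, 1.0, 0.0])   # raw network outputs
legal_mask    = np.array([1, 0, 1, 1, 0], dtype=bool)  # which moves are legal

probs = np.exp(policy_logits - policy_logits.max())    # softmax, stabilised
probs[~legal_mask] = 0.0                                # filter illegal moves
probs /= probs.sum()                                    # renormalise
print(probs)  # move probabilities from a single forward pass
```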

Godeka

As I understand it, the NN is not used for a single playout. Playouts should be very fast, but the NN needs a lot of computation.

 

Maybe a small and fast NN could be used for the playouts. They would be slower than random playouts but of higher quality. This could be better, worse, or make no difference at all. But my first thought is that random playouts can be done on the CPU, leaving the GPU free for other concurrent tasks, whereas playouts with a NN would have to run on the GPU and reduce the overall performance.

 

It could become interesting to build a hybrid engine, using a NN but with a fast, manually written evaluation function for the playouts (or maybe even SF's complex evaluation). That's not within DeepMind's scope, but for the purpose of creating the strongest engine ... I don't know; it's beyond my knowledge, as unfortunately I haven't done anything with NNs in practice.
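
That hybrid idea in miniature (everything below is a hypothetical stand-in, with a toy material count playing the role of the fast handcrafted evaluation):

```python
import random

# Playouts guided by a cheap handcrafted evaluation instead of the NN.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def cheap_eval(material):
    # material: list of (piece, +1 for ours / -1 for theirs) -- a toy board.
    return sum(PIECE_VALUES[p] * side for p, side in material)

def playout_step(candidate_positions):
    # Score candidates with the fast eval and pick among the best few,
    # keeping playouts cheap but far better than uniformly random.
    scored = sorted(candidate_positions, key=cheap_eval, reverse=True)
    return random.choice(scored[: max(1, len(scored) // 3)])

# Three hypothetical positions reachable from the current one:
candidates = [
    [("Q", +1), ("R", -1)],  # we keep the queen, they keep a rook
    [("R", +1), ("R", -1)],  # even rooks
    [("R", +1), ("Q", -1)],  # we have lost our queen
]
print(playout_step(candidates))  # picks the queen-up position
```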

 

Maybe it's time to learn C++ and study the LeelaZero source code.

Elroch

It is for that very reason that, on very powerful hardware, only 80,000 nodes per second can be examined - 875 times fewer than Stockfish on slower hardware (which was itself a fast 32-core machine). You can be certain an evaluation is required at every step in a playout!

The key is that playouts are not fast, but they are informative. Hence not a huge number are needed.

admkoz
Elroch wrote:
admkoz wrote:
Elroch wrote:
admkoz wrote:
Elroch wrote:

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts.)

You said it dropped way off with less than 1/30th sec think time, which is 33ms, about equal to the 40ms it had during training.  That's the coincidence. 

 

As far as the part about neurons, I think that 'neuron' is a metaphor here and for me to discuss it usefully I would need to know what it actually represents in terms of software and how its state is driven by the state of the board.  Probably beyond the scope of a blog discussion

Actually, neurons are easy to describe. They have a large number of real-valued inputs, combine these inputs with a linear function (determined by a parameter for each input) and then apply a fixed non-linear transformation (the activation function) before outputting a single real-valued output. The way to think of them is as combining lots of pieces of information to estimate some other piece of information. Other than the final outputs of a network, the neurons are not told by the designer what information to produce; they are just the inputs to other neurons.

That is fine, but not enough info for me to figure out how to get from A (randomness) to B (clobbering Stockfish). Does that mean that a neuron, in terms of software, is basically a thread running some function?

No. It is a simple function with many inputs, as many learnable parameters (+1, actually) and one output.

What are the large number of real valued inputs and how are those inputs derived from the actual board position?

See the paper: it is a very simple representation of the position, nothing more.

What is the initial linear function/non-linear transformation?  How is it updated based on what happens in a random game?

Studying neural networks and reinforcement learning will answer this, but roughly speaking: for each sample of positions, it evaluates its inaccuracy and corrects the parameters of the neural network in the direction that would reduce that inaccuracy for that particular sample, in the belief that this will generalise to other positions when repeated many times.

How much can we meaningfully discuss this before getting into things that only AZ's programmers know?  

Well, it would help to first of all study the things that are in the public domain. Not a huge amount is secret (the notable exception being the detailed architecture of the neural network, probably due to IPR issues).

2.9 billion positions, the number you threw out above, barely scratches the surface.

Stockfish examined 4,200,000,000 nodes (positions, which may be revisited) per move. AlphaZero examined only 4,800,000!

6 pieces alone can already produce 50 billion positions. Somebody said it used "800 random playouts" to evaluate a position initially. It is hard to believe that 800 random playouts would achieve even one mate with K and Q vs K, let alone K and R. Each one would also take forever.

The playouts are not random in the sense that all legal moves are chosen with equal probability. Rather they are chosen with probabilities related to estimates of how likely they are to be the best move. This gives the necessary compromise between precision and breadth.

One major question I have is what actual breakthrough AZ's programmers made.  Any, or did they just throw more hardware at it than anybody previously has?

No-one has previously done what DeepMind did, even badly!  

Have you not read an article about the subject? AlphaZero was not programmed to play go, chess and shogi except for the rules. It was programmed to LEARN how to play go, chess and shogi by playing games against itself. Amazing as it seems, that is how it became the best player in the world at all three games (chess being by the smallest margin).

The techniques it used to achieve this are general reinforcement learning using refinements of learning algorithms developed by the DeepMind team, coupled with Monte Carlo Tree Search, which involves exploring representative continuations from a position all the way to termination with a result. The knowledge the program acquired was stored in a deep neural network.

 

Thank you, yes, I know that. But if all those algorithms already existed, and what remained was just to throw the rules of chess at it plus have hardware capable of implementing the neural network (all requiring resources beyond what any normal research team would have), that is less impressive than if they had made some breakthrough in neural network design. It seems likely that they did, but that wasn't the point of their paper.