Objectively Speaking, Is Magnus a Patzer Compared to StockFish and AlphaZero?

Lyudmil_Tsvetkov
Elroch wrote:

Why don't you believe that chess players can be quite accurately compared using the Elo system? Is it the same reason you claimed that you could produce an engine with a 4600 rating using your ideas?

What do you mean by chess player?

There is a psychological side to it, a physical side to it, etc.

Is a computer a player? What computer? 1, 16, 128 or 1024 cores? There is a distinction, isn't there?

So Carlsen has every right to take a break.

Lyudmil_Tsvetkov
ponz111 wrote:
Lyudmil_Tsvetkov wrote:

If one game is played per day and Magnus then has a rest day, he would at least draw.

Humans simply get tired at some point.

Magnus would probably lose. He might obtain a draw every once in a while but he would mostly lose.

Magnus is a human and cannot compute nearly as fast as Stockfish or AlphaZero. There are also other reasons why Magnus would lose...

But his positional understanding is much better. That counts a lot.

I guess most of the human losses to computers are due to the fact that the humans play unprepared.

Look at Nakamura: with rare exceptions (some KIDs he played), he rushes to open the game against the engine. That is precisely the way one should not handle a computer.

I guess Carlsen has a very good chance against the top engines if he is prepared and takes a sufficient break after each game played.

Current top engines play at a 2500 level positionally, so there is a lot of leeway to beat them.

Elroch
Lyudmil_Tsvetkov wrote:
Elroch wrote:

Why don't you believe that chess players can be quite accurately compared using the Elo system? Is it the same reason you claimed that you could produce an engine with a 4600 rating using your ideas?

What do you mean by chess player?

An agent that plays chess. Humans and computers are the two main classes of examples.

There is a psychological side to it, a physical side to it, etc.

There is no such side to chess performance, which is determined by the results against the opposition; those results depend on the moves made and the quality of the process used to select those moves. Rather, such factors AFFECT the quality of the decisions and the moves made, and these affect the rating.

Is a computer a player? What computer? 1, 16, 128 or 1024 cores? There is a distinction, isn't there?

Of course: these are two different players.

So Carlsen has every right to take a break.

He might reasonably insist on conditions that allow him to play at his best. There is no law about this.

The rating of a chess player is a measure of their results (and by implication their quality of play) against other players. The rating is a measure of actual performance, has no psychological aspect, and covers all agents that play chess. [Note that even among humans there are different categories of players - e.g. Petrosian's type of player plays differently to Alekhine's type - but this is no obstacle to a single rating.]

Carlsen might well be able to play a tad stronger in Elo terms if he were rested for every game, but very few would suggest it could be more than, say, 50 points (generously).
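
For a sense of scale, the standard Elo model ties a rating gap directly to an expected score; the formula below is the textbook one, quoted purely for reference.

```latex
% Expected score of player A against player B under the Elo model
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}
% so a 50-point edge gives E_A = \frac{1}{1 + 10^{-50/400}} \approx 0.57
```

That is, even a generous 50 extra points would lift Carlsen's expected score by only about seven percentage points against an equal opponent, and by far less against a vastly stronger engine.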

There is no such thing as positional Elo, because there is no game called positional chess. There is merely Elo taking into account your ability to steer the game into a position you like (and your opponent's ability to do the opposite). Humans do well to steer a game to positions which may be difficult for computers in games against them, but it is no longer enough.

Elroch
admkoz wrote:
Elroch wrote:
admkoz wrote:
Elroch wrote:

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts.)

You said it dropped way off with less than 1/30th sec think time, which is 33 ms, about equal to the 40 ms it had during training.  That's the coincidence. 

 

As far as the part about neurons, I think that 'neuron' is a metaphor here and for me to discuss it usefully I would need to know what it actually represents in terms of software and how its state is driven by the state of the board.  Probably beyond the scope of a blog discussion

Actually, neurons are easy to describe. They have a large number of real-valued inputs, combine these inputs with a linear function (determined by a parameter for each input) and then apply a fixed non-linear transformation (the activation function) before outputting a single real-valued output. The way to think of them is as combining lots of pieces of information to estimate some other piece of information. Other than the final outputs of a network, the neurons are not told by the designer what information to produce; they are just the inputs to other neurons.
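
To make that concrete, here is a minimal sketch of one neuron in Python (the names and the ReLU activation are illustrative choices, not anything specific to AlphaZero's network):

```python
import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """One artificial neuron: a weighted sum of the inputs plus a bias
    (one learnable parameter per input, +1 for the bias), passed through
    a fixed non-linear activation function -- here a ReLU."""
    z = float(np.dot(weights, inputs)) + bias  # the linear combination
    return max(0.0, z)                         # the non-linear transformation
```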

That is fine, but not enough info for me to figure out how to get from A (randomness) to B (clobbering Stockfish).   Does that mean that a neuron, in terms of software, is basically a thread running some function?

No. It is a simple function with many inputs, as many learnable parameters (+1, actually) and one output.

What are the large number of real-valued inputs, and how are those inputs derived from the actual board position?

See the paper: it is a very simple representation of the position, nothing more.

What is the initial linear function/non-linear transformation?  How is it updated based on what happens in a random game?

Studying neural networks and reinforcement learning will answer this, but roughly speaking, for each sample of positions it evaluates its inaccuracy and corrects the parameters of the neural network in the direction that would reduce that inaccuracy for that particular sample, in the belief that this will generalise to other positions when repeated many times.
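
That correction step is gradient descent at heart. A minimal sketch of the idea (illustrative names; the real training loop is of course far more elaborate):

```python
import numpy as np

def sgd_step(params: np.ndarray, grad: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One stochastic-gradient-descent update: nudge every parameter a small
    step in the direction that reduces the measured inaccuracy; grad is the
    derivative of the loss on the current sample of positions."""
    return params - lr * grad
```

Repeated over millions of self-play positions, these small corrections are what carry the network from random play to strong play.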

How much can we meaningfully discuss this before getting into things that only AZ's programmers know?  

Well, it would help to first of all study the things that are in the public domain. Not a huge amount is secret (the notable exception being the detailed architecture of the neural network, probably due to IPR issues).

2.9 billion positions, the number you threw out above, barely scratches the surface.

Stockfish examined 4,200,000,000 nodes (positions, which may be revisited) per move. AlphaZero examined only 4,800,000!

6 pieces alone can already achieve 50 billion positions.  Somebody said it used "800 random playouts" to evaluate a position initially.  It is hard to believe that 800 random playouts would achieve even one mate with K and Q vs K, let alone K and R.   Each one would also take forever.  

The playouts are not random in the sense that all legal moves are chosen with equal probability. Rather they are chosen with probabilities related to estimates of how likely they are to be the best move. This gives the necessary compromise between precision and breadth.
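
A minimal sketch of that kind of weighted move selection (the prior probabilities would come from the network; the moves and numbers here are purely illustrative):

```python
import random

def pick_move(legal_moves: list, priors: dict) -> str:
    """Choose a move with probability proportional to an estimate of how
    likely it is to be the best move, rather than uniformly at random."""
    weights = [priors[m] for m in legal_moves]
    return random.choices(legal_moves, weights=weights, k=1)[0]

# pick_move(["e4", "d4", "a4"], {"e4": 0.50, "d4": 0.45, "a4": 0.05})
# rarely explores 1.a4, but never rules it out entirely.
```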

One major question I have is what actual breakthrough AZ's programmers made.  Any, or did they just throw more hardware at it than anybody previously has?

No-one has previously done what DeepMind did, even badly!  

Have you not read an article about the subject? AlphaZero was not programmed to play go, chess and shogi except for the rules. It was programmed to LEARN how to play go, chess and shogi by playing games against itself. Amazing as it seems, that is how it became the best player in the world at all three games (chess being by the smallest margin).

The techniques it used to achieve this are general reinforcement learning using refinements of learning algorithms developed by the DeepMind team, coupled with Monte Carlo Tree Search, which involves exploring representative continuations from a position all the way to termination with a result. The knowledge the program acquired was stored in a deep neural network.
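
For the curious, here is a heavily simplified skeleton of that search loop in Python (the Node class, the evaluate callback and the constants are illustrative stand-ins, not DeepMind's code):

```python
import math

class Node:
    def __init__(self, position, prior=1.0):
        self.position = position  # the game state this node represents
        self.prior = prior        # network's prior for the move leading here
        self.children = []
        self.visits = 0
        self.total_value = 0.0

def puct(child, parent_visits, c_puct=1.5):
    """Selection score: the average value seen so far plus an exploration
    bonus weighted by the network's prior and shrinking with visit count."""
    q = child.total_value / child.visits if child.visits else 0.0
    return q + c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)

def search(root, evaluate, n_simulations=800):
    """AlphaZero-style Monte Carlo Tree Search, schematically: each simulation
    descends the tree, expands a leaf with a single network evaluation, and
    propagates the value estimate back up the visited path."""
    for _ in range(n_simulations):
        node, path = root, [root]
        while node.children:                                 # 1. selection
            node = max(node.children,
                       key=lambda c, n=node: puct(c, n.visits))
            path.append(node)
        # evaluate returns ([(child_position, prior), ...], value_estimate)
        priors, value = evaluate(node.position)              # 2. evaluation
        node.children = [Node(pos, p) for pos, p in priors]  # 3. expansion
        for n in path:                                       # 4. backup
            n.visits += 1
            n.total_value += value
    return max(root.children, key=lambda c: c.visits)        # most-visited move
```

In AlphaZero the backed-up value comes from the network's value head rather than from a random rollout; the games that run all the way to termination are the self-play training games that supply the results to learn from.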

Axis-and_Allies-Fan
Elroch wrote:
admkoz wrote:
Elroch wrote:
admkoz wrote:
Elroch wrote:

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts.)

You said it dropped way off with less than 1/30th sec think time, which is 33 ms, about equal to the 40 ms it had during training.  That's the coincidence. 

 

As far as the part about neurons, I think that 'neuron' is a metaphor here and for me to discuss it usefully I would need to know what it actually represents in terms of software and how its state is driven by the state of the board.  Probably beyond the scope of a blog discussion

Actually, neurons are easy to describe. They have a large number of real-valued inputs, combine these inputs with a linear function (determined by a parameter for each input) and then apply a fixed non-linear transformation (the activation function) before outputting a single real-valued output. The way to think of them is as combining lots of pieces of information to estimate some other piece of information. Other than the final outputs of a network, the neurons are not told by the designer what information to produce; they are just the inputs to other neurons.

That is fine, but not enough info for me to figure out how to get from A (randomness) to B (clobbering Stockfish).   Does that mean that a neuron, in terms of software, is basically a thread running some function?   What are the large number of real-valued inputs and how are those inputs derived from the actual board position?  What is the initial linear function/non-linear transformation?  How is it updated based on what happens in a random game?  How much can we meaningfully discuss this before getting into things that only AZ's programmers know?  

 

2.9 billion positions, the number you threw out above, barely scratches the surface.  6 pieces alone can already achieve 50 billion positions.  Somebody said it used "800 random playouts" to evaluate a position initially.  It is hard to believe that 800 random playouts would achieve even one mate with K and Q vs K, let alone K and R.   Each one would also take forever.  

The playouts are not random in the sense that all legal moves are chosen with equal probability. Rather they are chosen with probabilities related to estimates of how likely they are to be the best move. This gives the necessary compromise between precision and breadth.

One major question I have is what actual breakthrough AZ's programmers made.  Any, or did they just throw more hardware at it than anybody previously has?

No-one has previously done what DeepMind did, even badly!  

Have you not even read any article about the subject? AlphaZero was not programmed to play go, chess and shogi except for the rules. It was programmed to LEARN how to play go, chess and shogi by playing games against itself. Amazing as it seems, that is how it became the best player in the world at all three games (chess being by the smallest margin).

The techniques it used to achieve this are general reinforcement learning using refinements of learning algorithms developed by the DeepMind team, coupled with Monte Carlo Tree Search, which involves exploring representative continuations from a position all the way to termination with a result. The knowledge the program acquired was stored in a deep neural network.

Very interesting. 

Godeka

@Elroch:

> The playouts are not random in the sense that all legal moves are chosen with equal probability. Rather they are chosen with probabilities related to estimates of how likely they are to be the best move. This gives the necessary compromise between precision and breadth.

 

Are you sure? To have heavy playouts means that there must be patterns or rules which determine the probability. Where should the patterns or rules come from? Or do you mean the selection of paths down to the leaves? Those are of course selected according to the MCTS algorithm (node expansion).

 

Or did I get something wrong?

Elroch

Contrary to an earlier temporary misunderstanding of mine (based on missing information and my own guess about the best design), the paper clearly indicates that the single network outputs probabilities for every legal move (which was actually what I had guessed even earlier!). This seems rather cumbersome because it requires a lot of outputs (many of which are filtered out on legality grounds), but it clearly works rather well. A big advantage is that one evaluation produces information on all possible moves, rather than requiring one evaluation for each position that can be legally reached in one move.
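
A sketch of what consuming such a policy output looks like (the fixed move encoding, names and shapes are illustrative; the paper describes the real encoding):

```python
import numpy as np

def masked_policy(logits: np.ndarray, legal: np.ndarray) -> np.ndarray:
    """Turn raw network scores over a fixed move encoding into a probability
    distribution over the legal moves only: illegal entries are masked out
    and the remainder renormalised with a softmax."""
    masked = np.where(legal, logits, -np.inf)   # filter out illegal moves
    exp = np.exp(masked - masked[legal].max())  # numerically stable softmax
    return exp / exp.sum()                      # illegal slots stay at zero
```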

Godeka

As I understand it, the NN is not used for single playouts. Playouts should be very fast, but the NN needs a lot of calculation.

 

Maybe a small and fast NN can be used for the playouts. They would be slower than random playouts but of higher quality. That could turn out better, worse, or make no difference. But my first thought is that random playouts can be done on the CPU, leaving the GPU free for other concurrent tasks, whereas playouts with a NN would have to run on the GPU and reduce the overall performance.

 

It could become interesting to build a hybrid engine, using a NN but with a fast hand-written evaluation function for the playouts (or maybe even SF's complex evaluation). That's not within the scope of DeepMind's work, but for the purpose of creating the strongest engine ... I don't know, it's beyond my knowledge; unfortunately I haven't done anything with NNs in practice.

 

Maybe it's time to learn C++ and study the LeelaZero source code.

Elroch

It is for that very reason that on very powerful hardware only 80,000 nodes per second can be examined - 875 times fewer than Stockfish on slower hardware (which was itself a fast 32-core machine). You can be certain an evaluation is required at every step in a playout!

The key is that playouts are not fast, but they are informative. Hence not a huge number are needed.
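
Those figures are at least mutually consistent, assuming the one-minute-per-move time control reported for the match:

```latex
% Nodes examined per move at one minute per move
\text{AlphaZero: } 80{,}000 \text{ nodes/s} \times 60 \text{ s} = 4.8 \times 10^{6}
\text{Stockfish: } 4.2 \times 10^{9} / 60 \text{ s} = 7 \times 10^{7} \text{ nodes/s}
\text{Ratio: } 4.2 \times 10^{9} \,/\, 4.8 \times 10^{6} = 875
```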

admkoz
Elroch wrote:
admkoz wrote:
Elroch wrote:
admkoz wrote:
Elroch wrote:

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts.)

You said it dropped way off with less than 1/30th sec think time, which is 33 ms, about equal to the 40 ms it had during training.  That's the coincidence. 

 

As far as the part about neurons, I think that 'neuron' is a metaphor here and for me to discuss it usefully I would need to know what it actually represents in terms of software and how its state is driven by the state of the board.  Probably beyond the scope of a blog discussion

Actually, neurons are easy to describe. They have a large number of real-valued inputs, combine these inputs with a linear function (determined by a parameter for each input) and then apply a fixed non-linear transformation (the activation function) before outputting a single real-valued output. The way to think of them is as combining lots of pieces of information to estimate some other piece of information. Other than the final outputs of a network, the neurons are not told by the designer what information to produce; they are just the inputs to other neurons.

That is fine, but not enough info for me to figure out how to get from A (randomness) to B (clobbering Stockfish).   Does that mean that a neuron, in terms of software, is basically a thread running some function?

No. It is a simple function with many inputs, as many learnable parameters (+1, actually) and one output.

What are the large number of real-valued inputs, and how are those inputs derived from the actual board position?

See the paper: it is a very simple representation of the position, nothing more.

What is the initial linear function/non-linear transformation?  How is it updated based on what happens in a random game?

Studying neural networks and reinforcement learning will answer this, but roughly speaking, for each sample of positions it evaluates its inaccuracy and corrects the parameters of the neural network in the direction that would reduce that inaccuracy for that particular sample, in the belief that this will generalise to other positions when repeated many times.

How much can we meaningfully discuss this before getting into things that only AZ's programmers know?  

Well, it would help to first of all study the things that are in the public domain. Not a huge amount is secret (the notable exception being the detailed architecture of the neural network, probably due to IPR issues).

2.9 billion positions, the number you threw out above, barely scratches the surface.

Stockfish examined 4,200,000,000 nodes (positions, which may be revisited) per move. AlphaZero examined only 4,800,000!

6 pieces alone can already achieve 50 billion positions.  Somebody said it used "800 random playouts" to evaluate a position initially.  It is hard to believe that 800 random playouts would achieve even one mate with K and Q vs K, let alone K and R.   Each one would also take forever.  

The playouts are not random in the sense that all legal moves are chosen with equal probability. Rather they are chosen with probabilities related to estimates of how likely they are to be the best move. This gives the necessary compromise between precision and breadth.

One major question I have is what actual breakthrough AZ's programmers made.  Any, or did they just throw more hardware at it than anybody previously has?

No-one has previously done what DeepMind did, even badly!  

Have you not read an article about the subject? AlphaZero was not programmed to play go, chess and shogi except for the rules. It was programmed to LEARN how to play go, chess and shogi by playing games against itself. Amazing as it seems, that is how it became the best player in the world at all three games (chess being by the smallest margin).

The techniques it used to achieve this are general reinforcement learning using refinements of learning algorithms developed by the DeepMind team, coupled with Monte Carlo Tree Search, which involves exploring representative continuations from a position all the way to termination with a result. The knowledge the program acquired was stored in a deep neural network.

 

Thank you, yes, I know that.  But if all those algorithms already existed and what remained was just to throw the rules of chess at it, plus have hardware capable of implementing the neural network (all requiring resources beyond what any normal research team would have), that is less impressive than if they had made some breakthrough in neural network design. It seems likely that they did, but that wasn't the point of their paper. 

Elroch

Well, that's neither true nor does it make much sense. Two separate teams at DeepMind (listed in the paper, if I recall) developed the specific algorithms used to generate the record-breaking performance, building on earlier work that used rather different (but related) algorithms to achieve world-best performance at go. Even without that, achieving the best performance in the world (enormously so at go) is always impressive, and achieving it with a radically different technique from all previous world bests is very interesting.

HorribleTomato

BringBackYeOldThreads!