Objectively Speaking, Is Magnus a Patzer Compared to Stockfish and AlphaZero?

admkoz
Elroch wrote:

This is a very rough extrapolation, but the curvature of the graph suggests a rapid reduction in strength as the time per move gets reduced below 1/30 second.

 

If it had 40ms to think during training, then let's say it spotted a forced win from a given position.  It would make logical sense for it to have that position simply evaluate to a win.  It knows that if it is ever in that position again, it will be able to find the forced win the same way it did before.....

 

...unless it has even less time to think than it did during training.  Then it's screwed.  

 

This can't be the complete explanation, since obviously it is not brute force, but it seems unlikely to be a coincidence that 1/30th of a second is so close to the think time it had during training.

Elroch

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts).

Elroch
admkoz wrote:

"It is important that the network is forced to find generally useful patterns rather than memorising specific examples (which is only useful in the minority of the game)."

 

That is true, BUT I think it would learn much faster if it did at least some brute force, especially at the beginning. I am mainly interested in how the actual learning happened; maybe I would need to read up more on neural networks. I am a programmer but not one that has ever been involved in machine learning.

It's safe to say that the DeepMind team know a lot about how to do this and have achieved the most impressive success in this type of application. It's partly about the development of model-based reinforcement learning algorithms and partly about the novel version of MCTS.

If it literally never looks ahead, then it plays an entirely random game. At the end, it has learned... essentially nothing. It probably walked past a dozen mates in 1 as the game went on. Therefore, its evaluation method for all of its moves is wholly worthless. It would have to do that a lot more than 44 million times to actually learn anything. 44 million is less than the number of possible combinations of 10 moves of a queen on an empty board.

I don't know the details of the algorithms they used, but the basic idea is that information propagates from future positions to past positions and improves the evaluation of past positions. As I have pointed out, there seem to be exactly two forms of this. The first is the propagation of the actual results of games: these tell you something about previous positions, with the depth to which the information propagates depending on how accurate you can expect the recent moves to be. The second is that past evaluations should be compatible with future ones. If you think a move is great, play it and then get clobbered (according to your imperfect but meaningful evaluation function), then you can conclude your evaluation of the previous position was over-optimistic. The network is adjusted in a way that carefully weights that conclusion (the ubiquitous gradient ascent technique is what is used to do this).
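Roughly, the second form can be sketched in a few lines of Python. This is only a toy illustration of the idea of nudging a past evaluation towards a future one with a gradient step; the function names and the tiny linear model are my own assumptions, not DeepMind's actual algorithm, which trains a deep network with a more elaborate scheme.

    def evaluate(weights, features):
        # Deliberately simple linear evaluator; AlphaZero uses a deep network instead.
        return sum(w * x for w, x in zip(weights, features))

    def td_update(weights, prev_features, next_features, learning_rate=0.01):
        # "Past evaluations should be compatible with future ones": nudge the
        # evaluation of the previous position towards the evaluation of the
        # position reached after the chosen move.
        target = evaluate(weights, next_features)      # future evaluation
        prediction = evaluate(weights, prev_features)  # past evaluation
        error = target - prediction                    # negative if we got clobbered
        # Gradient step on the squared error; for a linear model the gradient of
        # the prediction with respect to each weight is just that feature's value.
        return [w + learning_rate * error * x
                for w, x in zip(weights, prev_features)]

    # At the end of a game the actual result (+1, 0 or -1) plays the role of
    # `target` for the final position, which corresponds to the first form of
    # propagation described above.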

But if it can brute force ahead a couple of moves, then it will never miss a mate in 1 or 2 (probably 3 or 4).  Right off the bat that makes learning a whole lot more efficient.  Plus, it will quickly figure out how to mate with K/Q vs K (I harp so much on that because a lot of material advantages turn into that sooner or later).  

While that may be so, you are using a lot of resources if you do this to too much depth. You may find you learn more quickly by being less broad once your evaluations are meaningful. Even at first, you may learn a lot more by looking at a wide variety of positions in a limited way than at a smaller number with expensive brute force search. Note that the algorithms certainly include a keenness to look at novel positions and not to be too narrow in what is analysed.

 

Godeka
admkoz wrote:

If it literally never looks ahead, then it plays an entirely random game.  At the end, it has learned.. essentially nothing. [...]

 

Who said it does not look ahead? MCTS is used for that. The random games are the playouts for evaluating a position, nothing else. Classic MCTS:

  1. Selection
  2. Expansion
  3. Playouts (evaluation)
  4. Backpropagation

 

Using brute-force games for evaluation means you have to play brute-force games until the end. Remember: an evaluation function specified by a human is not wanted.

 

The main difference is: you could calculate one brute-force game in a billion years, or a billion random games in a second.
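To make those four steps concrete, here is a bare-bones sketch of classic MCTS in Python. The Node class and the game interface (legal_moves, play, is_over, result) are made-up stand-ins, and the sketch skips the sign flip a real implementation needs when the two players alternate; AlphaZero's variant additionally replaces the random playout with its network's evaluation and uses policy priors in the selection step.

    import math
    import random

    class Node:
        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = []
            self.visits = 0
            self.value = 0.0

    def select(node, c=1.4):
        # 1. Selection: walk down the tree, picking children by the UCT formula.
        while node.children:
            node = max(node.children,
                       key=lambda n: n.value / (n.visits + 1e-9)
                       + c * math.sqrt(math.log(node.visits + 1) / (n.visits + 1e-9)))
        return node

    def expand(node):
        # 2. Expansion: add one child per legal move (none if the game is over).
        for move in node.state.legal_moves():
            node.children.append(Node(node.state.play(move), parent=node))
        return random.choice(node.children) if node.children else node

    def playout(state):
        # 3. Playout: play uniformly random moves to the end; return the result.
        while not state.is_over():
            state = state.play(random.choice(state.legal_moves()))
        return state.result()  # e.g. +1 / 0 / -1

    def backpropagate(node, result):
        # 4. Backpropagation: update visit counts and values up to the root.
        # (A full implementation would negate `result` at alternating levels.)
        while node is not None:
            node.visits += 1
            node.value += result
            node = node.parent

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            leaf = select(root)
            child = expand(leaf)
            backpropagate(child, playout(child.state))
        return max(root.children, key=lambda n: n.visits)  # most-visited child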

Lyudmil_Tsvetkov

I have seen the 10 games, but you have not.

The 10 games feature:

- 5 repeated games: 4 QIDs, where Alpha pushes d5 on the 6th or 7th move, and one Ruy Lopez Exchange, where SF gives up its bishop pair for nothing on the 5th move, plus 2 hopeless Advance French games

- 6 Bg2 fianchettos by Alpha with white, 0 (please note well) Bg7 fianchettos with black, and 0 Bg2/Bg7 fianchettos by SF, certainly because it is weak

 

How does Alpha know Bg2 is good/the best move?

Obviously, by using an opening book/human knowledge.

Why does Alpha play Bg2 only with white, but not Bg7 with black?

Obviously, because human databases tell it Bg2 scores well for white, but not Bg7 for black in KID and Gruenfeld. Tell me after that it did not use an opening book.

 

The Alpha team claims they are using some NN, a neural network. Forgive my ignorance, but what kind of patterns/NN plays Bg2 with white, but not Bg7 with black?

 

Full nonsense, this completely makes no sense.

I previously thought Alpha is around 2800 on a single core, but now I am willing to reassess my estimate to 2400 or so.

The engines that previously tried NN, Giraffe and Romi, on standard PCs, are precisely 2400.

 

So, huge BS: they are selling us something that actually does not exist, and I am not willing to buy into it. It was all hardware (50/1 or even more), opening preparation and very unfavourable SF settings.

Godeka

I found nothing about Romi and neural networks. It seems to use alpha-beta search only and to save winning lines.

Lyudmil_Tsvetkov

Visit Talkchess to find out more.

admkoz
Elroch wrote:

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts).

You said it dropped way off with less than 1/30th sec think time, which is 33ms, about equal to the 40ms it had during training. That's the coincidence.

 

As far as the part about neurons goes, I think that 'neuron' is a metaphor here, and for me to discuss it usefully I would need to know what it actually represents in terms of software and how its state is driven by the state of the board. Probably beyond the scope of a blog discussion.

Elroch

It is not a sudden change: rather, the value of doubling the thinking time decreases with strength (and is always greater than for Stockfish).

Here is the information I used: a graph from the DeepMind paper. While I said roughly 1/30 of a second, the leftmost data point is probably 40ms (I had to estimate from the logarithmic scale, but you can see it is clearly significantly more than 10^(-1.5) seconds ~= 32ms, more like 10^(-1.4) which is almost exactly 40ms).

[Graph from the DeepMind paper: playing strength of AlphaZero and Stockfish against thinking time per move, plotted on a logarithmic time scale.]
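For what it's worth, reading the value off the logarithmic axis is just a quick calculation:

    # Converting the log-scale estimates to milliseconds:
    print(10 ** -1.5 * 1000)  # ~31.6 ms
    print(10 ** -1.4 * 1000)  # ~39.8 ms, i.e. almost exactly 40 ms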

Lyudmil_Tsvetkov

Alpha's training used 5000 TPUs, costing 35 million.

SF's 32-core machine costs around 5000.

Draw the conclusions yourselves.

Elroch

Firstly, Stockfish relied on the cumulative experience of the chess world. I am not sure how many billions that cost. 😀 AlphaZero used knowledge of the rules of chess, which costs very little.

Secondly, I know of no clear evidence that there is any way to make Stockfish as strong as AlphaZero was in that match. There is a scalability problem with conventional engines, whose algorithms don't parallelise well.

Elroch
admkoz wrote:
Elroch wrote:

I am not sure what supposed coincidence you are referring to, but AlphaZero has no way of explicitly remembering a position. Rather, it evaluates every position by finding the state of the neurons in its net, and it is fair to say the evaluation arising from those neuron states is based on the evaluations of other positions that have somewhat similar states for the neurons. (At the bottom level, this might mean the same pieces on certain squares, but at higher levels it might include much more abstract concepts).

You said it dropped way off with less than 1/30th sec think time, which is 33ms, about equal to the 40ms it had during training. That's the coincidence.

 

As far as the part about neurons goes, I think that 'neuron' is a metaphor here, and for me to discuss it usefully I would need to know what it actually represents in terms of software and how its state is driven by the state of the board. Probably beyond the scope of a blog discussion.

Actually, neurons are easy to describe. They have a large number of real-valued inputs, combine these inputs with a linear function (determined by a parameter for each input) and then apply a fixed non-linear transformation (the activation function) before outputting a single real-valued output. The way to think of them is as combining lots of pieces of information to estimate some other piece of information. Other than the final outputs of a network, the neurons are not told by the designer what information to produce; they are just the inputs to other neurons.
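In code, that description corresponds to something like the minimal Python sketch below. The names and the choice of tanh as the activation function are my own; real networks use vectorised libraries rather than explicit loops.

    import math

    def neuron(inputs, weights, bias):
        # Combine many real-valued inputs linearly (one weight per input, plus a
        # bias), then apply a fixed non-linear activation to produce one output.
        linear = sum(w * x for w, x in zip(weights, inputs)) + bias
        return math.tanh(linear)  # tanh is one common choice of activation

    # Example: three real-valued inputs combined into a single output in (-1, 1).
    print(neuron([0.5, -1.0, 2.0], weights=[0.1, 0.4, -0.3], bias=0.05))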

Lyudmil_Tsvetkov

Isn't 'The Secret of Chess' a nice example of a neural network?

https://en.chessbase.com/post/the-secret-of-chess

 

Instead of using 50/100 terms to evaluate chess positions, you use 10 times more.

Godeka
Lyudmil_Tsvetkov wrote:

Alpha's training used 5000 TPUs, costing 35 million.

SF's 32-core machine costs around 5000.

Draw the conclusions yourselves.

 

Where did you get that information from? The TPUs cannot be bought, and I found no specification of the cores used for SF. The only information we have is that 64 threads were used.

    For playing, AlphaZero used only four v2 TPUs, which are very power-efficient processors. Compare this with AlphaGo Lee, which used 1202 CPUs and 176 GPUs. What an improvement!

    If you want to compare costs or power consumption, you should compare the 5000 TPUs with the development effort, the calculation of endgame databases, opening books etc. But we do not have any information about how many NNs were trained by DeepMind. I assume the power consumption and costs were enormous, which is one reason why DeepMind couldn't do it by itself but needed Google. AlphaGo Fan is over two years old; then came AlphaGo Lee, some variations of AlphaGo Master, AlphaGo Zero and then AlphaZero. I assume the TPUs were used all the time.

Elroch

DeepMind is owned by Google. They bought the company, so it's not a matter of them calling on Google for resources.

Elroch
Lyudmil_Tsvetkov wrote:

Isn't 'The Secret of Chess' a nice example of a neural network?

https://en.chessbase.com/post/the-secret-of-chess

 

Instead of using 50/100 terms to evaluate chess positions, you use 10 times more.

I don't get your point. The book (based on the review) seems to be about a handcrafted (possibly linear) evaluation function, rather than a non-linear one learnt from data by a machine learning algorithm.

Regarding representing a position, it's easy to do this very compactly. What is difficult is defining the derived quantities that result in the world's best chess player. Simple methods are not going to be enough, even if very useful in their own right.
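To illustrate the distinction with a toy example of my own (nothing to do with the book's actual terms): a handcrafted linear evaluation just adds up weighted, human-chosen features, whereas a learned network replaces the whole function with a non-linear one tuned from data.

    # Toy handcrafted *linear* evaluation: human-chosen features with fixed weights.
    PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

    def linear_eval(white_pieces, black_pieces, white_mobility, black_mobility):
        material = (sum(PIECE_VALUES[p] for p in white_pieces)
                    - sum(PIECE_VALUES[p] for p in black_pieces))
        mobility = 0.1 * (white_mobility - black_mobility)  # hand-picked weight
        return material + mobility

    # White: queen, rook and two pawns; Black: two rooks and a pawn.
    print(linear_eval(["Q", "R", "P", "P"], ["R", "R", "P"], 30, 25))  # 5.5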


ponz111

Magnus is not a patzer compared to any machine. This is because it makes no sense to compare a human to a machine. [apples and oranges]

Can our best runners be compared to a car? Do we diminish the abilities of runners because a car can run faster and longer?

Magnus is the best of all humans at chess--maybe the best player in history?

Prayerman46

Usain Bolt is the world's fastest human. Is that distinction diminished by the fact that he cannot outrace a machine, like a car? Carlsen is the best human chess player. Give him credit for that!

Lyudmil_Tsvetkov
Elroch wrote:
Lyudmil_Tsvetkov wrote:

Isn't 'The Secret of Chess' a nice example of a neural network?

https://en.chessbase.com/post/the-secret-of-chess

 

Instead of using 50/100 terms to evaluate chess positions, you use 10 times more.

I don't get your point. The book (based on the review) seems to be about a handcrafted (possibly linear) evaluation function, rather than a non-linear one learnt from data by a machine learning algorithm.

Regarding representing a position, it's easy to do this very compactly. What is difficult is defining the derived quantities that result in the world's best chess player. Simple methods are not going to be enough, even if very useful in their own right.

Yeah, but Alpha with their NN on a single core will play at around 2400-2800 (more likely the first estimate; my initial suggestion of a higher Elo might have been wrong), SF with their NN (the eval framework) on a single core plays at 3200, and, if all the parameters of 'The Secret of Chess' are incorporated into a new engine and properly tuned, it should play at around 4500 or so.

Draw your own conclusions about how much each NN is worth.