Objectively Speaking, Is Magnus a Patzer Compared to StockFish and AlphaZero?

admkoz
SmyslovFan wrote:

For reference, here is the original paper, which repeatedly states that the engine learned from self-play. It gives charts explaining its analysis of openings. I can't find anywhere in the paper that the games it played are not remembered:

https://arxiv.org/pdf/1712.01815.pdf

The evaluation function would be, in effect, a compression of that information.

Suppose it plays a game with K/Q vs K and figures out that K/Q vs K is always a forced mate, if not currently stalemate and the Q can't be taken.  It would have no reason to remember the entire game.  It can just "learn" that that scenario is always a win.  

 

As to exactly how it "learns" that - that is indeed the question.  

Elroch
mcris wrote:
Elroch wrote:
mcris wrote:

Yes, and if it had an opening book, or were left to play by itself in the opening, it would have lost 0 games as Black.

Someone has already pointed out the interesting observation that Stockfish got solid positions out of the openings. The problems occurred later. Playing for a draw is not so easy when you have a 130 point Elo deficit.

That someone is a patzer compared to SF. SF never enters a QID without an opening book. Nor does it play the French Defense the way it did in the games with AZ.

That someone based their comment on StockFish's assessment of the positions.

riagan

Funny!

Elroch
admkoz wrote:
SmyslovFan wrote:

For reference, here is the original paper, which repeatedly states that the engine learned from self-play. It gives charts explaining its analysis of openings. I can't find anywhere in the paper that the games it played are not remembered:

https://arxiv.org/pdf/1712.01815.pdf

You will find the information you want on page 10, where they list a set of standard engine techniques and say AlphaZero uses none of them.

The evaluation function would be, in effect, a compression of that information.

A general evaluation function for all positions is not a compressed opening database, any more than StockFish's evaluation routine is. One obvious reason is that the moves selected depend on a randomised exploration of future positions, and on how much time is allowed for that exploration.

Suppose it plays a game with K/Q vs K and figures out that K/Q vs K is always a forced mate, if not currently stalemate and the Q can't be taken.

It can never do this; it can merely learn from examples that such positions win with probability near 1. It can probably do a good enough job to always find the fastest win in such endings (my opinion, based on what I know).

It would have no reason to remember the entire game.  It can just "learn" that that scenario is always a win.  

As to exactly how it "learns" that - that is indeed the question.  

It learns everything it knows by model-based reinforcement learning, where the model is a neural network trained on minibatches of positions using a method quite similar to stochastic gradient descent in supervised learning (i.e. a pseudo-error is generated for each minibatch, and the network is corrected by combining the pseudo-error, a learning rate and the gradient of the pseudo-error with respect to all of the neural network's parameters).
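
Schematically, one such minibatch step looks roughly like this (a toy numpy sketch, not DeepMind's code; the linear "network", the 8-number position encoding and the targets are all invented for illustration):

```python
import numpy as np

# Toy "network": a weight vector w maps an 8-number position encoding to an evaluation.
w = np.zeros(8)
learning_rate = 0.01

def predict(positions, w):
    return positions @ w                          # predicted evaluations for the minibatch

def minibatch_update(positions, targets, w, learning_rate):
    """One step: form the pseudo-error on the minibatch, then move the
    parameters a small amount against its gradient."""
    error = predict(positions, w) - targets       # pseudo-error for this minibatch
    grad = positions.T @ error / len(targets)     # gradient of the mean squared error w.r.t. w
    return w - learning_rate * grad

# One minibatch of made-up positions, with target values that in AZ's case
# would come from self-play results and search.
batch = np.random.randn(32, 8)
targets = np.random.randn(32)
w = minibatch_update(batch, targets, w, learning_rate)
```

The real network is a deep one with millions of parameters rather than a single weight vector, but the shape of the update (pseudo-error, learning rate, gradient) is the same.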

mcris
Elroch wrote:
mcris wrote:
Elroch wrote:
mcris wrote:

Yes, and if it had an opening book, or were left to play by itself in the opening, it would have lost 0 games as Black.

Someone has already pointed out the interesting observation that Stockfish got solid positions out of the openings. The problems occurred later. Playing for a draw is not so easy when you have a 130 point Elo deficit.

That someone is a patzer compared to SF. SF never enters a QID without an opening book. Nor does it play the French Defense the way it did in the games with AZ.

That someone based their comment on StockFish's assessment of the positions.

As @riagan put it: Funny. But of no value.

Elroch

That's not a very nice thing to say about Stockfish's evaluation!

mcris

As I said, SF does not even get into QID by itself, so you took the losing way of lying. Too bad.

Elroch

Stockfish is likely a little better when given the crutch of an independently generated opening book. However, AlphaZero beat Stockfish convincingly in every opening when an equal number of games was played by each with black and white: a large plus with white in all of them, and a small plus with black in most (a small minus in only a few, notably the Sicilian and the King's Indian; this shows Stockfish is not an opponent with whom to be reckless, and it is striking that these two defenses are not so popular when the strongest players meet each other). There is no major opening where Stockfish plays as well overall as AlphaZero. See page 6 of the AlphaZero paper.

Godeka

@admkoz:
A neuron is a node in the network that has input connections from other neurons, output connections, and an activation function. Additionally, a propagation function basically sums up the weighted input values. During learning the weights are adapted, nothing else, no less and no more.
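
Very roughly, in code, one neuron's forward step is just this (a toy sketch; the weights and inputs are made-up numbers, nothing from AZ):

```python
import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias    # propagation: weighted sum of the inputs
    return max(0.0, z)                    # activation: here a simple ReLU

# Three input connections with illustrative weights; learning would adjust
# only `weights` and `bias`, as described above.
print(neuron(np.array([0.2, -1.0, 0.5]), np.array([0.7, 0.1, -0.3]), 0.05))
```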

For details about NN you should look at Wikipedia or watch some YouTube videos. I think it is not possible or useful to go into detail in this forum.

The AZ developers (DeepMind) invented nothing totally new. It is not easy to create a NN structure: you need experience, and some parameters can only be determined by testing. DeepMind has the knowledge and, maybe even more importantly, the resources to compute and test multiple NN settings (and MCTS implementations). And note that it is only research, so you need money; there is no final product you can sell.

NNs were used before; building a network for simple image recognition is easy. (But a few decades ago the computation power was an issue.) And MCTS was successfully used in Go before (which was already a little bit surprising), but no one knew whether a program becomes stronger if MCTS is combined with a NN. DeepMind was the first to prove that it does. This was only possible because DeepMind had the knowledge, money and resources. And I assume there were multiple problems we don't think about that had to be solved.

Even if the basic concept is easy, the details aren't, I think. We know the first announced version of AlphaGo (2015, Fan Hui) was much, much weaker than the versions from 2017. And AlphaGo Fan and its direct successor were trained with Go games played by humans on Go servers, and the training was supervised. So I think it was necessary to build up some basic experience with such big NNs.

Even then I think it was not obvious that NNs would work in chess too. The same is true for self-learning from scratch: it was not obvious, and there was not much prior experience.

But DeepMind's aim was and is higher. We had self-learning NNs playing Atari games, then Go, then a generic approach for multiple games. DeepMind aims at NNs for medical tasks, and surely for other tasks Google is interested in: better and faster search (DeepMind released a paper about faster searches in B* trees with NNs), image recognition, enhancing Google Maps and Street View, language translation ...

mcris

About that: Mike Sherwin, a programmer of a learning chess engine, expressed the expert opinion on Talkchess that AZ was trained against SF.

Elroch

I respect the honesty of David Silver and the Deepmind team.

I doubt Sherwin knows much about the relevant AI technology, but it is a bit odd to ignore the information in the AlphaZero paper.

mcris

The information in the paper has not been peer-reviewed; one figure clearly shows AZ's rating converging to the rating of SF and stopping a bit above it.

The honest thing would have been to arrange the match by common agreement with the SF developers.

Also the paper shows 1200 games AZ vs SF in different openings. Was the match before those games? If not, it means again that AZ trained against SF.

prusswan

Again, remember that AZ's purpose is to generalize the earlier work on Go and show how it can be quickly retrofitted for similar purposes, even with minimal preparation and optimization. Training against SF has no research value and is ultimately useless for this purpose.

At this point, they have already got more publicity from beating the top engine and human players in a more difficult game. They also know from Go that training against specific opponents only limits the maximum potential of what they can achieve.

HobbyPIayer
mcris wrote:

 

Also the paper shows 1200 games AZ vs SF in different openings. Was the match before those games? If not, it means again that AZ trained against SF.

Those 1200 games mentioned in the paper were self-play training games, where AlphaZero discovered certain mainline openings through trial and error, and played them against itself.

The win/draw/loss records for each ECO reflect how well the openings performed when AZ played both colors.

HobbyPIayer
prusswan wrote:

Again, remember that AZ's purpose is to generalize the earlier work on Go and show how it can be quickly retrofitted for similar purposes, even with minimal preparation and optimization.

Agreed.

All the arguing about whether or not Stockfish could win with better hardware (or database access) is kind of beside the point. I doubt DeepMind even cares about the AZ vs SF debate, as far as chess goes.

This was more of a demo/test to show the proficiency of their self-learning network, when applied to a game that it knows nothing about (other than how the pieces move).

Elroch
mcris wrote:

The information in the paper has not been peer-reviewed; one figure clearly shows AZ's rating converging to the rating of SF and stopping a bit above it.

About 130 points above, presumably based on the match games. Its rating rose very little in the latter stages of self-play training, which is consistent with (but does not imply) this being the highest achievable with the specific neural network design and the computational resource used.

The honest thing would have been to arrange the match by common agreement with the SF developers.

Also the paper shows 1200 games AZ vs SF in different openings. Was the match before those games? If not, it means again that AZ trained against SF.

All the Stockfish games were after the end of training - the paper says "We evaluated the fully trained instances of AlphaZero against Stockfish, Elmo and the previous version of AlphaGo Zero (trained for 3 days) in chess, shogi and Go respectively". AlphaZero was the same after them as before, so it does not mean that it "trained against Stockfish" any more than Stockfish has trained against you if you play it. Rather, AlphaZero was tested against Stockfish where, in machine learning, testing is the phase used to verify that a finished agent performs well.
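
The distinction, as a toy sketch (the class and numbers here are invented purely to show the shape of the process, nothing from DeepMind's code):

```python
class ToyNetwork:
    """Stand-in for a learning agent; only the shape of the process matters."""
    def __init__(self):
        self.weights = [0.0]

    def self_play_batch(self):
        return [("position", "outcome")]        # placeholder self-play data

    def update(self, data):
        self.weights[0] += 0.001 * len(data)    # parameters change ONLY here

def training_phase(net, steps):
    # Training: every batch of self-play games changes the network.
    for _ in range(steps):
        net.update(net.self_play_batch())

def test_phase(net, games):
    # Testing: the finished network just plays; nothing calls update(),
    # so facing Stockfish teaches it nothing.
    frozen = list(net.weights)
    for _ in range(games):
        pass                                    # play one evaluation game (omitted)
    assert net.weights == frozen                # unchanged after the whole match

net = ToyNetwork()
training_phase(net, steps=1000)                 # toy numbers, not the paper's
test_phase(net, games=100)
```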

Elroch
HobbyPIayer wrote:
mcris wrote:

 

Also the paper shows 1200 games AZ vs SF in different openings. Was the match before those games? If not, it means again that AZ trained against SF.

Those 1200 games mentioned in the paper were self-play training games, where AlphaZero discovered certain mainline openings through trial and error, and played them against itself.

The win/draw/loss records for each ECO reflect how well the openings performed when AZ played both colors.

Sorry, but no. If that were so, it could not have a plus score with both colours. Those games were test games of the fully self-trained AlphaZero against Stockfish. There is separate data in the paper on the openings AlphaZero chose as its self-training had progressed.

mcris

What is "fully self-trained"? Had AZ solved chess at that stage? I don't think so.

The paper clearly states that, according to Fig. 1, it took 30k steps to surpass SF's rating. But why was there no further increase in strength over the next 40k steps?

admkoz

AZ hasn't solved chess; it is just better than anybody else at the moment. No guarantee that lasts forever.

It might be a fun, John Henry-esque project to see if we could build up Stockfish to beat the current incarnation, but in the end, they would just throw more hardware at AZ and that would be that. 

admkoz
Elroch wrote:
admkoz wrote:
SmyslovFan wrote:
 

The evaluation function would be, in effect, a compression of that information.

A general evaluation function for all positions is not a compressed opening database, any more than StockFish's evaluation routine is. One obvious reason is that the moves selected depend on a randomised exploration of future positions, and on how much time is allowed for that exploration.

 

Philosophical, but I think it is.  If I tell you the ordered pairs (1,1), (2,2), (3,3), (4,4), (5,5), that's 10 numbers you have to remember.  If instead I just tell you y=x, that's a lot less to remember, but it's still basically the same information.
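
To put that in code (nothing AZ-specific, just the same toy idea):

```python
# Storing the data itself: ten numbers.
pairs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

# Storing the rule instead: one short function that regenerates them all.
def f(x):
    return x

assert all(f(x) == y for x, y in pairs)
```

Whether that rule keeps holding for x values that were never in the list is exactly what I'm getting at below.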

 

For that function to always be able to tell you y given x, we have to be able to assume that all future values will follow the same pattern as the ones you actually saw. Apparently that assumption was valid here, and I'd still like to know why, since for every position it looked at there are a 1-followed-by-over-30-zeroes number of positions that it didn't. I have to think part of the reason is path dependence, in that it has to be a position that can reasonably be derived from the opening position. For that reason I'd really be interested in seeing it play SF in Chess960.

 

Anyway - going to spend a little time reading up on NNs... seems like that's about what I need to do at this point.