Objectively Speaking, Is Magnus a Patzer Compared to Stockfish and AlphaZero?

Elroch
admkoz wrote:

I also never quite get the "probability" part of it.  I guess it's a question that can only be answered empirically and empirically, AZ won, so... But I can definitely think of positions where any type of probabilistic analysis would be a fail.  Let's say I hang my queen.  He can take the free Q, but if he doesn't, I have mate in 1.  So if you "average" over all his moves, this'll look pretty good since for all but one of them, you win.  Sadly that one is the one he'll pick, so you lose.  

That is not the appropriate sort of averaging. Given the evaluation routine, you can compare all of the possible moves a player can play. Each move has a superficial expected result, a value between 0 and 1 (in go, this is literally the probability of winning, in chess it is almost so, because draws count as half a win).

Given those evaluations, you have a quantification of how likely each move is to be best. When you do the averaging, you take this into account: you consider the different moves, but weighted by how likely each is to be best. Moves that appear very unlikely to be best get little attention (though if one has, say, a 0.1 chance of being best, you might well explore it and revise that estimate if its evaluation improves after further analysis).

At least that's the way I would do it (and have done with a different type of stochastic system not to do with games).  
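
As a toy illustration of that kind of weighting, here is a minimal Python sketch; the moves, scores and probabilities are invented numbers, not output from any engine:

```python
# Each candidate move has an expected score in [0, 1]; the position's value
# is the average over candidates, weighted by how likely each is to be best.

candidates = {
    # move: (expected score in [0, 1], estimated chance it is the best move)
    "Nf3": (0.58, 0.55),
    "d4":  (0.56, 0.35),
    "h4":  (0.40, 0.10),
}

def weighted_value(cands, ignore_below=0.05):
    """Probability-weighted average, ignoring very unlikely candidates."""
    kept = [(s, p) for s, p in cands.values() if p >= ignore_below]
    total = sum(p for _, p in kept)
    return sum(s * p for s, p in kept) / total

print(round(weighted_value(candidates), 3))   # -> 0.555
```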

Another thing I would really, really love to see: the first few random games.  In the 40ms of thinking time, did it look ahead, see a non-zero chance of scholar's mate if 1 e3 or e4, so play that?  Or did it just go totally random with 1 a3?

It is safe to say that AlphaZero was very bad when it started to play! It literally knew only the rules, and had not the slightest idea whether a move was good unless the game happened to end by chance. At least that is my understanding based on the statements in the DeepMind paper. But it learned fastest early on, so it was not long before it was pretty good.

It would quickly learn how to avoid getting mated in a few moves. (There are 8 games that end in mate in 2 moves, which is about 1 in 1,000 of the games after 1.5 moves, so there would be a big red flag in its neural network for the characteristics of the immediate history.)

As AlphaZero used minibatches of 4096 positions for training, it may well have played 4096 games against itself simultaneously. It would play a move in each, and then modify its network based on the extent to which the evaluations were a surprise. It took a few hundred thousand such steps to become probably the world's strongest chess player (some disagree about this status), which would be, say, a few thousand times 4096 games averaging 100 moves each (or thereabouts).
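
Here is a toy Python sketch of that loop; apart from the batch size of 4096, everything in it is a stand-in (random results in place of self-play games, a single number in place of a network):

```python
import random

BATCH = 4096                           # minibatch size quoted above
LEARNING_RATE = 0.01

evaluation = 0.0                       # the "network's" current expected score
for step in range(1000):               # a few hundred thousand in reality
    # Stand-in for a batch of self-play results (win=1, draw=0.5, loss=0).
    results = [random.choice([0.0, 0.5, 1.0]) for _ in range(BATCH)]
    surprise = sum(results) / BATCH - evaluation
    evaluation += LEARNING_RATE * surprise    # small tweak per batch

print(round(evaluation, 2))            # converges toward the true mean, 0.5
```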

 

SeniorPatzer
SeniorPatzer wrote:
vmici07 wrote:
This discussion is stupid...

 

No way. Here are a couple of wonderful nuggets of incredible observation:

 

(1)  "Not great enough for Stockfish to have a chance against a program that was delightfully anti-materialistic in several games. Persistent, winning positional advantage at a material cost was a theme of the most striking AlphaZero wins."

 

(2)   "But I have suddenly realised a good reason why this may be so. AlphaZero's evaluation is correctly probabilistic: it gives an expectation of the score. This means it makes a lot more sense to combine such evaluations by averaging, as long as you have an unbiased sample of what may be the future best line.

 

By contrast, Stockfish has some sort of pseudo-material evaluation of positions. It makes a lot less sense to average these, because what matters is your chance of winning, not the expected amount of material you are ahead."

 

These are marvelous insights! And I think they're a great help to us human players! With regard to the second point above, this may have great explanatory power for Emanuel Lasker's success over a long period of time. He did not play according to a mathematical "CAPS" score for the most accurate move; he played moves that he thought would pose the most problems to his opponent (whose games and style he studied). I'm guessing he used some intuitive version of "expectation" for his candidate moves, and then selected the candidates which (although they might not be objectively best) confounded his opponents the most, thus yielding the best expected results. Pragmatism.
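
To make the second point concrete, here is a toy Python sketch; the logistic mapping is a common rule of thumb rather than Stockfish's actual scale, and the two candidate lines are invented:

```python
import math

def expected_score(centipawns, scale=400):
    """Map an evaluation in centipawns to an expected score in [0, 1]."""
    return 1 / (1 + 10 ** (-centipawns / scale))

lines = [+900, -100]    # one line wins a queen, the other loses a pawn

# Averaging the material-style scores first, then converting, is NOT the
# same as averaging the expected scores themselves.
average_cp = sum(lines) / len(lines)                             # +400
print(round(expected_score(average_cp), 2))                      # ~0.91
print(round(sum(map(expected_score, lines)) / len(lines), 2))    # ~0.68
```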

 

Elroch

You are very kind. It's good that at least occasionally I say something that is not boring!

TobusRex

Magnus is a fish compared to me, let alone those other guys. And God help him if he ever stepped on the wrestling mat with me.

Godeka
admkoz wrote:

Another thing I would really, really love to see: the first few random games.  In the 40ms of thinking time, did it look ahead, see a non-zero chance of scholar's mate if 1 e3 or e4, so play that?  Or did it just go totally random with 1 a3?

 

It plays randomly and therefore is non-deterministic. Maybe it checked e3, evaluated it by doing some random playouts to the end of the game, and thought it was good (some 'random' checkmates) – or maybe it concluded the opposite, or didn't check e3 at all.

But the playouts are random or nearly random in any case. You can have very fast light playouts which are completely random, or slightly slower heavy playouts that use some basic logic, for example taking some patterns into account or using a very basic evaluation function. In other words: light playouts are random, heavy playouts are a little less random. This is a design decision made by the programmer.

To evaluate a move, the NN is asked for its winning probability, and some playouts are made, resulting in a number of wins, draws and losses. If the move is promising, it will be analysed further by exploring the replies to it.

Which moves are selected for evaluation depends on the NN too: it suggests the best moves to consider for analysis. This is very human-like: humans also know which moves are worth analysing and which are not. It is highly selective, and this is one of the reasons why AlphaZero plays so strongly while analysing 80 kn/s, whereas SF analyses 70,000 kn/s in the same time. The other reason is that the evaluation is much better (asking the NN for a winning probability and doing random playouts).
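
A toy Python sketch of that selectivity, using the PUCT-style selection rule that AlphaZero-like programs are reported to use; the priors, visit counts and values below are invented:

```python
import math

def puct_choice(children, c_puct=1.5):
    """Pick the move to explore next: high value, high prior, few visits."""
    total_visits = sum(ch["visits"] for ch in children.values())
    def score(ch):
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * math.sqrt(total_visits + 1) / (1 + ch["visits"])
        return q + u
    return max(children, key=lambda m: score(children[m]))

children = {
    "e4": {"prior": 0.40, "visits": 10, "value_sum": 5.5},
    "d4": {"prior": 0.35, "visits": 8,  "value_sum": 4.2},
    "a3": {"prior": 0.01, "visits": 0,  "value_sum": 0.0},  # rarely explored
}
print(puct_choice(children))   # -> "e4": low-prior moves get little attention
```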

 

By the way: I don't know why, but for comment 178 my account was muted. You can go back to page 9 to see it, if you are interested in some performance comparisons.

Elroch
Godeka wrote:

By the way: I don't know why, but for comment 178 my account was muted. You can go back to page 9 to see it, if you are interested in some performance comparisons.

I have no idea what you mean by this, nor any reason for it. Thanks for pointing it out.

Godeka

Well, you are right, I wouldn't understand my own paragraph.

Comment 178 wasn't visible because my account was muted. Then support unmuted it, and I wanted to point out that additional text appeared above your and admkoz's last answers.

Elroch

Godeka, it is perfectly reasonable to infer that AlphaZero would be weaker with slower hardware: we have a graph of how its rating changes with reduced computation time. However, we don't have the same evidence of much potential for improving the likes of Stockfish by increasing their computational resources (for the same reason: the graph shows Stockfish's performance flattening off dramatically with more computation).

The reason is probably that adding another ply of search is very much a case of diminishing returns, and for Stockfish, a doubling of speed will, roughly speaking, always buy the same additional depth of search. For AlphaZero, it's more a matter of broadening the search, since each random self-play game has a very similar computational cost.
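
A back-of-envelope Python sketch of that diminishing-returns arithmetic; the effective branching factor is an assumed illustrative value:

```python
import math

# With an effective branching factor b, a speedup of k buys roughly
# log_b(k) extra plies: each doubling adds the same fixed amount.
def extra_plies(speedup, effective_branching_factor=2.0):
    return math.log(speedup) / math.log(effective_branching_factor)

for k in (2, 4, 8, 100):
    print(f"{k:>3}x speed -> +{extra_plies(k):.1f} plies")
```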

admkoz

But it got to think for 40 ms, and I assume "think" means using MCTS with (initially) a random evaluation function. With the hardware it was using, even completely at random, 40 ms should be enough to exhaustively search a few ply deep.

 

This point was bugging me, because it seems like AZ would learn literally nothing from a completely random game; certainly not enough that it would take only 44 million games to become the best chess player ever. You play a hundred moves, reach K+Q vs K with mate in 1, but you move randomly, drop the queen, and learn - what? Pretty much nothing.

 

But with 40 ms of think time, and the kind of hardware it was using, you could find the forced mate with K+Q vs K. Therefore, right away, you know that if anything has a forced line leading to K+Q vs K, it's a win. That probably lops at least a zero off the number of positions it has to look at. Etc., etc., etc.
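
For illustration, a minimal Python sketch of such an exhaustive mate search, assuming the third-party python-chess package; the K+Q vs K position is a made-up example:

```python
import chess  # third-party python-chess package, assumed installed

def forced_mate(board, depth):
    """True if the side to move can force mate within `depth` of its moves."""
    for move in list(board.legal_moves):
        board.push(move)
        if board.is_checkmate():
            board.pop()
            return True
        if depth > 1 and not board.is_game_over():
            # Mate is forced only if every opponent reply still loses.
            if all(reply_loses(board, reply, depth)
                   for reply in list(board.legal_moves)):
                board.pop()
                return True
        board.pop()
    return False

def reply_loses(board, reply, depth):
    board.push(reply)
    result = forced_mate(board, depth - 1)
    board.pop()
    return result

board = chess.Board("7k/8/5KQ1/8/8/8/8/8 w - - 0 1")  # hypothetical K+Q vs K
print(forced_mate(board, 2))   # True: 1.Qg7# is found immediately
```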

Elroch

With the hardware it used, 40 ms was enough to analyse 53 nodes. Even at that speed, AlphaZero was playing near 2800 Elo, which shows how effective its evaluation network is.

admkoz

I mean during the "training" period, though. The "evaluating 80K positions per second" figure was from the actual match vs Stockfish. At the beginning of training, I'll bet the evaluation took less time to execute, since it was basically "make something up".

Elroch

No, actually. The same large network is used for evaluations from the start, so even before it has learnt anything, the cost of an evaluation is the same. MCTS is necessary from the start because literally the only absolute feedback in chess is the result of a game (traditional chess engines regard this as too difficult to deal with, and make assumptions about the value of material and positional factors so that they can get feedback without playing a game to completion).

As the evaluation network gets trained, it becomes an excellent replacement for a hardwired evaluation routine, but MCTS still proves an excellent look-ahead technique.
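
For contrast, a toy Python version of the kind of hardwired assumptions mentioned above, again assuming python-chess; real engine evaluations are vastly more elaborate than this bare material count:

```python
import chess  # python-chess, assumed installed

# Textbook piece values: the sort of built-in assumption that lets a
# traditional engine get feedback without playing a game to completion.
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def material_eval(board):
    """Material balance in pawns, from White's point of view."""
    score = 0
    for piece_type, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece_type, chess.WHITE))
        score -= value * len(board.pieces(piece_type, chess.BLACK))
    return score

print(material_eval(chess.Board()))   # 0 in the starting position
```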

THE_GRANDPATZER

Sorry if this has already been answered, but does anyone know how A0 determines for itself what works and what doesn't? Let's say it plays a certain move on move 30 and goes on to lose that game. Then let's say it plays exactly the same moves again up to move 30, but plays a different move and draws or wins. Does it consider the second move 30 to be superior due to the end result, or how else does it learn from this experience? What if it wasn't move 30 that was the problem in the first game, but another move (or series of moves) that led to the loss? How does it determine this?

Elroch

I think all AlphaZero's training was done before the matches with Stockfish. If so, it could not change its opinion at that time, and the games you refer to would have to be the self-play ones.

When it is training, generalisation is key. The neural network provides an evaluation for any position. If it gets a number, and then exploring further ahead makes that number look not quite right, the network is slightly tweaked in a general way to make it a bit more consistent. The fundamental idea is that if you do this over and over, the evaluation of all positions becomes more like the evaluation when you look ahead, and that has to be a good thing. The network gets better because it has seen enough variety of positions to generalise to any position it sees.

The fact that the whole of the network is used all of the time means it is not really about specific positions. If you change one of the millions of numbers, it will evaluate all positions a little differently. However, such a change might have virtually no effect in most positions but a noticeable effect in some class of positions: this is because the significance of any part of a neural network depends on the state of the other parts, as a natural consequence of the way they are wired.
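
A toy Python sketch of that tweaking cycle; note the "network" here is a lookup table, which is precisely what a real network is not (a table cannot generalise), so this shows only the evaluate, look ahead, adjust loop:

```python
values = {}     # position -> current evaluation (expected score in [0, 1])

def tweak(position, lookahead_value, rate=0.05):
    """Nudge the stored evaluation toward what deeper search reported."""
    old = values.get(position, 0.5)
    values[position] = old + rate * (lookahead_value - old)

# Repeatedly finding that look-ahead says 0.62 pulls the evaluation there.
for _ in range(200):
    tweak("some middlegame position", lookahead_value=0.62)

print(round(values["some middlegame position"], 2))   # -> 0.62
```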

THE_GRANDPATZER

Interesting. When you say the neural network provides an evaluation for any position, what is that evaluation based on? Is it similar to how Stockfish evaluates positions, or something different?

Pawn_Checkmate

Its evaluation is based on the millions of games it played against itself: circa 44 million games.

 

RubenHogenhout
mickynj wrote:
coldgoat wrote:

chess computers cheat because they have access to opening books and opening databases

 If you read the articles you will find that AlphaZero specifically used no opening books or tablebases

You can see this simply as their memory, because it is just a part of the program. Strong human players who can memorise many openings and endgames also have extra information compared to weaker players who do not know them. So why can you not see it the same way for computers?

 

THE_GRANDPATZER

OK, but then I'm a little confused about how it determines what works or not (as per my first comment). Is the quality of any single move in any given position simply based on the result of the 44 million games it's played against itself?

Godeka

@admkoz:
The games are not played completely at random. They are played randomly from a specific position to the end. Suppose you have K+Q vs K and you move the queen to a square adjacent to the opponent's king. Play 800 random games from there and you get a number of wins, losses and draws.

Repeat it, but this time move the king instead of the queen, or move the queen to a square not adjacent to the opponent's king. After playing 800 random games to the end, if you get more wins and draws than before, the second move must be better.
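
A minimal Python sketch of exactly that experiment, assuming the python-chess package; the FEN is a made-up K+Q vs K position, and the playout count is trimmed from 800 for speed:

```python
import random
import chess  # python-chess, assumed installed

def random_playout(board, max_plies=200):
    """Play random moves to the end (or a cutoff) and report the result."""
    b = board.copy()
    for _ in range(max_plies):
        if b.is_game_over():
            break
        b.push(random.choice(list(b.legal_moves)))
    return b.result(claim_draw=True)   # "1-0", "0-1", "1/2-1/2" or "*"

def playout_stats(fen, n=100):
    board = chess.Board(fen)
    counts = {"1-0": 0, "0-1": 0, "1/2-1/2": 0, "*": 0}  # "*" = cut off
    for _ in range(n):
        counts[random_playout(board)] += 1
    return counts

print(playout_stats("7k/8/5KQ1/8/8/8/8/8 w - - 0 1"))
```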

Searching for a mate would mean using brute force, which only works in the late endgame. You would need rules to know when brute force can be used, or rules to select moves, prune branches and evaluate positions. That doesn't make sense for a NN that should learn without human input, and it is likely that human input weakens the network, even if it might learn faster in the beginning.

@Legeco:
It makes the same mistake again and again, maybe for some 100,000 games. The weights of a NN are adjusted slowly. It is also possible that the NN learns that it is good to play knights to the edges or into corners, because it had success with that. It can take a long time until it recognises that there are better moves.