Objectively Speaking, Is Magnus a Patzer Compared to Stockfish and AlphaZero?

rRossCo

I've gone blind reading this thread! Which is a good thing: now I can't read "solve chess" anymore.

admkoz
Godeka wrote:

@admkoz:
The games are not played completely at random. They are played randomly from a specific position to the end. Assume you have K+Q vs K and move the queen to a square adjacent to the opponent's king. Play 800 games randomly and you get a number of wins, losses and draws.

Repeat it, but this time move the king instead of the queen, or move the queen to a square not adjacent to the opponent's king. After playing 800 games randomly to the end, you should get more wins and draws than before, so the selected move must be better.

Searching for a mate would mean using brute force, which only works in the late endgame. You would need rules to know when brute force can be used, or rules to select moves, prune branches, and evaluate positions. That doesn't make sense for a NN that should learn without human input, and it is likely that human input weakens the network, even if it is possible that the network learns faster in the beginning with such input.

@Legeco:
It makes the same mistake again and again, maybe for some 100,000 games. The weights of a NN are adjusted slowly. It is also possible that the NN learns that it is good to play knights to the edges or into corners, because it had success with that. It can take a long time until it recognises that there are better moves.

I just cannot believe that that strategy would come up with a world-champion AI in 44 million games.   You'd play 100K games and barely know how to mate with K+Q vs K.
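
To make Godeka's procedure concrete: it amounts to scoring each candidate move by pure random rollouts. A rough sketch, assuming the python-chess library; the position and sample sizes are only illustrative:

```python
import random
import chess  # assumes the python-chess library is installed

def rollout_score(board, n_games=800, max_plies=200):
    """Estimate how good the last move was by playing games of uniformly
    random moves to the end and averaging the results (win=1, draw=0.5)
    from the point of view of the side that just moved."""
    mover = not board.turn  # the side that made the move being evaluated
    total = 0.0
    for _ in range(n_games):
        b = board.copy()
        for _ in range(max_plies):
            if b.is_game_over():
                break
            b.push(random.choice(list(b.legal_moves)))
        result = b.result(claim_draw=True)  # '1-0', '0-1', '1/2-1/2' or '*'
        if result == "1-0":
            total += 1.0 if mover == chess.WHITE else 0.0
        elif result == "0-1":
            total += 1.0 if mover == chess.BLACK else 0.0
        else:
            total += 0.5  # draw, or game still unfinished at max_plies
    return total / n_games

# K+Q vs K, as in the example above: hanging the queen next to the enemy
# king should score clearly worse than a safe queen move.
board = chess.Board("8/8/8/8/8/2k5/8/K2Q4 w - - 0 1")
for uci in ("d1d3", "d1d8"):  # Qd3?? hangs the queen; Qd8 keeps it safe
    move = chess.Move.from_uci(uci)
    board.push(move)
    print(uci, rollout_score(board, n_games=100))
    board.pop()
```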

admkoz
Elroch wrote:

When it is training, generalisation is key.  The neural network provides an evaluation for any position. If it gets a number and then exploring further ahead makes that number look not quite right, the network is slightly tweaked in a general way to make it a bit more consistent. The fundamental idea is that if you do this over and over, the evaluation of all positions gets more like the evaluation when you look ahead, and that has to be a good thing. The network gets better, because it has seen enough variety of positions to generalise to any position it sees.

The fact that the whole of the network is used all of the time means it is not really about specific positions. If you change one of the millions of numbers, it will evaluate all positions a little differently. However, it might be that such a change has virtually no effect in most positions, but a bit of an effect in some class of positions: this is because the significance of any part of a neural network depends on the state of other parts of the network, as a natural consequence of the way they are wired.

The basic issue I see here is that you're implying that, before anything happened, there was a network with millions of numbers that did some type of evaluation, and all that happened was that these numbers got "tuned".  That is really bizarre, because I don't see how it would even know the structure of the function that needed to be tuned.

 

How did you come up with 53?  80K * 0.04 = 3200?  Though even there, 3200 positions doesn't seem like enough to get very far down the tree in K+Q vs K.

Elroch
Legeco wrote:

OK, but then I'm a little confused about how it determines what works and what doesn't (as per my first comment). Is the quality of any single move in any given position simply based on the results of the 20 million games it's played against itself?

Initially it knows nothing.

After it has blundered to the end of some games, the positions that permitted a winning move are now known to have a correct evaluation of either 0 or 1 (depending on which side won). This information allows it to tweak the evaluation routine in the right direction.

As the evaluation routine becomes a bit less random, the network can also be adjusted so that the evaluation of a position is consistent with the evaluations of the positions reachable from it by a single move.

The simple rule is that the correct evaluation of a position with White to play is the maximum of the evaluations of the positions after each of White's legal moves; the correct evaluation of a position with Black to play is the minimum of the evaluations of the positions after each of Black's legal moves.

If the evaluations are inconsistent with this, the network can be tweaked to make it less inconsistent.

The net result of this process is that the network becomes increasingly good at producing the evaluations that looking ahead would give, without actually doing the lookahead, by a complicated hotch-potch of ad hoc numerical calculations specified by all the parameters of the network.
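
To make the max/min rule concrete, here is a minimal sketch of the one-ply consistency target, again assuming python-chess, with `value` standing in for the network. (AlphaZero's real training target combines MCTS visit counts and game outcomes; this is just the bare bootstrapping idea described above.)

```python
import chess

def consistency_target(board, value):
    """One-ply lookahead target: the max over White's moves, or the min
    over Black's, of the evaluations of the resulting positions.
    `value(board)` should return White's winning chances in [0, 1]."""
    child_values = []
    for move in board.legal_moves:
        board.push(move)
        child_values.append(value(board))
        board.pop()
    if not child_values:
        # No legal moves: stalemate is a draw, checkmate loses for the mover.
        if not board.is_check():
            return 0.5
        return 0.0 if board.turn == chess.WHITE else 1.0
    return max(child_values) if board.turn == chess.WHITE else min(child_values)

# Training then tweaks the network to shrink the inconsistency, e.g. by
# minimising (value(board) - consistency_target(board, value)) ** 2.
```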

Elroch
admkoz wrote:
Elroch wrote:

When it is training, generalisation is key.  The neural network provides an evaluation for any position. If it gets a number and then exploring further ahead makes that number look not quite right, the network is slightly tweaked in a general way to make it a bit more consistent. The fundamental idea is that if you do this over and over, the evaluation of all positions gets more like the evaluation when you look ahead, and that has to be a good thing. The network gets better, because it has seen enough variety of positions to generalise to any position it sees.

The fact that the whole of the network is used all of the time means it is not really about specific positions. If you change one of the millions of numbers, it will evaluate all positions a little differently. However, it might be that such a change has virtually no effect in most positions, but a bit of an effect in some class of positions: this is because the significance of any part of a neural network depends on the state of other parts of the network, as a natural consequence of the way they are wired.

The basic issue I see here is that you're implying that, before anything happened, there was a network with millions of numbers that did some type of evaluation, and all that happened was that these numbers got "tuned".  That is really bizarre, because I don't see how it would even know the structure of the function that needed to be tuned.

 

How did you come up with 53?  80K * 0.04 = 3200?  Though even there, 3200 positions doesn't seem like enough to get very far down the tree in K+Q vs K.

Well spotted: that was my error. I was thinking of the 1 minute it had per move in the games, leading to an extra factor of 60 that should not be there. It is indeed 3200 nodes.

Yes, a neural network being trained starts off just producing noise. One way to initialise one is with small random weights. As it learns, this noise is shaped into something meaningful.

It doesn't know the structure of the function, but the network is very versatile, so it can approximate almost anything. Indeed, a major problem in many applications is that networks are too versatile. It is important that the network is forced to find generally useful patterns rather than memorising specific examples (which is only useful in a minority of the game). To force it to generalise better, DeepMind used a standard technique called L2 regularisation, but didn't explain how they tuned it (tuning is generally necessary for best results).
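
For anyone unfamiliar with the term: L2 regularisation simply adds a penalty proportional to the squared weights, which discourages the network from memorising specific examples. Here is a toy illustration in plain numpy; the penalty strength `lam` stands for the hyperparameter that needs tuning, and none of the numbers here come from the DeepMind paper:

```python
import numpy as np

# Toy data: a linear model with a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

w = np.zeros(10)
lam, lr = 1e-2, 1e-2                   # lam = regularisation strength to tune
for _ in range(1000):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    grad += 2 * lam * w                # L2 penalty pulls weights toward zero
    w -= lr * grad
```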

SeniorPatzer
Elroch wrote:

With the hardware it used, 40 ms was enough to analyse 53 nodes. Even at that speed, AlphaZero was playing near 2800 Elo, which shows how effective its evaluation network is.

 

Let's round 40 milliseconds to 1 second.

 

If your statement is correct, then it plays at 2800 Elo with only 1 second to "think."  Magnus is like 2851.

 

Now, with regard to the subject title of this post, what do you think would be the outcome of this proposal?  Magnus Carlsen gets the current world chess championship classical time controls.  AlphaZero gets a generous 1 second to respond to each of Magnus' moves.  And just like in actual chess, AlphaZero gets to think on Magnus' time.

 

This is a fairly even match, 2800 vs. 2800.  Who do you think would prevail in a match under these conditions?

 

I would bet on AlphaZero.  Although I'd be delighted to see Magnus win. 

Pawn_Checkmate

Have you guys seen the 10 published games?  In every one of them A0 sacrifices material for positional advantage. SF doesn't even realize that it's losing until it's too late.   A0 wasn't taught how much each piece is worth, but it figured out when to sacrifice one.   In one of the games it played a piece down for more than 10 moves. It reminded me of one of Tal's sacrifices, which the computer initially evaluates as a mistake, only for the evaluation to change some moves down the line.    In my opinion, SF lost because of its lack of positional understanding, like all other conventional chess engines.

Elroch
SeniorPatzer wrote:
Elroch wrote:

With the hardware it used, 40 ms was enough to analyse 53 nodes. Even at that speed, AlphaZero was playing near 2800 Elo, which shows how effective its evaluation network is.

 

Let's round 40 milliseconds to 1 second.

 

If your statement is correct, then it plays at 2800 Elo with only 1 second to "think."  Magnus is like 2851.

 

Now, with regard to the subject title of this post, what do you think would be the outcome of this proposal?  Magnus Carlsen gets the current world chess championship classical time controls.  AlphaZero gets a generous 1 second to respond to each of Magnus' moves.  And just like in actual chess, AlphaZero gets to think on Magnus' time.

 

This is a fairly even match, 2800 vs. 2800.  Who do you think would prevail in a match under these conditions?

 

I would bet on AlphaZero.  Although I'd be delighted to see Magnus win. 

Sadly for humankind (not really), AlphaZero would easily win, because most of the time Carlsen's move would be among the top few of its picks, so it could devote a minute or so of analysis to each before he had made it. In other, less common cases, it can still play a good move.

(It's funny how there is a tendency to refer to AlphaZero with a personal pronoun. Is this a sign that it is a true AI?)

DarkVoidll

Oh man, both suck.

chesster3145
SeniorPatzer wrote:
Elroch wrote:

With the hardware it used, 40 ms was enough to analyse 53 nodes. Even at that speed, AlphaZero was playing near 2800 Elo, which shows how effective its evaluation network is.

 

Let's round 40 milliseconds to 1 second.

 

If your statement is correct, then it plays at 2800 Elo with only 1 second to "think."  Magnus is like 2851.

 

Now, with regard to the subject title of this post, what do you think would be the outcome of this proposal?  Magnus Carlsen gets the current world chess championship classical time controls.  AlphaZero gets a generous 1 second to respond to each of Magnus' moves.  And just like in actual chess, AlphaZero gets to think on Magnus' time.

 

This is a fairly even match, 2800 vs. 2800.  Who do you think would prevail in a match under these conditions?

 

I would bet on AlphaZero.  Although I'd be delighted to see Magnus win. 

But why are we rounding 40 milliseconds to 1 second (1000ms)?

Elroch

That is really bad rounding! But rounding to zero would be a bit unfair.

[Remember my number of 53 nodes was incorrectly based on 80,000 nodes per 1 minute move, when it was actually 80,000 nodes per second, implying 3200 nodes in 40 ms]

SeniorPatzer

"With the hardware it used, 40 ms was enough to analyse 53 nodes. Even at that speed, AlphaZero was playing near 2800 Elo, which shows how effective its evaluation network is."

 

"Near 2800."   Which I understood to be less than 2800.  But indeed rounding 40 milliseconds to 1000 milliseconds did expose how metrically challenged I am, and mathematically challenged I am.  Good catch!

 

So, as a revision, how about giving AlphaZero 100 milliseconds per move instead, since it is not quite 2800 and Magnus is 2851 or so?

 

I would still bet on AlphaZero.

 

Psychologically, it would likely be unnerving for Magnus to play his moves and then almost instantaneously have a reply appear in answer to his move every time.

SmyslovFan

Basically, even Elroch now realizes that his claim that the engine was only 2800 strength is wrong.

Lyudmil_Tsvetkov

Magnus is much better than Stockfish, not to mention Alpha.

Set the engine on a single core, or give Magnus 1000 times more time, to ensure fully equal conditions, and he will win.

 

What kind of creature is this that plays on 3000-5000 cores?

If the engine uses that much hardware, the human should have 3000x more time.

SeniorPatzer
Lyudmil_Tsvetkov wrote:

Magnus is much better than Stockfish, not to mention Alpha.

Set the engine on a single core, or give Magnus 1000 times more time, to ensure fully equal conditions, and he will win.

 

What kind of creature is this that plays on 3000-5000 cores?

If the engine uses that much hardware, the human should have 3000x more time.

 

I'm not good at math.  What is 1000 times 100 milliseconds?  AlphaZero gets 100ms and Magnus gets 1000 times that for his moves.  

 

2800 AlphaZero vs. 2851 Magnus.  

Pawn_Checkmate

If the machine is allowed to ponder, Magnus will lose even if he gets a day per move. Why compare MC with a machine? After all, he's just human.

100 ms x 1000 equals 100 seconds, not so different from Internet bullet.

THE_GRANDPATZER

Forgive me if what follows is a stupid comment, because I don't understand the technology all that well (if at all). But... is the neural network part of A0 the most 'human' part? In other words, is that the bit that learns from the 44 million games and decides what does and doesn't work? It would be interesting to see how that bit specifically compares to Magnus. I'm not sure how it could be tested, though. Maybe A0 vs Magnus + Stockfish? Or A0 vs Magnus + a database containing the same 44 million games (and perhaps no time control for Magnus, or enough time to consult the database fully).

SeniorPatzer
Pawn_Checkmate wrote:

If the machine is allowed to ponder, Magnus will lose even if he gets a day per move. Why compare MC with a machine? After all, he's just human.

100 ms x 1000 equals 100 seconds, not so different from Internet bullet.

 

Really?  Even one whole day, 24 hours, wouldn't be enough time for Magnus?

 

Anyways, I was thinking of a televised event.  Magnus with the current classical World Championship time controls versus AlphaZero with 100 milliseconds.  I think Magnus gets an increment starting from the first move?

 

That way the games don't last interminably.  I'm guessing that it's 40 moves in 100 minutes for Magnus.  It would be fun to have IM Danny Rensch, GM Yasser Seirawan, and GM Simon Williams live-commentate the match.  And then after each game, GM Maurice Ashley can interview Magnus, adjust Magnus's collar, and ask how smoothly the game went for him.

 

A question from Maurice to Magnus after the first game:  "Uh, Magnus, what was it like to play a move and then, almost instantaneously, have the human operator play AlphaZero's response back to you on the board, throughout the entire game?"

Elroch
SmyslovFan wrote:

Basically, even Elroch now realizes that his claim that the engine was only 2800 strength is wrong.

I did estimate, from a graph of relative ratings at different speeds, that with less than 1/30 of a second AlphaZero might be weaker than 2800. However, if it is 130 points stronger than Stockfish, a 550-point reduction in its rating could still leave it over 2900 even with this little time!

Even with this time, the engine can examine about 2700 nodes, so it must be a lot weaker with even less time, certainly very much less than 2800. Moves made by evaluating just 1 ply ahead (a single NN evaluation of the position after each legal move) might be only club level rather than master level, based on extrapolation of the graph. This is a very rough extrapolation, but the curvature of the graph suggests a rapid reduction in strength as the time per move drops below 1/30 of a second.

This emphasises the impressive way in which AlphaZero gains much more from additional time than an engine like Stockfish.
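
A side note on the rating arithmetic in this post: rating gaps convert to expected scores through the standard Elo formula, E = 1/(1 + 10^((Rb - Ra)/400)), so a fixed gap gives the same expected score wherever it sits on the scale. A small sketch; taking Stockfish at a nominal 3400 here is purely illustrative:

```python
def elo_expected(ra, rb):
    """Expected score of a player rated ra against one rated rb,
    using the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

# A ~130-point edge is roughly a 68% expected score, and it survives
# a uniform shift: 3530 vs 3400 scores the same as 2980 vs 2851.
print(elo_expected(3530, 3400))  # ~0.68
print(elo_expected(2980, 2851))  # ~0.68
```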

admkoz

"It is important that the network is forced to find generally useful patterns rather than memorising specific examples (which is only useful in the minority of the game)."

 

That is true, BUT I think it would learn much faster if it did at least some brute force, especially at the beginning.  I am mainly interested in how the actual learning happened; maybe I need to read up more on neural networks.  I am a programmer, but not one who has ever been involved in machine learning.

 

If it literally never looks ahead, then it plays an entirely random game.  At the end, it has learned... essentially nothing.  It probably walked past a dozen mates in 1 as the game went on.  Therefore, its evaluation method for all of its moves is wholly worthless.  It would have to do that a lot more than 44 million times to actually learn anything.  44 million is less than the number of possible combinations of 10 moves of a queen on an empty board (a lone queen always has at least 21 moves available, so even six moves give 21^6, roughly 86 million combinations).

 

But if it can brute-force ahead a couple of moves, then it will never miss a mate in 1 or 2 (probably 3 or 4).  Right off the bat, that makes learning a whole lot more efficient.  Plus, it will quickly figure out how to mate with K+Q vs K (I harp on that so much because a lot of material advantages turn into it sooner or later).
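
The shallow brute force admkoz describes is only a few lines of code. A minimal mate-in-1 sketch, again assuming the python-chess library; the position is just an illustrative K+Q vs K mate:

```python
import chess  # assumes the python-chess library

def mate_in_one(board):
    """1-ply brute force: try every legal move and see if it mates.
    Returns a mating move if one exists, else None."""
    for move in board.legal_moves:
        board.push(move)
        is_mate = board.is_checkmate()
        board.pop()
        if is_mate:
            return move
    return None

# K+Q vs K with a mate on the board: White Kg6, Qb1 vs Black Kh8.
board = chess.Board("7k/8/6K1/8/8/8/8/1Q6 w - - 0 1")
print(mate_in_one(board))  # b1b8, i.e. Qb8#
```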