A 3000 could easily beat a 2000, but could a 4000 easily beat a 3000?

Elroch
SmyslovFan wrote:
Elroch wrote:
SmyslovFan wrote:

Grischuk is actually a better example. In one Candidates match-play tournament, he strove for draws as White and often got the draw in under 25 moves. His strategy worked in the shorter matches, but when he faced Gelfand in a longer match he lost.

 

A 2800 player intent on drawing is very difficult to beat, even with perfect play. And less-than-perfect play runs the risk of actually losing.

 

All of the +2800s have several games where they didn't make any mistakes at all.

Presumably that's "no mistakes at all" according to the engine that scored 36% against AlphaZero? I have blitz and maybe even bullet games that meet that threshold (although this relies on ignoring small evaluation differences, which are very sensitive to computation time; not that larger ones never are).

 

A 2800 player might well be able to force a draw almost always against another 2800 player by basing all of their choices on minimising the opportunity for the opponent to get winning chances. But if they can manage it against a 3400 player, they are not a 2800 player; they are a 3400 player. Grischuk is not a 3400 player.

AlphaZero did not test itself against a fully functioning Stockfish. Stockfish was not optimized the way it is for the CCRL ratings. But let's agree that the version used was ~3300 and that AlphaZero scored 64%-36%. That's about a 100-point difference. AlphaZero didn't dominate by 1000 points, or anything close to that. The vast majority of the games played in the match were drawn, and AlphaZero only published its wins.
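For anyone wanting to check that arithmetic, here is a minimal sketch using the standard logistic Elo model (nothing here comes from the paper except the 64% match score):

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by an expected score, under the logistic model."""
    return 400 * math.log10(score / (1 - score))

def expected_score(diff: float) -> float:
    """Expected score for the stronger player at a given Elo gap."""
    return 1 / (1 + 10 ** (-diff / 400))

print(elo_diff_from_score(0.64))  # ~100: a 64% score implies about a 100-point edge
print(expected_score(200))        # ~0.76: roughly what a 200-point gap produces
```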

No-one has justified the claims about the version of Stockfish used in the research being significantly handicapped. This would be easy to do if true: just match it up against an optimised version and smash it.

It is true that there should be potential for ALL players to increase their standard of play by allocating more time to the moves which demand it. However, this was the same for both players. The same is true of opening books, but these are really a crutch for an engine that is bad at finding good moves in the opening (which is really just the part of the game outside the endgame that has been seen in previous games). Picking a move from an opening book is a kind of cheat that uses lots of previous computing time by other players to provide assistance.

The 1 GB hash table is argued to have been a huge handicap. However, the only research I can find on this suggests that increasing hash tables to huge sizes has an inconsistent and small effect on performance. Again, this could easily be checked by anyone who wants to and has a 32-core machine handy to do the comparison.
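For what it's worth, that check could be scripted over UCI; a rough sketch follows (the engine path is a placeholder, and the settings mirror the disputed match configuration):

```python
import subprocess

ENGINE = "./stockfish"  # placeholder path to whatever Stockfish build you want to test

def start_engine(hash_mb: int, threads: int) -> subprocess.Popen:
    """Launch a UCI engine and apply the Hash/Threads settings."""
    p = subprocess.Popen([ENGINE], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)
    for cmd in ("uci",
                f"setoption name Hash value {hash_mb}",
                f"setoption name Threads value {threads}",
                "isready"):
        p.stdin.write(cmd + "\n")
    p.stdin.flush()
    return p

small_hash = start_engine(1024, 64)   # the criticised 1 GB setup
large_hash = start_engine(32768, 64)  # the "optimised" setup critics have in mind
# ...then alternate "position ..." / "go movetime 60000" between the two
# over a few hundred games and compare the score.
```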

Finally, it's worth mentioning that the processor Stockfish ran on for the match was an unusually powerful one for computer matches. AlphaZero's was more powerful in terms of raw operations, but not at all suitable for running programs like Stockfish; it's designed for general-purpose AI.

The AlphaZero test was just a test. It wasn't rated by anyone. It's a very impressive test, and AlphaZero has shown some tremendous improvements over Stockfish. But it has not altered reality. It has not shown that chess is a forced win, and it has not shown any hint that a computer could reach 4000 Elo. If anything, it demonstrated that there's a lot of space between 3300 and 3600. It probably performed at about 3400 strength, but we just don't know.

The space between 3300 and 3600 is defined by Professor Elo.

 The graph of improvement of Stockfish posted above does not suggest an imminent ceiling of performance.

There is a problem with the rating scales used for comparison. Strictly speaking, for humans there is a single rating scale for a specific time control. For computers this can be the same, or it can be further restricted to specific hardware. But that would make it impossible to compare any computer using special hardware, which is obviously not good enough. Better to have a rating for a specific (but unrestricted) combination of hardware and software at a given time control. This also acknowledges the fact that increasing hardware power makes the strength of computers advance, as well as improving software.

The annoying thing is that the rating scale for computers is not firmly related to the one for humans. In principle, all it requires is a continuum of competitive play between the two populations, so that their ratings stay consistent. Part of the uncertainty is due to people being loose about the definition of computer ratings, forgetting that the time control and the hardware are crucial as well as the software.

llama

What do you mean, they're "claims"?

It's a fact the version wasn't the latest, it's a fact they used a nonsense time control, and a fact they gave it 64 threads but 1 GB hash. A completely untested and unusual setup.

It was publicity for their project, which was an AI project, not a chess project.

If they were interested in a chess match they could have done it under normal conditions, but this will never happen, because that wasn't the point.

 

In fact, I can only assume they did tests with various versions of SF, at various time controls and hash sizes. Apparently they didn't think the results for A0 would look good enough if they'd done a normal match.

Elroch
Telestu wrote:

What do you mean, they're "claims"?

It's a fact the version wasn't the latest, it's a fact they used a nonsense time control, and a fact they gave it 64 threads but 1 GB hash. A completely untested and unusual setup.

It was publicity for their project, which was an AI project, not a chess project.

If they're interested in a chess match they could do it under normal conditions, but this will never happen, because that wasn't the point.

There is nothing "nonsensical" about 64 thread-minutes per move. This is a generous but arbitrary time control (all others are arbitrary too). It was selected to be relatively long, for high quality of play.

The "claims" are that the choices had a radical effect on the results. This is unproven, but could be easily tested as I explained. Either no-one has tried, or anyone who tried kept quiet about the results.

You are definitely wrong with the conspiracy theory about trying other time controls first, because the same time control was chosen for all the experiments, which started with Go and also included shogi.

llama

I mean really... these people are not stupid.

Do you think an old version of SF at an odd time control and a completely untested 64 threads was a mistake? Do you think they just drew variables out of a hat and said "let's go with that"? Of course not.

llama
Elroch wrote:
Telestu wrote:

What do you mean, they're "claims"?

It's a fact the version wasn't the latest, it's a fact they used a nonsense time control, and a fact they gave it 64 threads but 1 GB hash. A completely untested and unusual setup.

It was publicity for their project, which was an AI project, not a chess project.

If they're interested in a chess match they could do it under normal conditions, but this will never happen, because that wasn't the point.

There is nothing "nonsensical" about 64 thread-minutes per move. This is a generous but arbitrary time control (all others are arbitrary too). It was selected to be relatively long, for high quality of play.

The "claims" are that the choices had a radical effect on the results. This is unproven, but could be easily tested as I explained. Either no-one has tried, or anyone who tried kept quiet about the results.

64 thread-minutes... with 1 GB hash. Come on, let's be serious.

 

I don't think anyone is saying the results would be radically different. Maybe SF would have gained ~100 Elo, meaning A0 would have won the match by only a game or two. Not very good publicity.

Elroch

The version of Stockfish they used was the most recent release version at the time of the research, which takes time to do and publish. It would be inappropriate to use a beta version because of the possibility of undetected flaws.

Of course newer versions of Stockfish will get better. AlphaZero has no record against them. At some point it will be fair to judge that some new version of Stockfish is a stronger player than AlphaZero was. Not for a while though.

100 points I would call radically better. This is quite a large advance in engine strength.

It is nice that AlphaZero was the best chess player (as well as the best go player by a million miles), but it is not very important to the research. Whether or not it beat the best handcrafted engine available at some point in time, there will be some engine that is better in the future. The achievement was that it learnt to be that good all by itself, with literally no chess knowledge being provided beyond the rules (it could even have learnt those by trial and error if they had wanted!)

llama
Elroch wrote:

You are definitely wrong with the conspiracy theory about trying other time controls first, because the same time control was chosen for all the experiments, which started with Go and also included shogi.

Yes, 1 minute byo-yomi is a common go time control, but practically unheard of in chess.

Again, these people are not idiots. They did their research, made their product, and published results.

elky_plays_chess
HungryHungry wrote:

If Zeus were playing Magnus Carlsen, he would whoop his ass, right?

I'd say so, but what I immediately thought of was this 

https://www.etsy.com/listing/203614985/ceramic-handmade-chess-set-greek-gods-of

llama

Anyway, I'll say again, AZ probably would have won anyway, and of course the important result was for AI, not for chess.

It's just annoying how some semi-literate players seem to think AZ dominated SF, when $20 million of hardware barely beating the equivalent of what most of us have on a laptop (once we include things like opening books, EGTBs, appropriate hash settings and so on) is not so impressive.

What is tremendously impressive is after being taught the rules, it learned the rest on its own.

SmyslovFan

Elroch wrote, "The space between 3300 and 3600 is defined by Professor Elo." Yes, that is the mathematician's answer. For a chess player *and* statistician such as Kenneth Regan, it's the space between where we were about two years ago and perfection. It seems like a small number until one studies the chess involved. 

EndgameEnthusiast2357

The higher the ratings, the more complicated the winning tactics are gonna be to understand.

JayeshSinhaChess

It's an interesting question, and I feel the answer is yes.

 

Those who disagree have done so on the grounds that in every position there is a top move, and that if you keep making the top move you will at least draw.

 

However, what they fail to understand is that the understanding of which move is best is not definite.

 

Even in the 2000 vs 3000 scenario, the 2000 guy is trying to make the top move. It's not like he says, "let's play a terrible move and lose." However, what seem like the top moves to him are not really the top moves.


Similarly with a 3000 vs a 4000 player: what seems like the top move to the 3000 guy may not actually be the top move. The 4000 guy would spot something better.

 

The assumption that the top move is always set is wrong. It's not a formula like 2 + 2 = 4, true no matter whether you are in 1st grade or doing your PhD. In elementary sums, sure, once you learn basic addition you could take on even a professor and at least draw.

 

However, top moves in a position are not so definite as that. They come down to the understanding of the players, and the understanding of the 4000 guy will be higher than that of the 2000 guy.

 

Take A0 vs Stockfish. Stockfish was destroyed. As we keep moving higher up, the guy with the better understanding always wins.

EndgameEnthusiast2357

Well, if all the googolplexians of games could be calculated, we would know exactly what the best moves would be in every possible position. This doesn't, however, include an engine's analysis of a position that was not reached in a game after perfect moves, but is a set-up position. Logic is needed for that, not just number-crunching and opening books. Endgame tablebases only go up to 7 pieces.

SmyslovFan

If you aren't talking about chess, but just ratings in the abstract, your feeling is right. Objectively, a 1000-point rating difference means the same regardless of the ratings being compared. That is how Elo is set up.

 

But chess has an upper limit, generally agreed by statisticians as being around 3600. So a 4000 Elo is probably not possible in chess.
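As a worked example of that difference-only property: under the Elo model the expected score of the stronger player depends only on the gap D, so

```latex
E = \frac{1}{1 + 10^{-D/400}}, \qquad
E(D = 1000) = \frac{1}{1 + 10^{-2.5}} \approx 0.997,
```

and that ~99.7% is the same whether the pairing is 2000 vs 3000 or 3000 vs 4000.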

EndgameEnthusiast2357
SmyslovFan wrote:

If you aren't talking about chess, but just ratings in the abstract, your feeling is right. Objectively, a 1000-point rating difference means the same regardless of the ratings being compared. That is how Elo is set up.

 

But chess has an upper limit, generally agreed by statisticians as being around 3600. So a 4000 Elo is probably not possible in chess.

So chess engines not being able to solve those long puzzles I posted is an abstraction, not normal chess? I disagree. A perfect engine should be able to handle both normal games and find perfect moves from every position that is simply input into it.

llama

re: the limits of chess ratings, maybe the drawing margin should be mentioned again.

Notice that in most endgames, a small material deficit is not enough to make the game decisive, and in most positions many moves are sufficient to maintain the draw. A strong engine would only have to choose one of them to avoid a loss.

It's not hard to imagine a very strong engine making what amount to educated guesses and holding a draw against a perfect player.

llama
Telestu wrote:

It's annoying how some semi-literate players seem to think AZ dominated SF

 

JayeshSinhaChess wrote:

 Take A0 vs Stockfish. Stockfish was destroyed.

 

Greatest_hokage1998

Hold on. I lost my IM title. Oh, never had it.

EndgameEnthusiast2357
Telestu wrote:

re: the limits of chess ratings, maybe the drawing margin should be mentioned again.

Notice that in most endgames, a small material deficit is not enough to make the game decisive, and in most positions many moves are sufficient to maintain the draw. A strong engine would only have to choose one of them to avoid a loss.

It's not hard to imagine a very strong engine making what amount to educated guesses and holding a draw against a perfect player.

That's the problem with the rating system: DRAWS

My opponent can play a really good game and force a draw in the endgame, and I lose points just because he was rated a little lower? I could keep losing rating points even if I never lose. Same applies to engines.

llama
EndgameStudier wrote:

I could keep losing rating points even if I never lose. Same applies to engines.

Only if you draw against weaker and weaker opponents. Otherwise your rating will eventually stop falling.

You can also keep gaining rating points forever even if you never win... provided you draw against stronger and stronger opponents.
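A minimal sketch of why the falling stops, using the standard Elo update rule (K = 20 here is just an illustrative K-factor, not any federation's):

```python
def elo_update(rating: float, opponent: float, score: float, k: float = 20.0) -> float:
    """One Elo update: score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
    return rating + k * (score - expected)

r = 2200.0
for _ in range(200):   # draw 200 games in a row against 2000-rated opposition
    r = elo_update(r, 2000.0, 0.5)
print(round(r))        # ~2000: the per-game loss shrinks and stops at equilibrium
```

Drawing everything pins you to your opposition's level; to keep sliding you need ever-weaker opponents, exactly as described above.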