A 3000 could easily beat a 2000, but could a 4000 easily beat a 3000?

Elroch
SmyslovFan wrote:

Theoretically, 4000 isn’t possible in chess because it is a draw.

That is not a valid inference, any more than someone watching Karpov and Kasparov play 48 games without resolution could infer that 3600 was impossible.

The highest theoretical rating is somewhere under 3600. The statistician Kenneth Regan postulated that 3571 is probably the highest an engine can achieve.*

He appears to be wrong on this one. This sort of rating is being achieved now (some estimates are that top engines on fast hardware are over 3600 on the scale extended from the human one), and ratings continue to progress by a few tens of points per year. Computational power still provides higher quality for AlphaZero and Stockfish (for a perfect engine, doubling processor speed would provide insignificant additional strength, of course).

But a 3571 would destroy a 2571 as easily as a 2571 would destroy a 1571.

Not sure what "easily" means but, by definition, they should get a similarly overwhelming score.

——

* https://cse.buffalo.edu/~regan/papers/pdf/Reg12IPRs.pdf

I remember this paper now. With all due respect to him, Regan makes a rather absurd assumption at one point (it relates to Kramnik). Clearly he would come to different conclusions given 8 years more data.

SmyslovFan

@Elroch, you are probably aware that engine ratings are not the same as FIDE ratings. If engines weren’t forced to play a variety of iffy openings against each other they would already be drawing almost every game against each other.

 

Here’s a thought experiment: What score would Magnus Carlsen be likely to earn against the strongest engine in a 20 game match where he was awarded $10,000 for every half point?

EndgameEnthusiast2357
Adam-Herwis wrote:

I think chess is a drawn game with perfect play. However, 3000 isn't perfect, so a 4000 (if that exists) would destroy a 3000. That 4000 rated player reached that rating somehow. If you study the rating system for example, when you gain 120 elo, you become about twice as good. This is a 1000 rating difference, that means the skill gap is veeery big.

If the game is theoretically drawn, or even a win for white for that matter, would there then be a finite limit on ratings?

SmyslovFan
EndgameStudier wrote:
...

If the game is theoretically drawn, or even a win for white for that matter, would there then be a finite limit on ratings?

If chess is a theoretical win, there's no limit to the highest possible score, especially if one player always gets White. But even if the player with perfect knowledge only had white half the time, and played opponents without perfect knowledge, the highest possible score approaches infinity.

ponz111

If a game is theoretically a draw [as I am sure it is], then there would be a finite limit on ratings, as all the very top players would eventually learn the drawing lines.

If there were even one forced win from the opening position, that would also put a finite limit on ratings, as many players might learn the line.

By the way, you cannot really compare, say, correspondence ratings and over-the-board ratings, as the games are so different.

LeiJChess
EndgameStudier wrote:
Adam-Herwis wrote:

I think chess is a drawn game with perfect play. However, 3000 isn't perfect, so a 4000 (if that exists) would destroy a 3000. That 4000 rated player reached that rating somehow. If you study the rating system for example, when you gain 120 elo, you become about twice as good. This is a 1000 rating difference, that means the skill gap is veeery big.

If the game is theoretically drawn, or even a win for white for that matter, would there then be a finite limit on ratings?

There is a finite limit, but no human has reached that level of perfection yet. 

LeiJChess
infestationPit wrote:

4000 is not enough, you need a 4500 player to be able to beat a 3000 player as easily as a 3000 player beats a 2000 player.

The rating cap is probably below that amount. Engines are roughly in the 3400-3500 range, and they play nearly perfect chess. Totally perfect chess, with the best move every time, is the most anyone can do in the game. Thus, a 100% perfect player with a 3500 rating can grind down a 3000 that makes blunders and mistakes. But becoming a 3500 player is likely the hardest thing to do in chess, because you have to be so consistent in your perfect play that any human who reaches 3500 has already proved themselves invincible.

OBIT

Even if all the drawing lines aren't completely known, we can make a reasonable estimate of the optimal rating, disregarding fluctuations.  Against an opponent that plays perfect chess, I'd guess a 2750 player can play a full game without getting into a lost position at least 10% of the time, thereby earning a draw.  Based on that percentage, the max rating should be at most 3100.  Now, this assumes rating inflation doesn't exist - the first human to break 3000 will owe it not so much to perfect play as to the five or so point shift in the rating curve we see every year.
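
For reference, the guess above can be turned into a number. This is a minimal sketch, assuming the standard logistic Elo curve with its 400-point scale; the function name `implied_ceiling` is illustrative and not from the thread. The post does not say whether the 10% is a draw rate (a 5% score, since a draw counts half) or the score itself, so both readings are shown.

```python
import math


def implied_ceiling(rating: float, score_vs_perfect: float) -> float:
    """Rating of a perfect player, given the score a player of `rating`
    is assumed to make against it (assumption: logistic Elo curve)."""
    if not 0.0 < score_vs_perfect < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    # Invert the Elo expected-score formula to get the rating gap.
    gap = 400.0 * math.log10((1.0 - score_vs_perfect) / score_vs_perfect)
    return rating + gap


# Reading 1: 10% of games drawn, the rest lost, so the score is 5%.
print(round(implied_ceiling(2750, 0.05)))  # a little above 3260
# Reading 2: the 10% is the score itself.
print(round(implied_ceiling(2750, 0.10)))  # a little above 3130
```

Under these assumptions, only the second reading lands near the 3100 figure quoted in the post.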

ScootyMcScrooty

As I understand it, the Elo rating system works as a roughly normal distribution of ratings. If (idk real numbers but the math checks) the average is, say, 1000, and the deviation is 300, significantly fewer than 1% would make it past 1900. But as rating gets higher, draw rates tend to increase, which seems like it would 'soft lock' a rating ceiling. The closest real situation to this I know of would be Alpha Zero vs Stockfish (9), Alpha won 15.5% of games, lost 0.6%, and drew around 84% of the games. Stockfish 9 was rated around 3450. So, if a 4000 played a 3000, the 4000 would win more games, but they'd probably still draw much more often than not.  

llama
ScootyMcScrooty wrote:

As I understand it, the Elo rating system works as a roughly normal distribution of ratings. If (idk real numbers but the math checks) the average is, say, 1000, and the deviation is 300, significantly fewer than 1% would make it past 1900. But as rating gets higher, draw rates tend to increase, which seems like it would 'soft lock' a rating ceiling. The closest real situation to this I know of would be Alpha Zero vs Stockfish (9), Alpha won 15.5% of games, lost 0.6%, and drew around 84% of the games. Stockfish 9 was rated around 3450. So, if a 4000 played a 3000, the 4000 would win more games, but they'd probably still draw much more often than not.  

The way it works is (wins + 1/2 draws) / total games.

Being 1000 points higher makes that greater than 99%.

So yes, a 4000 will easily beat a 3000.

AZ was only ~100 points stronger than SF (in spite of the ridiculous hype which made AZ seem really strong).
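
The 99% figure can be checked against the usual Elo expected-score formula. This is a minimal sketch, assuming the standard logistic curve with a 400-point scale; the function name `expected_score` is illustrative and not from the thread.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score per game for A against B, where score means
    (wins + 1/2 draws) / total games (assumption: logistic Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Expected score for the stronger side at a few rating gaps.
for gap in (100, 400, 1000):
    print(gap, round(expected_score(3000 + gap, 3000), 4))
# roughly 0.64 at 100 points, 0.91 at 400, and 0.997 at 1000
```

Note that the formula only fixes the overall score, not the split between wins and draws, so a high draw rate and a near-100% score are not in conflict at small gaps but are at a 1000-point gap.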

llama

The fact that this topic is 19 pages long is why I don't like to come here...

llama
OBIT wrote:

I'd guess a 2750 player can play a full game without getting into a lost position at least 10% of the time

This isn't mathematical, it's just silliness.

I remember an NM on these forums (@ozzie_c_cobblepot), almost 10 years ago, used some basic extrapolation (mathematically sound extrapolation) to predict what a perfect player's rating would be... but here you are saying "I'd guess"

You'd guess huh? Well isn't that special.

 

Elroch
SmyslovFan wrote:

@Elroch, you are probably aware that engine ratings are not the same as FIDE ratings. If engines weren’t forced to play a variety of iffy openings against each other they would already be drawing almost every game against each other.

 

Here’s a thought experiment: What score would Magnus Carlsen be likely to earn against the strongest engine in a 20 game match where he was awarded $10,000 for every half point?

I regard matches without forced openings as being standard chess. Seeing what happens with forced openings is an alteration to the rules of chess. Although in a sense "fair", it happens to be rather ugly and arbitrary in chess compared to checkers where the same approach was used with less arbitrariness.

It is easy to guess top engines are nearly perfect, but Stockfish looked pretty imperfect against Alphazero.

MARattigan
EndgameStudier wrote:

Doubt it. Stockfish can't even tell that these are illegal:

 

You may want to play the program from an illegal set up position, so the failing is really in the illegal positions that SF rejects.

Most of these are impossible to get through the Xboard interface so can't be blamed on SF, such as positions where both sides have the move, multiple pieces occupy the same square etc. But the Xboard interface can have interesting effects such as the following game snippet saved in Tarrasch (I've edited the PGN so that it appears as it appears when reloaded into Tarrasch).

SF lost on time. Can you work out why?

llama
Elroch wrote:

Stockfish looked pretty imperfect against Alphazero.

Sure, but imperfection is a low bar.

More to the point, Stockfish looked pathetic after they released an extremely limited set of data which put their product in the best light. People forget that AZ's "crushing" victory was a measly (IIRC) 62% even though SF had questionable hardware and settings -- good enough to be rated only 100 points higher. Overwhelmingly the games were drawn.
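
The step from a match score to a rating gap is the inverse of the Elo expected-score formula. This is a minimal sketch, assuming the standard logistic curve; `rating_gap` is an illustrative name, and the 62% is the figure recalled above, not a verified result.

```python
import math


def rating_gap(score: float) -> float:
    """Rating difference implied by a score fraction, where score means
    (wins + 1/2 draws) / total games (assumption: logistic Elo curve)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# The 62% score recalled in this post: about 85 points.
print(round(rating_gap(0.62)))
# The 15.5% wins / 84% draws quoted earlier in the thread: about 52 points.
print(round(rating_gap(0.155 + 0.84 / 2)))
```

Either way the implied gap is on the order of 50 to 100 points, in line with the "only 100 points" estimate.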

UppityEelChesskid

Theoretically, yes. When people say that most games at the top engine level are draws, they usually mean when engines of similar ratings play each other (e.g. a 3400 vs a 3410). But when a 3000 plays a 3400, the 3400 usually wins, meaning that a 4000 would easily beat a 3000.

llama

It's just a very basic understanding of how the rating system works... you can't be rated 400 points higher if you don't win most of your games.

In fact the smallest number you can win out of 100 games is roughly 80 (and that's when you draw the other 20).
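
The "roughly 80" can be derived rather than guessed. This is a minimal sketch, assuming the standard logistic Elo curve and a player who scores exactly at expectation; `min_win_fraction` is an illustrative name. With every non-win a draw, wins + draws/2 = score and wins + draws = 1, so wins = 2 * score - 1.

```python
def min_win_fraction(gap: float) -> float:
    """Smallest fraction of games the higher-rated player must win to
    score at expectation for `gap` (assumption: logistic Elo curve).
    The minimum is reached when every other game is a draw."""
    score = 1.0 / (1.0 + 10.0 ** (-gap / 400.0))
    return max(0.0, 2.0 * score - 1.0)


# A 400-point gap: about 82 wins in 100 games, with the other 18 drawn.
print(round(100 * min_win_fraction(400)))
# A 1000-point gap: the minimum is over 99 wins in 100.
print(round(100 * min_win_fraction(1000), 1))
```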

MARattigan
llama wrote:
Elroch wrote:

Stockfish looked pretty imperfect against Alphazero.

Sure, but imperfection is a low bar.

More to the point, Stockfish looked pathetic after they released an extremely limited set of data which put their product in the best light. People forget that AZ's "crushing" victory was a measly (IIRC) 62% even though SF had questionable hardware -- good enough to be rated only 100 points higher. Overwhelmingly the games were drawn.

I think the hardware has everything to do with it. 

LC0 uses the same approach as AZ and is usually regarded as strong, but it also usually runs with a GPU that gives it hundreds or thousands of extra processors compared with probably four used by SF.

I recently tried downloading an LC0 version that runs without a video card to see if it played basic endgames any better than SF on the same hardware. The answer was that LC0 is totally useless with the same hardware.

llama

Depending on how you code something (even computer games) it can run better or worse on different architectures (e.g. Intel vs AMD)... having something written to utilize graphics card processing run on some i7 chip is of course going to be a disaster. It's not a fair comparison.

MARattigan
llama wrote:

Depending on how you code something (even computer games) it can run better or worse on different architectures... having something written to utilize graphics card processing run on some i7 chip is of course going to be a disaster. It's not a fair comparison.

But neither is comparing a program running with a graphics card to one running without.