Objectively Speaking, Is Magnus a Patzer Compared to StockFish and AlphaZero?

Elroch
SmyslovFan wrote:
Elroch wrote:

I have seen no even half convincing argument that perfect chess has an Elo of 3600 - please do provide a link. We have just seen the first of a new breed leap to 130 points stronger in 1 step, and at a mere 80,000 nodes per second, this is not the last word.

Ask, and it shall be given...

A simple linear fit then yields the rule to produce the Elo rating for any (s, c), which we call an “Intrinsic Performance Rating” (IPR) when the (s, c) are obtained by analyzing the games of a particular event and player(s).

IPR = 3571 − 15413 · AE_e.   (6)

This expresses, incidentally, that at least from the vantage of RYBKA 3 run to reported depth 13, perfect play has a rating under 3600. This is reasonable when one considers that if a 2800 player such as Vladimir Kramnik is able to draw one game in fifty, the opponent can never have a higher rating than that.

 

Source: Kenneth Regan:  

https://www.cse.buffalo.edu/~regan/papers/pdf/RMH11b.pdf
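Regan's "one draw in fifty" remark can be checked directly with the standard Elo expected-score formula. A minimal sketch in Python (the 2800 rating and 1% score are the numbers from the quote above):

```python
import math

def elo_gap(expected_score: float) -> float:
    """Elo difference implied by the weaker side's expected score,
    from the standard logistic Elo model."""
    return 400 * math.log10((1 - expected_score) / expected_score)

# A 2800 player who draws one game in fifty scores 0.5/50 = 1%.
gap = elo_gap(0.01)      # ≈ 798.3 Elo points
ceiling = 2800 + gap     # ≈ 3598, i.e. just under 3600
print(round(gap, 1), round(ceiling, 1))
```

This is exactly where the "under 3600" figure comes from: a 1% score corresponds to a gap of roughly 800 points above the 2800 player.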

I do remember that now: it seems like a very long time ago. Your interpretation is unreliable. Rybka 3 would be utterly crushed by AlphaZero, for sure. It would do badly against any of today's top engines, given roughly 50 Elo points of engine advance per year. Its notion of accuracy would be very wrong for Houdini, Komodo or Stockfish, and worse for AlphaZero. Indeed, pretty much anything it saw as an inaccuracy would be its own mistake! The key is in the words "from the vantage of Rybka". That vantage does not extend to finding fault with a player that is consistently more reliable than Rybka on every move.

It would be interesting to see a new analysis using Houdini on a large server, like the Rybka 3 one was a follow-up to one with Crafty (mainly focusing on human play).

SmyslovFan

I trust the IM on this one.

Elroch

His conclusion was qualified with "from the vantage of Rybka 3". To Rybka 3, 3600 is a player that beats it almost all the time. Its ability to spot an error in the play of such a player is almost non-existent.

This issue was not so clear with computers analysing humans, because computers are very good at complex tactics, and many human errors can be viewed as complex tactics. The types of errors most computers make are perhaps more positional to a human eye, although that doesn't mean we could avoid them! I would be hard-pressed to say what sort of errors AlphaZero makes, because no losses have been published. It would be fascinating to see how it went wrong in two King's Indian games, although I would be rooting for it.

Elroch
SmyslovFan wrote:

Notice that I gave the complete information in context. I am sure someone will come along and try to parse the first part. The key is "if a 2800 player... is able to draw one game in fifty, the opponent can never have a higher rating than [3600]." ~Regan.

 

That is how a chess player and statistician deals with the issue. Computer people see constant improvement and think it is infinite. It isn't.

Believe me, my knowledge of statistics is more than adequate for this topic! The argument is intuitively attractive, but fatally flawed. Where are the draws by humans against the latest engines running on fast hardware? They play them with knight odds instead. The shocking truth is that even a 1% score is getting out of reach. Of course, this is an awful pairing from which to estimate a rating, since the information content of such lopsided results is so small. Each game by a 3400 engine against equal opposition is more informative than dozens of wins against a 2800.

SmyslovFan

I have a few minutes to elaborate. 

The computers are rated almost entirely based on their performance against other engines. From very early on, it was known that computers that played only against other computers tended to have a higher rating because they could consistently punish the same basic programming errors. Humans don't often make the same mistakes repeatedly the way engines do.

Humans such as Kasparov and Kramnik and Nakamura have almost always played the computers with the goal of beating them. If they adjusted their goal to getting draws, I am absolutely certain that they could still draw at least one in 25 as white, even if that computer played perfect chess!

I am not alone in that belief. I don't consider the reasoning to be "unreliable". I am fairly certain that Nakamura himself, if he were to weigh in on the conversation, would agree with Regan and myself.

ponz111
Disposer_Of_Trash69 wrote:

Regardless of the "relative to AlphaZero" part, is Magnus not already considered a patzer?

No

SmyslovFan

I really should quote you from now on. You keep changing your responses.

Elroch
SmyslovFan wrote:

I have a few minutes to elaborate. 

The computers are rated almost entirely based on their performance against other engines. From very early on, it was known that computers that played only against other computers tended to have a higher rating because they could consistently punish the same basic programming errors. Humans don't often make the same mistakes repeatedly the way engines do.

Humans such as Kasparov and Kramnik and Nakamura have almost always played the computers with the goal of beating them. If they adjusted their goal to getting draws, I am absolutely certain that they could still draw at least one in 25 as white, even if that computer played perfect chess!

I am not alone in that belief. I don't consider the reasoning to be "unreliable". I am fairly certain that Nakamura himself, if he were to weigh in on the conversation, would agree with Regan and myself.

Any conclusion about perfect chess (rather than chess that dominates a specific engine) is unreliable for the two reasons I gave.

  1. An engine's ability to identify errors in an opponent's play heads rapidly to zero for much stronger opponents regardless of where perfect chess lies. All that changes as we approach perfection is that much stronger opponents can no longer exist! Their non-existence can't be deduced from the understanding (anthropomorphically) of relatively weak players alone.
  2. Just because top human players used to get occasional draws against earlier programs does not mean they continue to get as many as the programs add hundreds of Elo points.

You would expect engine performance to plateau well before the algorithms are within a whisker of perfect and computing power is so ample that a 1000-fold increase would give no significant advantage due to diminishing returns. That is not what we have seen. We have seen engine algorithms improve significantly, and now a new approach has provided a leap in rating. We will see substantial advances in the next decade if anyone invests the resources. An AI with a wider, deeper net, much faster hardware and far more training time would convincingly beat AlphaZero.

SmyslovFan

Once again, you speak about computers without regard to chess. 

A player doesn't have to play perfectly to draw a game of chess, even against a perfect opponent.

 

Added: The people at AlphaZero didn't see fit to publish any of the draws. I strongly suspect there were quite a few boring games that petered out into lifeless draws, and perhaps even some where Stockfish had a significant edge out of the opening.

Elroch

The reason I talk about engines regarding the strongest chess players is because humans don't play above 2900 Elo. What we are discussing here is the game theory of a finite game that is way too large to fully analyse.

Firstly, a player that can draw against any player is perfect. (This has been explicitly achieved for checkers.) If a player is imperfect and deterministic, another player can beat them every time (with some colour). If a player is imperfect and stochastic, some other player has a strategy that achieves a positive expected score. It is certainly possible in principle for a player to play imperfectly in a random way that makes it impossible to beat them every time (an example can be constructed by starting with a perfect player and adding a random blunder that it plays only some of the time in one specific position).

Secondly, you should note that your guesses about AlphaZero-Stockfish indicate you do not believe AlphaZero is very close to perfect. Note also that the notion of an advantageous position is solely for imperfect chess players. A perfect chess player knows whether a position is a win, draw or loss with perfect play. Of course, AlphaZero and Stockfish are not close to such a player.

 

Elroch

Now let's get back to why Regan's analysis cannot tell us the rating of a perfect chess player.

This analysis used an engine rated in the low 3000s to analyse the games of human players up to around 2800 and find their errors, as judged by its own evaluation routine. Regan found a linear model that approximated the relationship between mean error rate (AE) for populations of players in particular Elo ranges and the mean Elo value of the range.

What can't we infer from this?

  1. That there is a deterministic relationship between AE and Elo. In fact there isn't: style matters too. A Petrosian has a much smaller AE than a Tal even if both have identical Elo. This is noise in the linear model of individual-player AE against Elo.
  2. That the relationship necessarily extends to populations of computer players, or indeed to other populations of human players. A population that is tactically stronger (while positionally weaker), or that collectively decides to play in a more positional or more tactical style, would have a different statistical relationship.
  3. That the relationship is actually linear, rather than locally approximately linear. The amount of non-linearity could be substantial even in the range of the data, and enormous in a large extrapolation.
  4. That the evaluations of Rybka would tell us anything about players who were not only much stronger than the players in the population, but also not humans (so prone to different sorts of errors) and much stronger than Rybka itself.

Note also that the whole notion is based on the heuristic of pseudo-material. Perfect chess is about values of 0, 1/2 or 1, and a theoretically better-founded quantity is the expected score (loosely, the probability of winning) used by AlphaZero.
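As an aside, the expected-score idea can be illustrated with a logistic mapping from a centipawn evaluation to a score in [0, 1]. This is a common heuristic, not any particular engine's actual model, and the scale constant below is an assumption chosen for illustration:

```python
def expected_score(centipawns: float, scale: float = 400.0) -> float:
    """Map a centipawn evaluation to an expected score in [0, 1].

    The logistic shape is a widely used heuristic; the `scale`
    constant here is an illustrative assumption, not a property
    of Rybka, Stockfish or AlphaZero.
    """
    return 1.0 / (1.0 + 10.0 ** (-centipawns / scale))

# A level position maps to 0.5; +100 cp maps to roughly 0.64.
print(expected_score(0))
print(round(expected_score(100), 3))
```

The point is that an expected-score scale saturates near 0 and 1, whereas pseudo-material is unbounded, so the two can disagree badly exactly where "perfect play" questions live.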

Any statistician will explain that if you take a very noisy linear model (note carefully that the relevant noise is that for individual players, not for the averages of all players in a rating band, and this noise is very large) and extrapolate it way, way out of range, while applying it to a distinctly different population of players (theoretically abstract chess players; in practice engines and now AIs), you cannot rely on the conclusions.
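The extrapolation hazard can be made concrete with a toy simulation: generate synthetic players whose true AE follows Regan's fitted line (the constants 3571 and 15413 come from the quoted paper; the player count, Elo range and noise level are purely illustrative assumptions), fit Elo against the noisy AE, and read off the intercept at zero error:

```python
import random
import statistics

random.seed(1)

# Synthetic, illustrative data: 200 hypothetical players with Elo in
# 1600-2800, whose measured AE is the noise-free line plus large
# per-player noise (sigma = 0.02 is an assumption for illustration).
elos = [random.uniform(1600, 2800) for _ in range(200)]
aes = [(3571 - e) / 15413 + random.gauss(0, 0.02) for e in elos]

# Ordinary least-squares fit of Elo against AE.
mean_ae = statistics.mean(aes)
mean_elo = statistics.mean(elos)
slope = sum((a - mean_ae) * (e - mean_elo) for a, e in zip(aes, elos)) \
    / sum((a - mean_ae) ** 2 for a in aes)
intercept = mean_elo - slope * mean_ae

# The intercept is the extrapolated "Elo of error-free play".
print(round(intercept))
```

Although the noise-free line passes through 3571 at AE = 0, per-player noise comparable to the spread of AE attenuates the fitted slope and drags the extrapolated intercept hundreds of points away, which is exactly the danger of extrapolating a noisy linear fit far outside the data.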

I am sure you can comprehend enough of those points to see that absolute conclusions about the Elo of a perfect chess player are not merited.

Lyudmil_Tsvetkov
Godeka wrote:

@Lyudmil_Tsvetkov:
Too bad that Talkchess is your only source and 'search in Talkchess' is your only argument.

According to Chess Programming Wiki the engine RomiChess has no NN. And no, the number of terms has nothing to do with whether something is a NN. You can create a simple NN with two input neurons and one output neuron, and it is still a NN.

 

> The engine just goes on adding new evaluation terms.

It does not; the number of weights is fixed. Only their values are modified during training. And no one knows which criteria, relations, and knowledge are expressed by the weights.

You cannot add criteria. Supervised learning is possible, as it was done in AlphaGo Lee.

 

> Computer chess history consistently shows that better
> engines always had improved evaluation, involving also
> a larger number of parameters.

More parameters for more specific situations for finer evaluation. Makes sense, but it does not mean that more parameters are better. A well balanced and tested evaluation function can be stronger than a bad one with more parameters. The quality of the evaluation result counts, not the complexity of the evaluation function.

I am telling you what statistics show, you might be theorising as much as you want...

Lyudmil_Tsvetkov
Godeka wrote:
Lyudmil_Tsvetkov wrote:
SmyslovFan wrote:

A perfect chess playing machine won't break 3600. It's not that you don't understand computers, it's that you don't understand chess, or how the Elo system works. 

Chess is a draw, and by a wide margin. Even AlphaZero playing a handicapped Stockfish didn't reach a 3600 performance level.

Alpha Zero is sooo WEAK: does not fianchetto its king side bishop with Bg2, often plays 1.d4, etc.

So, soo weak.

 

Trolling can be funny, but simply repeating is boring.

What do you mean trolling?

Alpha is about as much stronger than SF 8 as SF 8 is than SF 7, more or less.

Would you say SF 8 plays perfect chess?

I have been there, I know what I am talking about.

Lyudmil_Tsvetkov
SmyslovFan wrote:
Elroch wrote:
SmyslovFan wrote:

A perfect chess playing machine won't break 3600. It's not that you don't understand computers, it's that you don't understand chess, or how the Elo system works. 

Chess is a draw, and by a wide margin. Even AlphaZero playing a handicapped Stockfish didn't reach a 3600 performance level.

To be fair, you could look at the results of Karpov - Kasparov, see mostly draws and conclude 3000 could not be exceeded. This would be wrong by a large margin. We do not yet have a clear estimate of the Elo of perfect play.  We know empirically that it gets much harder to improve by similar amounts.

I am sure somebody somewhere has said that 3000 was a fixed limit, but I don't remember that. I do remember some people saying humans would never break 3000 in classical chess, and that is a reasonable guess.

Kenneth Regan, an IM and a professional statistician, has estimated the highest possible rating to be slightly below 3600. Others have generally agreed that 3600 does look like the upper limit from a theoretical perspective. 

 

Again, this isn't about computers, it's about chess itself and the way Elo is calculated.

Kenneth Regan predicted Hillary Clinton would win the presidential race, when I already knew Trump would win...

Lyudmil_Tsvetkov
SmyslovFan wrote:

I trust the IM on this one.

The lower boundary for perfect play is at least 6000 Elo. I know top engine play by heart, and they make suboptimal moves all too often, 2 out of 3 or so.

Some facts:

- SF still thinks that after 1.e4, 1...e6 might be best

- SF is still not certain the Sicilian is best for black

- SF does not know to fianchetto its king side bishop, either with black or white

- etc., etc.

 

More or less, it plays like an amateur.

It does not make shallow tactical mistakes as humans do, and that makes it strong.

 

Alpha's games from the last match with SF are riddled with mistakes; I am just too lazy to check them now.

Lyudmil_Tsvetkov
SmyslovFan wrote:

Once again, you speak about computers without regard to chess. 

A player doesn't have to play perfectly to draw a game of chess, even against a perfect opponent.

 

Added: The people at AlphaZero didn't see fit to publish any of the draws. I strongly suspect there were quite a few boring games that petered out into lifeless draws, and perhaps even some where Stockfish had a significant edge out of the opening.

Top engine draws, in sharp distinction to human draws, are never boring...

OK, I am out of this discussion, seemingly people are just citing some authorities who are using mathematical methods to make predictions about chess without the necessary knowledge base.

In order to make predictions, you should first investigate chess more deeply.

 

Elroch
Lyudmil_Tsvetkov wrote:
SmyslovFan wrote:

I trust the IM on this one.

The lower boundary for perfect play is at least 6000 Elo. I know top engine play by heart, and they make suboptimal moves all too often, 2 out of 3 or so.

Some facts:

- SF still thinks that after 1.e4, 1...e6 might be best

- SF is still not certain the Sicilian is best for black

- SF does not know to fianchetto its king side bishop, either with black or white

- etc., etc.

 

More or less, it plays like an amateur.

It does not make shallow tactical mistakes as humans do, and that makes it strong.

 

Alpha's games from the last match with SF are riddled with mistakes; I am just too lazy to check them now.

All that is true because you are 1400 points stronger than AlphaZero.

(Actually I may have something wrong there).

prusswan

Just remember that chess is a comparatively trivial case, since most of the work was already tested in Go (a much more difficult game for engines and humans). They wrote a conventional Go engine that thoroughly defeated top human players, and then used it as a baseline against which to test their pure self-learning AI with zero human knowledge, to prove that the approach can work and surpass their earlier creation. (They no longer need to involve human players, since humans are no match for the top conventional engine, much less something even stronger.) Without understanding the earlier developments in Go and the predecessors of AlphaZero, it is difficult for some chess players to grasp the magnitude of the development and its implications for chess.

 

They have demonstrated that their work in Go generalizes to other similar games, and it will eventually be extended to other fields. Their aim was not to prove they could create the strongest chess engine (although that is a possible side effect), but to demonstrate the approach used to derive a self-learning AI, which is applicable to many fields.

Godeka

AlphaGo Fan had about 2900 Elo, and the Go world was stunned by such a strong engine, though it was still some stones weaker than the strongest humans. A few months later AlphaGo Lee surprised again with 3700 Elo. More months elapsed, AlphaGo Master was released, and it had about 4900 Elo. What an unbelievably strong engine! No one thought there was such a big margin for improvement over professional players. And then AlphaGo Zero appeared, three stones stronger with nearly 5200 Elo. But it lost 60 of 100 games against AlphaZero, which had trained for only 34 hours.

Chess is more tactical and NNs are less efficient there, but I still bet there is much room for improvement.


@Lyudmil_Tsvetkov
> I am telling you what statistics show, you might
> be theorising as much as you want...
and
> I have been there, I know what I am talking about.

Statistics say nothing about what a NN is or what strength AlphaGo has on a single CPU core. And to me it seems that you have no clue what a NN is.

HobbyPIayer
SmyslovFan wrote:

The key is "if a 2800 player... is able to draw one game in fifty, the opponent can never have a higher rating than [3600]." ~Regan.

 

Yes, but Carlsen (and company) are closing in on 2900 these days. Which means, relatively speaking, the Elo ceiling for chess is approaching 3700. And once humans reach 3000 Elo, the ceiling will rise to 3800. Et cetera . . .

Elo isn't a concrete measurement of chess performance—it's a comparative one.

Consider Stockfish, for example. Its Elo is around 3400. If AlphaZero were to beat it 49 times and then draw it once (hypothetically speaking, with SF playing at optimal strength), then AlphaZero's Elo would be around 4200—far higher than the "3600 is perfect chess" ceiling that many people believe in.
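That 4200 figure follows from the standard logistic Elo performance-rating formula; a minimal sketch using the hypothetical numbers above (a 3400 opponent, 49 wins and 1 draw in 50 games):

```python
import math

def performance_rating(opp_rating: float, score: float, games: int) -> float:
    """Performance rating implied by a score against one opponent rating,
    using the standard logistic Elo expectation. A 100% or 0% score has
    no finite rating, so score must lie strictly between 0 and games."""
    p = score / games
    return opp_rating + 400 * math.log10(p / (1 - p))

# 49 wins plus 1 draw = 49.5/50 against a 3400-rated opponent.
print(round(performance_rating(3400, 49.5, 50)))  # ≈ 4198
```

So a 99% score against a 3400 opponent implies a performance around 4200, well above the supposed 3600 ceiling, which is the comparative (not absolute) nature of Elo that the post describes.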