Google Deep Mind Rating

sammy_boi
btickler wrote:

Ironically, once machine learning surpasses the chess world's, GMs will quickly begin to realize that they cannot really learn much from engines anymore, because while those engines will play and understand chess far better than a human ever has

That's a fairly old revelation, even though class players often still don't get it.

Even Aronian recently said of AlphaZero:

Aronian: "Currently I am analyzing with a program that is five years old! So I don't care so much, it's more about adopting the programs to suit your playing style rather than have the best computer program. At the end of the day the position you get, you're going to play, not the computer so it has to suit human's taste."
(emphasis mine)

DiogenesDue
sammy_boi wrote:
btickler wrote:

Ironically, once machine learning surpasses the chess world's, GMs will quickly begin to realize that they cannot really learn much from engines anymore, because while those engines will play and understand chess far better than a human ever has

That's a fairly old revelation, even though class players often still don't get it.

Even Aronian recently said of AlphaZero:

Aronian: "Currently I am analyzing with a program that is five years old! So I don't care so much, it's more about adopting the programs to suit your playing style rather than have the best computer program. At the end of the day the position you get, you're going to play, not the computer so it has to suit human's taste."
(emphasis mine)

Except that I am referring to ALL the GMs, and every other chessplayer on planet earth.  Maybe I should have said "humanity".

I don't really give a rat's ass about Aronian analyzing games.  It's meaningless by comparison, as all human chess games are, ultimately.

sammy_boi

Just sayin.

I had a game where, during the postmortem, this guy's coach is berating him for playing a line. And the conversation went something like "but it's theory, why can't I play it?" and the coach is all "so where does your play come from in this position, can you explain it? Because I can't, and I'm a GM. If you're going to play lines you don't understand then play things like the Petroff and London."

DiogenesDue
sammy_boi wrote:

Just sayin.

I had a game where, during the postmortem, this guy's coach is berating him for playing a line. And the conversation went something like "but it's theory, why can't I play it?" and the coach is all "so where does your play come from in this position, can you explain it? Because I can't, and I'm a GM. If you're going to play lines you don't understand then play the Petroff or something."

 

Well, yeah...I agree.  If you are a human being, there is no point whatsoever in playing moves you cannot understand.  If you do, you will be lost the second your opponent deviates from theory.  Magnus trounces other GMs this way all the time with moves that are considered "suboptimal".  If you are lazy and memorize lines without understanding them, prepare to lose.

soe2718
Ekrabin wrote:

After reading how the Google program crushed Stockfish in 100 games without losing once, I wonder what its rating would be?!

Well, not including K-factors,

Elo is based on the following.

Sum of opponents' ratings, let's say T.
Number of games, N.
Difference between total wins and total losses, let's say d.

So,
Rating = [T + (400d)]/N

Assuming they played the strongest version of Stockfish, rated at 3422.

The first set of 100 games sets N = 100.
d = 28
T is the rating of Stockfish times the number of games played, or N*3422

This simplifies to:
rating = [3422*100 + 400*28]/100
 = 3422 + (.28*400) = 3534

If we were to include the other 1200 games played
This now makes N = 1300, 
d = 294
or 
rating = 3422 + [400*(294/1300)] = 3512

This means AlphaZero's strength, from what we know and given my limited knowledge of the Elo system, is between 3512 and 3534.

Here's where I got the formula: https://en.wikipedia.org/wiki/Elo_rating_system
Here is where I got Stockfish's rating: http://www.computerchess.org.uk/ccrl/4040/rating_list_all.html

Again, this may not be accurate, as I have not used K-factors, and I am making the ASSUMPTION that Google used the strongest version of Stockfish with the hardware necessary to get that performance.
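
For anyone who wants to replay the arithmetic above, here is a minimal Python sketch of the same linear formula. The function name `linear_performance_rating` and its parameters are made up for illustration. The 3422 rating and the net-win counts (28 and 294) are the figures assumed in this post, not confirmed match conditions.

```python
def linear_performance_rating(opponent_rating, net_wins, games):
    """Linear estimate against a single opponent.

    With every game played against the same opponent, T = N * opponent_rating,
    so [T + 400*d] / N reduces to opponent_rating + 400 * d / N.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    return opponent_rating + 400 * net_wins / games


STOCKFISH_RATING = 3422  # CCRL figure assumed in the post

# First 100 games: d = 28
first_100 = linear_performance_rating(STOCKFISH_RATING, net_wins=28, games=100)

# All 1300 games: d = 294 (figure taken from the post)
all_1300 = linear_performance_rating(STOCKFISH_RATING, net_wins=294, games=1300)

print(round(first_100), round(all_1300))
```

This only reproduces the estimate. It inherits every assumption above, in particular that 3422 describes the Stockfish that was actually used.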

JBabkes

Thanks soe2718 for the detailed post.

SmyslovFan

Soe,

 

According to several sources, they did not use the strongest version of Stockfish by any stretch of the imagination.

 

I am sure Alpha Zero is stronger than Stockfish, but it probably didn't break 3500. I'd love to see the other 90 games they played tho.

Elroch

Bear in mind the CCRL rating for Stockfish is based on a 2-core Athlon machine, rather than the 32 core Intel machine that appears to have been used in the AlphaZero match. You can be sure from the information available that Stockfish is much stronger on the faster hardware. 

A point about CPUs and ratings: engine ratings, like human ones, are based on continual reference to a changing population, with early ratings being roughly comparable to human ones. If the CPU is changed, all the engines play stronger, which would deflate the rating system, so one solution is to freeze the CPU used even though it becomes out of date. This is what CCRL has done. The assumption is that this is a fair comparison because different engines will improve at roughly similar rates with time or CPU speed (which are exactly equivalent).

As a result, even though a Stockfish fork is Elo 3422 on the CCRL list, the engine running on more modern hardware that is about 50 times quicker would have a much higher rating. How much, I am not sure. Engines are even stronger than they appear to be on paper.

[On another point, note that AlphaZero's performance with openings chosen for both sides was less than 20 points weaker than when both computers had complete freedom. And when you are comparing how good two engines are at playing chess, the quality of the book an engine's programmers have given it is very much a secondary concern: the competition is about playing chess moves, not designing an opening book.]

 

soe2718

Great feedback from SmyslovFan and Elroch. This is precisely why I mentioned my methods, as I know that my assumptions could be different from what was actually being tested.

Elroch, I really appreciate your post, as I did not know that the hardware performance is kept constant! Thank you, I learned something new. What are the specs on which CCRL runs Stockfish? I am curious because I was wondering if I could get Stockfish to the same level of performance on my computer. I have a two-core i5 which has 6GB (5.9 available). What are the specs of the 2-core Athlon machine?

Elroch

 Your PC is probably at least 1.5 times as fast as the CCRL standard machine according to benchmarks.

DiogenesDue

...which is why TCEC is the better measure of top engine by far.

SmyslovFan
btickler wrote:

...which is why TCEC is the better measure of top engine by far.

Nope, it's a great measure of traditional software, but not of an engine that only works on specialized hardware.

DiogenesDue

TCEC is not using "specialized" hardware, just more expensive hardware anyone can get access to.  CCRL is like eating Xmas dinner on plastic plates with plastic cutlery.  The reason CCRL is run the way it is is simply because the people running it have fewer resources, producing a less accurate result.  If you ran engine championships on smartphones, it would be even worse.

If the goal is to get best play from an engine, and I think that it is...then TCEC is superior.  Period.  For the same reason we don't consider the world's highest rapid/blitz ratings to be the measure of best chess player on the planet...and honestly, if the world could live through a WCC match with games lasting weeks each, we'd have a better standard of play there, also.  But a game a day is a compromise that we live with for the prospect of getting increased mainstream viewership.  Some people want rapid to become the standard for that same reason, but nobody is arguing in that case that the chess play is not better with more time (and for an engine, increasing processing power provides more "time").

Specialized hardware, in developer terms, would really mean processors with a different instruction set and completely different hardware, à la AlphaZero, or perhaps a PC equipped with some additional $80K custom-built FPU card or something.

The TCEC hardware is "street legal".  If you want to put your CCRL Honda Accords up against a TCEC Dodge Viper, the Viper is obviously the better measure of "how fast can a car go?".  Note that I didn't say Lamborghini or something, because the TCEC hardware would not even count as being that top end.  That would be like a $25K server rig, not a $5K one.  But it's not specialized like a Formula One car with completely different engine specs, completely different chassis, completely different tires, and completely different fuel.  That's AlphaZero.

TCEC season 10 specs, which are rather low end for a corporate server, rather high end for a gaming rig:

Season 10 server
CPUs: 44 Cores -> 2 x Intel Xeon E5 2699 v4 @ 2.8 GHz...$4000 (high estimate)
Motherboard: Supermicro X10DRL-i...$290
RAM: 64 GB DDR4 ECC...$700
SSD: Crucial CT250M500 240 GB...$200
Chassis: Supermicro...$110
OS: Windows Server 2012 R2...$700

JBabkes

Thanks btickler for the info.

Elroch

Roughly speaking, CCRL favours computers that can evaluate more nodes per second and TCEC favours computers that have better evaluation functions, so can benefit more from additional time by a deeper, more selective search. However, near the top of the rankings all of the engines are probably quite close by both measures, so the difference is not going to be huge.

Here is the top 5 from CCRL:

  • asmFish 051117
  • Houdini 6
  • Komodo 11.2
  • Deep Shredder 13
  • Fire 6.1

and from TCEC

  • Komodo 1959.00
  • Stockfish  051117
  • Houdini 6.03 
  • Fire 6.2 
  • Chiron 251017

I suspect the Fishes in the two are actually the same engine, rather than merely variants, based on the number.

You might see it as a slowplay rating versus a blitz rating. Both are valid, just different, and both are strongly correlated for the strongest players.

DiogenesDue

Well, yes, one would have to assume that asmFish is just an assembler-optimized version of the normal Stockfish build.

Frankly, all engine development should probably be done in assembly language from the ground up, which would be faster than optimizing compiled code, but there are just precious few developers nowadays who learned assembly first and higher-level languages second...because colleges teach these courses backwards.  Every developer should have to learn some form of assembly language before learning an abstracted compiled language.

By the same token, kids should be given generic Legos to build with before they ever get prescribed kits with instruction booklets for building a Frozen castle.  Because building a Frozen castle from a booklet doesn't teach you crap about the much broader possibilities of Lego construction.

DiogenesDue

By the way, from one of the Stockfish devs...an open letter to the DeepMind team:

"Dear Sirs,

Please let me congratulate you on your amazing achievement in developing AlphaZero chess!   As someone who completed a thesis in neural networks back in the 90's,  I could not be more amazed at how far you have been able to advance the field.

I can't speak for the entire Stockfish team so I simply speak as one of its open source contributors.  If you read other posts on this forum or talkchess.com however you may find that what I'm about to point out may mirror the sentiment of others in the computer chess community as well.

AlphaZero won the 100 game match against Stockfish very impressively by a total score of 28 wins and 72 draws and 0 losses.  This translates to an Elo difference of 100.  However the details of the match described in your paper show that this match might have been much closer and more interesting had it not been for some IMO rather unfair conditions.  These might not be immediately obvious even to those using chess engines on a regular basis.

1) In the match version 8 of Stockfish was used which is now over a year old.  The latest version of Stockfish is over 40 Elo stronger in fast self play.
http://tests.stockfishchess.org/tests/view/5a23e7c10ebc590ccbb8b6d8
When consulted the Stockfish team always enters the latest version into serious competition such as TCEC.

2) The 1GB amount of memory used for the hash table on a 64 core machine with 1 minute per move is sorely inadequate.  Stockfish displays the % of hash used so anyone can see how quickly it fills up.   A reasonable amount of memory would likely have been around 16 times more at 16GB.  The reason this is especially critical with many threads is because Stockfish uses the hash as the main mechanism through which all threads communicate (aka Lazy SMP).  It is almost certain that this resulted in another significant Elo reduction in Stockfish.

3) Much effort has been put into making Stockfish understand which positions are critical and which are not.  Based on this Stockfish manages its clock very carefully spending significantly more time on some positions during a game and very little on others.   Disabling this feature and forcing Stockfish to use its time based on your same 1 minute for every move time control results in yet another large Elo reduction.

Since the Stockfish team wasn't contacted prior to the match I believe the issues outlined above were simply a result of unfamiliarity with the Stockfish engine.   With the above issues corrected  the 100 Elo gap should change quite significantly.  I believe you are interested in a fair match more than winning and it is therefore my hope that a second proper rematch can be played for the benefit of both scientific research as well as the chess community.  I wish to thank you for the tremendous contribution you have made to computer chess with a completely novel approach and hope that Stockfish has been a useful competitor for your testing.  Please don't hesitate to contact myself or the Stockfish team in the future.  We are your fans.

Sincerely 
Michael Stembera(Fisherman)"
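
The "Elo difference of 100" quoted in the letter can be sanity-checked with a short Python sketch. It assumes the standard logistic Elo expectation on a 400-point scale. The letter does not say how its figure was derived, so that model is an assumption, and the function name `elo_diff_from_result` is made up for illustration.

```python
import math


def elo_diff_from_result(wins, draws, losses):
    """Rating difference implied by a match score under the logistic Elo model.

    score = (wins + draws / 2) / games
    diff  = 400 * log10(score / (1 - score))
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # A 0% or 100% score implies an unbounded rating difference.
        raise ValueError("rating difference is undefined for a 0% or 100% score")
    return 400 * math.log10(score / (1 - score))


# The 100-game match result quoted in the letter: +28 =72 -0
match_diff = elo_diff_from_result(wins=28, draws=72, losses=0)
print(f"{match_diff:.1f}")
```

A 64% score comes out at just under 100 Elo, which matches the letter. The linear 400*d/N formula used earlier in the thread gives 112 for the same result, so the two methods disagree by roughly 12 points here.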

SmyslovFan

Thank you for sharing that letter, @btickler.  It goes explicitly against much of what Elroch has been saying, and corroborates what other Stockfish experts said immediately after the match.