Google DeepMind Rating

Elroch

It is a reasonable basis for the hypothesis that Stockfish would have done better with a larger hash table. Whether this is actually so, or how much better it would have done, still requires verification: there is no solid evidence for this.

I can perfectly understand that the Stockfish team would like to fight for the reputation of their engine, but they hardly need to: it remains the best engine in the world under certain sets of rules, and very close to the best under others.

Most top players seem happy that AlphaZero won by playing genuinely better positional chess than Stockfish, and this superiority is particularly insensitive to the speed of Stockfish's hardware and configuration. If the match had been about complicated tactics, looking at 875 times as many nodes might have left Stockfish the winner!

Note also that all arguments about Stockfish being limited by speed and efficiency ignore the fact that the setup was fast by the standard of almost all engine competitions.

As I pointed out somewhere in a discussion, all the Stockfish developers need to do is exhibit a version and configuration of Stockfish that does better than 64/100 against the version that was beaten by AlphaZero, with the same rules. (A 64% score corresponds to roughly a 100-point Elo difference, so the claimed 40-point improvement is a good start.)

NoHaxJustLuck

I think it would be hard to judge the rating, because of some flaw in the rating system... a rating of 4000 and 4200 probably isn't gonna be different, because nobody could ever go that high.

Elroch

It doesn't matter if no human could: the Elo rating system applies just as well (actually rather better, because of consistency) to computers.

sammy_boi
NoHaxJustLuck wrote:

a rating of 4000 and 4200 probably isn't gonna be different

It's 200 rating points different, which corresponds to scoring roughly 3 out of 4 in a match. A pretty big difference.
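For anyone who wants to check the arithmetic, here is the standard logistic conversion between expected score and Elo difference (a quick sketch of mine, not anything from the match report):

    import math

    def elo_diff_from_score(score):
        """Elo difference implied by an expected score (standard logistic model)."""
        return -400 * math.log10(1 / score - 1)

    def score_from_elo_diff(diff):
        """Expected score for the stronger side, given an Elo difference."""
        return 1 / (1 + 10 ** (-diff / 400))

    print(elo_diff_from_score(0.64))   # ~100: AlphaZero's 64/100 result
    print(score_from_elo_diff(200))    # ~0.76: roughly 3 out of 4, as stated above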

sammy_boi
Elroch wrote:

 Most top players seem happy that AlphaZero won by playing genuinely better positional chess [so it's the judgement of players who are 1000 points lower, on a topic unrelated to how SF could have played better, versus at least one developer's highly specialized knowledge of SF's software]... and this superiority is particularly insensitive to the speed of Stockfish's hardware and configuration. That's ridiculous.

 

Elroch

While it is true that the opinions of very strong human players on the play are not entirely reliable, they are the more relevant opinions. Nakamura's included!

No, it isn't "ridiculous". There is solid evidence that Stockfish's performance was very insensitive to computing power above the large resource it had in the match (i.e. 8 billion nodes per move would be only a whisker better than 4 billion, and there are diminishing returns with each doubling of speed). This is understandable, as each additional ply of search depth is expected to be progressively less likely to be decisive.
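To see why doublings give diminishing returns, here is a toy calculation (my own illustration, with a purely hypothetical effective branching factor, not a measurement): the node count grows roughly exponentially with depth, so each doubling buys only about one extra ply.

    import math

    B_EFF = 2.0   # hypothetical effective branching factor of a well-pruned alpha-beta search

    def approx_depth(nodes, b_eff=B_EFF):
        # nodes ~ b_eff ** depth, so depth ~ log(nodes) / log(b_eff)
        return math.log(nodes) / math.log(b_eff)

    for nodes in (4e9, 8e9):
        print(f"{nodes:.0e} nodes per move -> roughly {approx_depth(nodes):.1f} plies")
    # Doubling the node count adds only about one ply, and each extra ply matters
    # less than the one before it, so 8 billion vs 4 billion is a small difference.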

There is also solid evidence Stockfish was running efficiently on the hardware (70,000,000 nodes per second, which compares well per core to every machine I have seen).

There is no substantial evidence to support the guess that huge hash tables are better, never mind very beneficial (simple experiments don't support this). And remember, Stockfish can't use the hardware AlphaZero used, which does simple matrix calculations fast, as needed by large neural networks. (Much larger parallelism is the reason human brains can compete, despite vastly lower "clock speed"!)

DiogenesDue

Except that in this individual case, the Stockfish developer is telling us flat out that Stockfish does better with larger hash tables for thread management, and also pointed out that Stockfish displays when it is maxing out on hash table size, so it's pretty clear when the hash tables are set too low.

It is not at all unlikely that the DeepMind team actually started with larger hash tables and then intentionally tuned downward to achieve their desired results, which is why a "private" match is bogus. Someone representing Stockfish should have been there for setup. It's probably the same reason they seemingly set the 1 minute/move time control using the external command line to cut Stockfish off mid-calculation, rather than "informing" Stockfish of the time control it had to work with via the menu and letting it prune accordingly (you can find that info elsewhere in the thread I got this open letter from).
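For reference, the distinction I mean looks like this over UCI (a rough sketch using the python-chess library; the engine path is a placeholder): telling the engine to search for a fixed 60 seconds versus giving it both clocks and letting it manage its own time.

    import chess, chess.engine

    engine = chess.engine.SimpleEngine.popen_uci("/usr/local/bin/stockfish")  # placeholder path
    board = chess.Board()

    # Fixed time per move: the engine searches for exactly 60 s, no time management involved.
    fixed = engine.play(board, chess.engine.Limit(time=60))

    # Clock-based: the engine is told both clocks and budgets its own thinking time.
    clocked = engine.play(board, chess.engine.Limit(white_clock=1800, black_clock=1800,
                                                    white_inc=0, black_inc=0))
    engine.quit()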

None of this would have mattered if they had done a private match and kept the results private, or had a public match with public results...but they had their cake and ate it, too, knowing that an open source effort like Stockfish would not have any resources to dispute the results in the press.

"Press Release:  After appropriating Oracle's Americas' Cup yacht and running private races in an unspecified body of water, DeepMind has announced that their boat went undefeated...Larry Ellison disputed this claim, but since nobody has access to DeepMind's boat, they can't do jack about it...".

Elroch

I accept that the Stockfish developer believes this, but simple experiments don't provide much support for the idea, and I have been entirely unable to find a decent experiment that confirms a sizeable, relevant increase in performance.
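The kind of experiment I mean is straightforward to set up (a rough sketch using the python-chess library; the binary path, game count and hash sizes are placeholders, and a serious test would need many more games and a proper opening set):

    import chess, chess.engine

    STOCKFISH = "/usr/local/bin/stockfish"   # placeholder path
    GAMES, MOVE_TIME = 100, 60.0             # 1 minute per move, as in the match

    def hash_size_match(hash_a_mb, hash_b_mb):
        """Same binary, two Hash settings, fixed time per move; returns A/B/draw counts."""
        a = chess.engine.SimpleEngine.popen_uci(STOCKFISH)
        b = chess.engine.SimpleEngine.popen_uci(STOCKFISH)
        a.configure({"Hash": hash_a_mb})
        b.configure({"Hash": hash_b_mb})
        totals = {"A": 0, "B": 0, "draw": 0}
        for g in range(GAMES):
            board = chess.Board()
            a_is_white = (g % 2 == 0)        # alternate colours
            while not board.is_game_over():
                a_to_move = (board.turn == chess.WHITE) == a_is_white
                mover = a if a_to_move else b
                board.push(mover.play(board, chess.engine.Limit(time=MOVE_TIME)).move)
            result = board.result()          # "1-0", "0-1" or "1/2-1/2"
            if result == "1/2-1/2":
                totals["draw"] += 1
            elif (result == "1-0") == a_is_white:
                totals["A"] += 1
            else:
                totals["B"] += 1
        a.quit()
        b.quit()
        return totals

    # e.g. hash_size_match(1024, 32768) compares the 1 GB setting with a 32 GB one.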

It is almost certain that the DeepMind team did NOT start with large hash tables, since they set several of the computational parameters (hash table, minibatch size and sample size) before generating AIs for all three classic games. The 1 GB hash table choice was used for both Stockfish and Elmo.

If they had not got good enough results, they would very likely have regrouped and produced a stronger AI (I would not exclude the possibility that a weaker AI was produced first of all, but I have no evidence of this).

RodneyDiamond1989

You guys are seriously misunderstanding AlphaZero. It is not going to reach a rating of 3700, let alone 4000 or 10000 like some crazies are saying! AS STATED BY THE AlphaZero team, it actually trained for more like 20 hours, and the longer it trains, the slower it improves. After it reached the level it was at, they said letting it continue would be almost pointless, as the improvements at that point are very, very slow! The TPUs (tensor processing units) that AlphaZero ran on are ALSO MUCH more powerful than the machine that Stockfish 8 (without its opening book) ran on. I'm sorry, but if you took Stockfish 9 today and gave it its opening book, it would beat AlphaZero even on its TPUs. Also interesting: if you just give both engines a set time to make all their moves, instead of 60 seconds locked per move, Stockfish would do EVEN better,

 

RodneyDiamond1989

because time management is a part of chess, after all.

 

drmrboss

Yes, I agree. At first I was impressed by the DeepMind team. Later, it turned out there were significant flaws in an unfair match. Frankly speaking, they ran a 100 m race between a fish and a snail, and the snail won, because the fish had to run in very unfavourable conditions. The unfavourable conditions for Stockfish were: 1. 1 GB memory, 2. 1 min per move, 3. an older version. Together these cost more than 100 Elo. I bet my money on current SF with full equipment (opening book, EGTB, any standard time control from 25 min/game to 2 hours/40 moves).

prusswan

Stockfish was running on hardware better than TCEC's. If anything, it is just not able to scale well on any currently available hardware. AlphaZero was able to prove many things with a fraction of Google's compute power and exposed the handicaps that prevented middlegame engines from advancing further: their reliance on books and their inability to scale. SF will still miss any good move that is beyond its search depth, and long time controls will benefit AlphaZero even more.

drmrboss

Do you know how alpha-beta and Lazy SMP search work? Have you seen any programmer's opinion, or do you simply believe DeepMind/Google ads ("get a six-pack in 6 weeks", "10,000 per month from an online job", etc.)? SF strength is heavily dependent on hash size. All search results are stored in memory, and regardless of speed, if there is a heavy bottleneck in one area of the operation, everything is restricted by it. To give a clear example of how the computer works: even if there is a 4-inch water supply pipe to your home, if there is a 1/2-inch bottleneck in one section, you effectively get a 1/2-inch supply. But the advertising company will tell you they are delivering you a 4-inch water supply.
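To put that in code (a toy sketch of my own, nothing like Stockfish's actual implementation): an alpha-beta search stores what it has already worked out in a transposition table keyed on the position, and the "Hash" setting is the size of that table.

    import chess, chess.polyglot

    TT = {}                              # zobrist key -> (depth, score, flag)
    MAX_ENTRIES = 1_000_000              # stands in for the UCI "Hash" budget
    EXACT, LOWER, UPPER = 0, 1, 2

    PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                    chess.ROOK: 5, chess.QUEEN: 9}

    def evaluate(board):
        # crude material count from the side to move, just to keep the sketch runnable
        return sum(PIECE_VALUES.get(p.piece_type, 0) * (1 if p.color == board.turn else -1)
                   for p in board.piece_map().values())

    def negamax(board, depth, alpha, beta):
        key = chess.polyglot.zobrist_hash(board)
        entry = TT.get(key)
        if entry and entry[0] >= depth:  # reuse earlier work instead of searching again
            _, score, flag = entry
            if flag == EXACT or (flag == LOWER and score >= beta) or (flag == UPPER and score <= alpha):
                return score
        if depth == 0 or board.is_game_over():
            return evaluate(board)
        original_alpha, best = alpha, -float("inf")
        for move in board.legal_moves:
            board.push(move)
            best = max(best, -negamax(board, depth - 1, -beta, -alpha))
            board.pop()
            alpha = max(alpha, best)
            if alpha >= beta:
                break                    # beta cutoff: "best" is only a lower bound
        flag = LOWER if best >= beta else (UPPER if best <= original_alpha else EXACT)
        if len(TT) < MAX_ENTRIES:        # once the table is full, reuse is lost and work is redone
            TT[key] = (depth, best, flag)
        return best

(Real engines overwrite old entries rather than refusing to store new ones, but the effect is the same: with too little memory, positions that were already searched get searched again.)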

Elroch
drmrboss wrote:

SF strength is heavily dependent on hash size.

Show us the data on the dependence of Stockfish's Elo performance on hash table size. Guessing does not suffice.

MacJT

Couldn't this whole issue be solved by replaying the games and verifying whether Stockfish indeed plays the moves it played earlier? If it does, even with 5 or 10 minutes, then the AZ team has a point and it is indeed a better program.

If I know Google, they have already tested the strongest Stockfish engine, with all the regular controls and much better hardware, and AZ most likely defeated it before they released the results; the games they released were just a taste.

I read somewhere that once a neural network is built, it doesn't need a very strong computer for execution.

Elroch

A basic deep neural network has a set of inputs and a set of layers of neurons with (typically millions of) connections, and at run time each piece of data is processed with a single feed-forward pass through the network. But AlphaZero is not like that. It uses the network to guide a tree search from the position of interest, consulting the network at every decision. Checking the paper, it evaluated about 80,000 positions per second, each of which means a feed-forward pass through the network.

But this only required 4 second-generation TPUs (admittedly, that's a pretty powerful parallel processor), while training the network used 64 second-generation TPUs, so playing is considerably less demanding, as you said.
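For anyone curious what "consulting the network at every decision" looks like, here is a stripped-down sketch of the published PUCT-style search loop (my own illustration, not DeepMind's code; net, legal_moves and apply_move are placeholder functions, and details such as terminal positions and exploration noise are omitted):

    import math

    C_PUCT = 1.5                       # exploration constant (value here is arbitrary)

    class Node:
        def __init__(self, prior):
            self.prior = prior         # P(s, a) from the policy head
            self.visits = 0            # N(s, a)
            self.value_sum = 0.0       # W(s, a)
            self.children = {}         # move -> Node

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select_child(node):
        total = sum(c.visits for c in node.children.values())
        def puct(child):               # Q + U, as in the AlphaZero papers
            return child.q() + C_PUCT * child.prior * math.sqrt(total) / (1 + child.visits)
        return max(node.children.items(), key=lambda kv: puct(kv[1]))

    def simulate(root_state, root, net, legal_moves, apply_move):
        """One simulation: descend by PUCT, expand a leaf with one network call, back up."""
        state, node, path = root_state, root, []
        while node.children:                     # descend to an unexpanded leaf
            move, node = select_child(node)
            state = apply_move(state, move)
            path.append(node)
        policy, value = net(state)               # exactly one feed-forward pass per simulation
        for move in legal_moves(state):
            node.children[move] = Node(policy.get(move, 0.0))
        for n in reversed(path):                 # back the value up toward the root,
            value = -value                       # flipping perspective at each ply
            n.visits += 1
            n.value_sum += value

Run as many of these simulations as the time budget allows and pick the root move with the most visits; each simulation costs one network evaluation, which is where those tens of thousands of evaluations per second go.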

ackubota

AlphaZero will sacrifice pieces for position. I have never seen any other chess computer do that. It is different, and I think that is why it is beating the other chess computers: they are not used to that kind of play.

DiogenesDue
Elroch wrote:
drmrboss wrote:

SF strength is heavily dependent on hash size.

Show us the data on the dependence of Stockfish's Elo performance on hash table size. Guessing does not suffice.

He doesn't have to guess, because one of the main Stockfish developers said so.  Regardless, the burden of proof is not on Stockfish, which plays in public matches with certified results.  The burden of proof rests with the DeepMind team, who released private and unverifiable test results as if it were an official outcome, for publicity.

Why call for Stockfish's team to provide *even more* transparency than we have from an *open source* chess engine, and then give a complete pass to A0 to be transparent about anything? You could always dig through the code yourself to determine how much Stockfish relies on the hash table...

You are giving A0 too much credence because it represents an advance in AI capabilities that pleases you. It pleases me too, to see chess engines finally going in this direction, with bootstrapped play that eliminates built-in human biases, but that doesn't mean I can't see that what they have done is a little underhanded and disingenuous. They claimed victory in a match they closely controlled, then this year they sponsored the London Classic so they could walk all the chess press around their hallowed halls one floor downstairs and get a bunch more publicity for A0, still without playing a single official game or risking a negative outcome in any way. It's a brilliant way to handle it... if you don't care about ethics.

Elroch

No, one of the Stockfish developers asserting the hash allocation had a big effect was not enough. Such an assertion needed to be demonstrated.

Since all that early discussion, there has been a second match between AlphaZero and Stockfish, with all of the possible disadvantages of Stockfish in the earlier match removed (time management, opening book, hash table, etc.). The result was that AlphaZero won almost as convincingly as in the first match, indicating that the total effect of the issues was not huge.

Preprint of 32 page December 2018 article on the later match

DiogenesDue
Elroch wrote:

No, one of the Stockfish developers asserting the hash allocation had a big effect was not enough. Such an assertion needed to be demonstrated.

Since all that early discussion, there has been a second match between AlphaZero and Stockfish, with all of the possible disadvantages of Stockfish in the earlier match removed (time management, opening book, hash table, etc.). The result was that AlphaZero won almost as convincingly as in the first match, indicating that the total effect of the issues was not huge.

Preprint of 32 page December 2018 article on the later match

Another round of private testing, not a match at all. You ask for proof in your first line that you don't have for your "side". If a scientific organization came out and said they had proven climate change is *not* occurring, but only proved it in private tests, and then other people supported them, saying "scientists who support climate change need to show some more substantial proof to refute these new results", you'd laugh at those people.