How to resolve the AlphaZero vs Stockfish debate. - Chess Forums

Jan 2, 2018

#1

I think the ongoing "AlphaZero vs Stockfish" debate highlights an implicit flaw in how we think about competing chess engines: the notion that a chess engine's software can be cleanly separated from its hardware. As AI continues to improve, I think this notion will become more and more ludicrous; the future of AI is going to be a move away from general-purpose hardware toward something much more like the human brain (a weird blend where one cannot meaningfully differentiate "hardware" from "software"). In 2017, AlphaZero is running on TPUs designed by humans, and there's already much debate over comparing TPUs to CPUs. What will happen when AlphaZero 2.0 comes out in 2019, and runs by pumping electricity through a large amorphous glob of interwoven silicon molecules, yet produces moves that crush AlphaZero 1.0? How much "glob" would be "fair" for AlphaZero 2.0 to compete with the 4 TPUs of AlphaZero 1.0?

IMO, the "proper" way to pit two engines against each other is to limit them by basic physics. Any given competitor is treated as a black box:

some arrangement of matter (hardware+software)
connection to a power supply (no batteries allowed)
a simple interface through which chess moves can be communicated

Within this paradigm, each move (or the game as a whole) is limited by a single restricted value:

(Mass of engine, in kilograms) x (Electrical Power, in Watts) x (Time taken, in seconds)

A side note: since Watts = Joules/second, the calculation yields a value with units that simplify to kilogram-Joules (kgJ).

The basic concept, applied to conventional engines, is that power and mass will limit the overall CPU capacity (more CPUs = more power drawn to run and cool the machine) and opening book size (bigger book = more RAM = more mass).

How does this apply to the current AlphaZero vs Stockfish debate?

For TCEC season 9, the competition was run with the following:

Season 9 Superfinal server
CPUs: 44 Cores -> 2 x Intel Xeon E5 2699 v4 @ 2.8 GHz
Motherboard: Supermicro X10DRL-i
RAM: 128 GB DDR4 ECC
SSD: Crucial CT250M500 240 GB
Chassis: Supermicro
OS: Windows Server 2012 R2

Say this beast was stripped down to the bare components needed to run Stockfish (cut out any skeletal components and needless ports, carve the motherboard down to essentials, etc). Furthermore, say the CPUs and RAM are replaced with the lightest-weight and lowest-power components commercially available. Once laid bare and optimized, say the system weighs in at 50 kg and is measured to average about 3,000 Watts of power while making a move. This would amount to 50 kg x 3,000 W x 60 s = 9,000,000 kgJ (per 1 minute move). Of almost no consequence to this number, some of the RAM could be allocated to store a very deep database of opening moves.
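For anyone who wants to check that multiplication, here is a minimal Python sketch. The function name `move_budget_kgj` is just something made up for illustration, and the 50 kg / 3,000 W / 60 s inputs are the guessed figures from the paragraph above, not measurements.

```python
def move_budget_kgj(mass_kg: float, power_w: float, seconds: float) -> float:
    """Mass x power x time, in kilogram-Joules (kgJ)."""
    return mass_kg * power_w * seconds


# Hypothetical stripped-down Stockfish box from the post: 50 kg, 3,000 W, 60 s per move.
stockfish_budget = move_budget_kgj(mass_kg=50, power_w=3000, seconds=60)
print(f"{stockfish_budget:,.0f} kgJ per one-minute move")  # 9,000,000 kgJ per one-minute move
```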

At this point, the challenge would go out to AlphaZero to compete based on this 9,000,000 kgJ per move limit (50 kg x 3,000 W x 60 s). So if Google can build the 4-TPU AlphaZero system, but that system draws 10,000 Watts of power and weighs 100 kg... then it would only be allowed 9 seconds per move. Conversely, Google may find it can squeeze 100 TPUs into 10 kg, drawing 1,000 W of power, which entitles AlphaZero to 15 minutes per move for every 1 minute Stockfish gets. And of course, Google is also free to use any opening book of choice. However, since AlphaZero's neural-net configuration is kind-of-sort-of like a "highly-compressed opening book for every reachable position in the game", Google may find that storing a rigid opening book is an inefficient use of resources.
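The two AlphaZero scenarios above can be sketched the same way: divide the reference budget by the challenger's mass x power to get its time per move. This is only a rough illustration; `allowed_seconds` is a made-up helper, and the masses and wattages are the hypothetical ones from the post.

```python
def allowed_seconds(budget_kgj: float, mass_kg: float, power_w: float) -> float:
    """Seconds per move a challenger gets under a fixed kgJ budget."""
    return budget_kgj / (mass_kg * power_w)


# Reference budget: the hypothetical 50 kg, 3,000 W Stockfish box thinking for 60 s.
BUDGET_KGJ = 50 * 3000 * 60

# Heavy, power-hungry 4-TPU build (assumed 100 kg, 10,000 W).
heavy = allowed_seconds(BUDGET_KGJ, mass_kg=100, power_w=10_000)
# Light, efficient 100-TPU build (assumed 10 kg, 1,000 W).
light = allowed_seconds(BUDGET_KGJ, mass_kg=10, power_w=1_000)

print(f"heavy build: {heavy:.0f} s per move")        # heavy build: 9 s per move
print(f"light build: {light / 60:.0f} min per move")  # light build: 15 min per move
```

Note that the budget only cares about the product of mass and power, so halving one is worth exactly as much as halving the other.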

An interesting corollary (i.e. humans' chess-playing ability is AMAZING, even compared to Stockfish 8):

The human brain weighs about 1.5 kg. Furthermore, the act of "thinking really hard" is estimated to burn about 1.5 calories per minute, which is about 0.1 W if those are small calories (in food Calories, i.e. kcal, it would be closer to 105 W). Taking the 0.1 W figure, if we treat the Magnus Carlsen "chess engine" as "the extra calories used to drive Magnus' brain's thoughts on chess", we wind up with a human measurement of 1.5 kg x 0.1 W x 60 s = 9 kgJ per 1 minute move. Stockfish's 50 kg x 3,000 W works out to 9,000,000 kgJ per 1 minute move, so a "fair" test between Magnus Carlsen and Stockfish 8 would be Carlsen getting 1 hour per move (540 kgJ) vs Stockfish getting about 0.0036 seconds (under four one-thousandths of a second) for each reply.
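Since the calorie-to-Watt step is the easiest place to slip, here is a rough Python sketch of the human comparison. It assumes the post's own inputs (1.5 kg brain, 0.1 W of extra "thinking" power, and the hypothetical 50 kg / 3,000 W Stockfish box); the helper name and constants are only for illustration.

```python
JOULES_PER_CAL = 4.184    # one small calorie, in Joules
JOULES_PER_KCAL = 4184.0  # one food Calorie (kcal), in Joules


def watts_from_per_minute(amount: float, joules_per_unit: float) -> float:
    """Convert an energy burn rate per minute into Watts (Joules per second)."""
    return amount * joules_per_unit / 60.0


# The "1.5 per minute" figure read both ways.
watts_if_small_cal = watts_from_per_minute(1.5, JOULES_PER_CAL)  # roughly 0.1 W
watts_if_kcal = watts_from_per_minute(1.5, JOULES_PER_KCAL)      # roughly 105 W

# Carlsen's budget for a one-hour move, using the post's 0.1 W figure.
carlsen_hour_kgj = 1.5 * 0.1 * 3600  # about 540 kgJ

# Time the assumed Stockfish box (50 kg x 3,000 W) gets for the same budget.
stockfish_seconds = carlsen_hour_kgj / (50 * 3000)
print(f"Stockfish gets {stockfish_seconds:.4f} s per reply")  # Stockfish gets 0.0036 s per reply
```

The result swings by a factor of about a thousand depending on which calorie is meant, so the comparison is only as good as that one estimate.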

Jake_Paul7

Jan 7, 2018

#2

trollololololololololololol

outspider

Aug 17, 2018

#3

I think this is a very interesting point about the distinction between hardware and software. But could you possibly explain to someone like myself, who is not a computer scientist, why it is that a neural net runs on completely different hardware from a regular computer chess engine? Is it not just ones and zeros? I just don't get what's going on there and would like to know.

On your mass x power x thinking time restriction, I kind of get the point of that as a very general metric of 'playing power' and the comparison with Carlsen is interesting. Not sure it's fair just to measure Carlsen's brain alone without the rest of him. Does that include the blood pumping around his brain, for example? Also why is mass important at all? That seems to me an unimportant restriction. Do you just not like people with big heads?! If you kept Carlsen's time to a reasonable human thinking time limit for competitive chess then I suppose it's sort of feasible. But in the end, if there's a machine that can play chess better than anything else but draws twice as much power, no-one's going to baulk at that. They are going to want to see what it can do. And no-one will be interested in spending thousands of pounds trying to cut a machine's weight down for the purposes of competition. Although arguably it could drive technological development if they did, those developments are more likely to arise from applications in which weight actually matters (mobile phones, space satellites, etc.)

So currently chess engines have to run on certain hardware which will not support AlphaZero. Hence why AlphaZero is not in the international computer chess competition, I guess. Thanks for helping me understand that! So your question really is how is it going to compete on an equal playing field with other chess engines? You've tried to extend that playing field to permit the participation of any conceivable chess playing entity, whilst also rigging it a bit, if I may say so, to give humans a chance to compete. But as Kasparov mentions here, https://www.youtube.com/watch?v=Dj97gyu2vF4, the window in which the technology was playing at a similar level to humans was only about a decade long, so it may be impossible to attain that goal without fine-tuning the restrictions just for that purpose. I guess trying to level things up between neural nets and others would be similarly frustrating. Why not just accept that the competition has been blown out of the water by the new technology? I personally have no problem with it as a representative human. After all, people not that different from us invented the damn things! So what if no-one can beat them. It's a bit like trying to have a better vocabulary than a dictionary or something. Why bother? Anyway, how that problem can be resolved, even so that different neural nets can compete against each other, is an interesting problem. Have the people at DeepMind already addressed it though? Is AlphaZero just the winner of a ruthless contest between different prototypes? I love the "amorphous glob of interwoven silicon molecules", by the way. I hope it wins.

To come back to the first point, why is information processing not a perfectly serviceable metric for computing performance, at least for now? I suppose you're saying that neural nets break down the distinction between information stored (like on my computer's hard drive) and information processing (like that processor thing does). Is that right? Also, what would make a neural net better? The parameters for AlphaZero seem so minimal in the DeepMind paper on it that you wonder how it can possibly be improved.