
Objectively Speaking, Is Magnus a Patzer Compared to StockFish and AlphaZero?

Elroch

One piece of information that I have not seen is the architecture of AlphaZero's neural network. It is surely large and deep, but it would be interesting to learn the details.

admkoz

Lots of things I'd like to see. 1) How long does AZ keep getting better?  Is that a function of the learning hardware, or is there a theoretical upper limit (or at least a severe levelling off - where you let it "learn" for an extra week, it examines an extra 5 trillion positions, and gets 0.00001% better).  If there is a theoretical upper limit, does there exist an architecture that can beat THAT?

 

If SF complains that it would have done better with its opening book and with its own time management, then let it have them! See what kind of game follows. Heck, let Stockfish "learn" for 4 hours before the match too (if SF is capable of storing the results of that sort of thing).

Elroch
admkoz wrote:

Lots of things I'd like to see. 1) How long does AZ keep getting better?

They trained it for each of the three games (chess, shogi and Go) with a fixed amount of data - 700,000 minibatches of 4096 positions - and with chess, its rating did not rise significantly after the first 400,000 or so.

Is that a function of the learning hardware, or is there a theoretical upper limit (or at least a severe levelling off - where you let it "learn" for an extra week, it examines an extra 5 trillion positions, and gets 0.00001% better).  If there is a theoretical upper limit, does there exist an architecture that can beat THAT?

Probably a feature of the size of the neural network. Bigger networks can specialise more, so benefit from more data.

If SF complains that it would have done better with its opening book and with its own time management, then let it have them! See what kind of game follows. Heck, let Stockfish "learn" for 4 hours before the match too (if SF is capable of storing the results of that sort of thing).

Stockfish's learning is more the learning of its authors! It has no learning algorithm.

 

Godeka
admkoz wrote:

I have no doubt that by playing around with configuration, etc, they might be able to tune SF to beat this round of AlphaZero.  In fact, it seems possible that SF could be modified to use MCTS rather than alpha-beta if that is supposedly better.

The result would be that SF is much weaker. MCTS uses random playouts to estimate which player wins. More playouts give a better average for a node, but even then alpha-beta pruning is much better.

In Go the strongest engines that used alpha-beta search (or very similar algorithms) reached about 5 kyu, which is roughly 1650 Elo. Then the MCTS engines appeared, reaching amateur 2 dan, which is roughly 2100 Elo. Both levels seem to be a boundary that cannot be exceeded without changing the method of evaluation.

Maybe you know that evaluating the current position in Go is really hard. MCTS works better than alpha-beta in Go not because MCTS is inherently superior – it's because alpha-beta barely worked for Go at all. But in chess a position can be evaluated very easily (counting pieces is the simplest way) and alpha-beta works great.

To make Go engines stronger you need a better way to evaluate positions and to select good candidates for the next move. This is where neural networks come into play: a policy network that suggests candidates and a value network that gives the winning probability of a position. (Note that DeepMind optimized that further: AlphaZero uses a single neural network that outputs both.)

MCTS is still used during calculation (you can find explanations of how it works on Wikipedia and other pages; it is a common algorithm, nothing new or special that DeepMind invented). Some years ago there were engines that used a policy network together with the evaluation from MCTS random playouts; stronger engines then combined random playouts with a value network; and AlphaZero has a value network of such high quality that random playouts are not necessary anymore (they are needed for learning, not for playing with a finished network).

And yes, you are right, it is possible to use MCTS in chess, but to get a strong engine you must combine it with neural networks and … Wait … That's what DeepMind did! Amazing.
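
To make the shape of that search concrete, here is a rough Python sketch of PUCT-style MCTS guided by a policy/value network instead of random rollouts. This is not DeepMind's code: the "network" and the toy game at the bottom are stand-ins I made up so the snippet is self-contained and runs.

import math, random

# Minimal sketch of MCTS with a policy/value network in place of rollouts.
class Node:
    def __init__(self, prior):
        self.prior = prior            # move probability from the policy head
        self.visits = 0
        self.value_sum = 0.0          # total of backed-up values
        self.children = {}            # move -> child Node

    def q(self):                      # mean value of this node
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # Pick the child maximising Q + U, where U favours high-prior,
    # rarely visited moves.
    def score(child):
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def run_mcts(root_state, net, moves_fn, apply_fn, n_sims=800):
    root = Node(1.0)
    for _ in range(n_sims):
        node, state, path = root, root_state, [root]
        # 1. Selection: walk down the tree while children exist.
        while node.children:
            move, node = select_child(node)
            state = apply_fn(state, move)
            path.append(node)
        # 2. Expansion + evaluation: one network call, no random playout.
        priors, value = net(state)
        for move in moves_fn(state):
            node.children[move] = Node(priors.get(move, 1e-3))
        # 3. Backup: alternate the sign, because the players alternate.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = -value
    # Play the most visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

# Toy usage: "state" is just an integer and the fake net returns uniform
# priors and a random value, purely to show the plumbing.
fake_net = lambda s: ({m: 0.5 for m in (-1, +1)}, random.uniform(-1, 1))
print(run_mcts(0, fake_net,
               moves_fn=lambda s: (-1, +1),
               apply_fn=lambda s, m: s + m,
               n_sims=200))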

 

 

@Lyudmil_Tsvetkov
> SF excels in the Sicilian.

Not really. AZ had 17 wins with white, SF had 7 wins with white. AZ had 3 wins with black, SF had 2 wins with black. It seems both engines have trouble winning as black. Maybe this is an issue in the engines, or – more likely – the opening is bad for black.

Elroch
Godeka wrote:
admkoz wrote:

I have no doubt that by playing around with configuration, etc, they might be able to tune SF to beat this round of AlphaZero.  In fact, it seems possible that SF could be modified to use MCTS rather than alpha-beta if that is supposedly better.

The result would be that SF is much weaker. MCTS uses random playouts to estimate which player wins. More playouts give a better average for a node, but even then alpha-beta pruning is much better.

The slight problem with this argument is that AlphaZero used MCTS and is the strongest player, despite calculating more than 1000 times fewer nodes than Stockfish! This is a very strong reason to doubt the conventional wisdom.

In Go the strongest engines that used alpha-beta search (or very similar algorithms) reached about 5 kyu, which is roughly 1650 Elo. Then the MCTS engines appeared, reaching amateur 2 dan, which is roughly 2100 Elo. Both levels seem to be a boundary that cannot be exceeded without changing the method of evaluation.

Maybe you know that evaluating the current position in Go is really hard. MCTS works better than alpha-beta in Go not because MCTS is inherently superior – it's because alpha-beta barely worked for Go at all. But in chess a position can be evaluated very easily (counting pieces is the simplest way) and alpha-beta works great.

Not great enough for Stockfish to have a chance against a program that was delightfully anti-materialistic in several games. Persistent, winning positional advantage at a material cost was a theme of the most striking AlphaZero wins.

To make Go engines stronger you need a better way to evaluate positions and to select good candidates for the next move. This is where neural networks come into play: a policy network that suggests candidates and a value network that gives the winning probability of a position. (Note that DeepMind optimized that further: AlphaZero uses a single neural network that outputs both.)

MCTS is still used during calculation (you can find explanations of how it works on Wikipedia and other pages; it is a common algorithm, nothing new or special that DeepMind invented). Some years ago there were engines that used a policy network together with the evaluation from MCTS random playouts; stronger engines then combined random playouts with a value network; and AlphaZero has a value network of such high quality that random playouts are not necessary anymore (they are needed for learning, not for playing with a finished network).

Sorry, not true. AlphaZero was below 2800 with 1/30 second per move, at which level it would have time for about 2700 MCTS branches per move.

And yes, you are right, it is possible to use MCTS in chess, but to get a strong engine you must combine it with neural networks and … Wait … That's what DeepMind did! Amazing.

 

 

@Lyudmil_Tsvetkov
> SF excels in the Sicilian.

Not really. AZ had 17 wins with white, SF had 7 wins with white. AZ had 3 wins with black, SF had 2 wins with black. It seems both engines have trouble winning as black. Maybe this is an issue in the engines, or – more likely – the opening is bad for black.

The question of why AlphaZero is stronger with MCTS than with alpha-beta search, when this goes against the conventional wisdom for chess engines that use different evaluation functions, is an interesting puzzle.

admkoz

So you are saying that MCTS only works for AlphaZero because its value function is specifically good for being used with that algorithm, and that it wouldn't work well with a different value function?  

 

What I mean by letting SF learn is letting it sit there and evaluate positions for 4 hours, starting from scratch. In the end that's what AZ did, even though it compressed the evaluation of those positions into an evaluation function (as I understand it). It seems like the mechanics of that compression are the real breakthrough with AZ, if there is one and it wasn't just a matter of perfecting existing techniques.

Elroch

I am saying I do not know. For some reason MCTS works better for AlphaZero and alpha-beta works better for engines like Stockfish. Ignoring the way they are created, the most striking difference is that the evaluation function of AlphaZero is enormously more computationally demanding and that it calculates 1000 times fewer branches, but it is not clear why this would swap which look-ahead method is better.

[Bear in mind that both engines value a position with a single number and compare these. For conventional engines this is usually expressed in centipawns, revealing their assumptions about material value; for AlphaZero it is the more fundamental concept of the expected score (i.e. the probability of a win against best play, with a draw counting as half a win).]
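
To make the two scales concrete, here is a common rule-of-thumb mapping from a centipawn score to an expected score via a logistic curve. Neither engine actually uses this exact formula, and the scale constant is an arbitrary choice; it is only meant to show how the two kinds of number relate.

# Illustration only: not Stockfish's or AlphaZero's real conversion.
def expected_score_from_centipawns(cp, scale=400.0):
    return 1.0 / (1.0 + 10 ** (-cp / scale))

for cp in (0, 100, 300, 900):
    print(cp, round(expected_score_from_centipawns(cp), 3))
# 0 -> 0.5, +100 -> ~0.64, +300 -> ~0.85, +900 -> ~0.99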

Godeka

 @Elroch:
> The slight problem with this argument is that AlphaZero used
> MCTS and is the strongest player

But AZ is not an MCTS-only engine; it combines MCTS with a NN.

 

> Sorry, not true. AlphaZero was below 2800 with 1/30 second
> per move, at which level it would have time for about 2700
> MCTS branches per move

Which part isn't true? That random playouts are not necessary anymore?

https://deepmind.com/blog/alphago-zero-learning-scratch/

  • It [AlphaGo Zero] also differs from previous versions in other notable ways.
    [...]
    - AlphaGo Zero does not use “rollouts” - fast, random games used by other Go programs to predict which player will win from the current board position. Instead, it relies on its high quality neural networks to evaluate positions.

I don't think that was changed in AlphaZero. And I think I saw in an interview that playouts (aka simulations, aka rollouts) were not used anymore for playing. But maybe I am misunderstanding something.


@admkoz:
Not exactly. I am saying that alpha-beta pruning is already very good for chess, better than a pure MCTS engine. In Go alpha-beta pruning cannot be used because of the missing evaluation function, so a substitute for alpha-beta was needed that works for Go.

And why not combine alpha-beta with an evaluation network? I am not an expert; I have an overview but no very detailed knowledge. I assume it is because alpha-beta only works if you can do a huge number of evaluations in a short time, but having a NN evaluate each position is slow, so this combination doesn't work very well.
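
For contrast, here is the textbook form of alpha-beta search in Python, written against the python-chess library for move generation (nothing here resembles Stockfish's real search, and the material-count evaluation is just the crudest possible hand-written formula). The point is that evaluate() runs at every leaf, so it has to be cheap; replacing it with a slow neural-network forward pass at that spot is exactly where the combination hurts.

import chess  # pip install python-chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def evaluate(board):
    # Crude material count from the side to move's point of view.
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES.get(piece.piece_type, 0)
        score += value if piece.color == board.turn else -value
    return score

def alphabeta(board, depth, alpha=-10**6, beta=10**6):
    if depth == 0 or board.is_game_over():
        return evaluate(board)          # <- the per-leaf evaluation cost
    best = -10**6
    for move in board.legal_moves:
        board.push(move)
        score = -alphabeta(board, depth - 1, -beta, -alpha)
        board.pop()
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:               # cutoff: the opponent won't allow this line
            break
    return best

print(alphabeta(chess.Board(), depth=3))  # 0: no material can be forced in 3 plies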

 

> What I mean by let SF learn is let it sit there and evaluate positions

> for 4 hours starting from scratch.

But then you need somewhere to store the result, and it must be used to evaluate future positions, otherwise there is no learning effect. In the end you need a NN (or a revolutionary idea for something else).

You cannot have both. Either you use a function (more like a human-created formula) or you have a black box with a lot of neurons whose weights can be adjusted so that the input produces a better output. In the first case you know what happens inside your function; you can adjust and test it. In the second case you have a black box in which some magic happens. It takes a lot of time to adjust the weights of the neurons to get a good output, and if you want to change something (number of neurons, number of layers, a different learning rate, a different activation function and so on), then you must start from the beginning.

Artificial neural networks are nothing new; they were researched decades ago. The main problem was that the hardware was too slow to train and run networks big enough for real-world tasks. Even today it is not easy to create the network structure, train it, modify its structure and train it again to check whether it learned better. It's easier if you have a few million dollars and a computing centre. (Google used human games from Go servers to train its first AlphaGo versions and needed many more CPUs and GPUs.)

At least today the average PC is fast enough to use a trained network. This has produced strong Go engines in recent years.

admkoz

I agree there would be no learning effect. It would just be a four-hour head start on calculations. Whether SF is even capable of storing the results of 4 hours of pondering, I don't know. But it seems like that would be a fair comparison.

 

As far as how the network was "trained" - is that also published knowledge or is that specific to AZ? 

 

Elroch
Godeka wrote:

 @Elroch:
> The slight problem with this argument is that AlphaZero used
> MCTS and is the strongest player

But AZ is not an MCTS-only engine; it combines MCTS with a NN.

The NN just gives a number, an evaluation of a position. Consider it a black box calculation like a very good guess as to how good the position is.

> Sorry, not true. AlphaZero was below 2800 with 1/30 second
> per move, at which level it would have time for about 2700
> MCTS branches per move

Which part isn't true? That random playouts are not necessary anymore?

Yes, you said "AlphaZero has a value network of such high quality that random playouts are not necessary anymore." This is wrong (unless you are happy with play that could be beaten by a lot of professional humans).

https://deepmind.com/blog/alphago-zero-learning-scratch/

  • It [AlphaGo Zero] also differs from previous versions in other notable ways.
    [...]
    - AlphaGo Zero does not use “rollouts” - fast, random games used by other Go programs to predict which player will win from the current board position. Instead, it relies on its high quality neural networks to evaluate positions.

Fortunately, I can clear this up. The AlphaZero paper says "Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root to leaf."

I don't think that was changed in AlphaZero. And I think I saw in an interview that playouts (aka simulations, aka rollouts) were not used anymore for playing. But maybe I am misunderstanding something.


@admkoz:
Not exactly. I am saying that alpha-beta pruning is already very good for chess, better than a pure MCTS engine. In Go alpha-beta pruning cannot be used because of the missing evaluation function, so a substitute for alpha-beta was needed that works for Go.

And why not combine alpha-beta with an evaluation network? I am not an expert; I have an overview but no very detailed knowledge. I assume it is because alpha-beta only works if you can do a huge number of evaluations in a short time, but having a NN evaluate each position is slow, so this combination doesn't work very well.

I believe it is not that alpha-beta is bad, it is that MCTS works better. 

> What I mean by let SF learn is let it sit there and evaluate positions

> for 4 hours starting from scratch.

But then you need somewhere to store the result, and it must be used to evaluate future positions, otherwise there is no learning effect. In the end you need a NN (or a revolutionary idea for something else).

You cannot have both. Either you use a function (more like a human-created formula) or you have a black box with a lot of neurons whose weights can be adjusted so that the input produces a better output. In the first case you know what happens inside your function; you can adjust and test it. In the second case you have a black box in which some magic happens. It takes a lot of time to adjust the weights of the neurons to get a good output, and if you want to change something (number of neurons, number of layers, a different learning rate, a different activation function and so on), then you must start from the beginning.

True.

Artificial neural networks are nothing new; they were researched decades ago. The main problem was that the hardware was too slow to train and run networks big enough for real-world tasks. Even today it is not easy to create the network structure, train it, modify its structure and train it again to check whether it learned better. It's easier if you have a few million dollars and a computing centre. (Google used human games from Go servers to train its first AlphaGo versions and needed many more CPUs and GPUs.)

All true.

At least today the average PC is fast enough to use a trained network. This has produced strong Go engines in recent years.

But, if AlphaZero only handles 80,000 nodes per second, an average PC would be very limited, like AlphaZero given only a small fraction of a second per move.

A key point I made was that the neural network of AlphaZero is no more than a positional evaluation function that it has learnt, targeting the expected result. It is far more complex and probably substantially better than the less computationally demanding evaluation routines used by conventional engines.

However, it is surprising that a different way of exploring future possibilities would be better for two evaluation routines that each compare positions on an ordinal scale.

But I have suddenly realised a good reason why this may be so. AlphaZero's evaluation is correctly probabilistic: it gives an expectation of the score. This means it makes a lot more sense to combine such evaluations by averaging, as long as you have an unbiased sample of what may be the future best line.

By contrast, Stockfish has some sort of pseudo-material evaluation of positions. It makes a lot less sense to average these, because what matters is your chance of winning, not the expected amount of material you are ahead.

Let me give an exaggerated example. Suppose Stockfish thinks three lines are equally likely and evaluates them as 19 pawns ahead, -2 pawns, and -2 pawns. If it averaged these, it would find it expects to be 5 pawns ahead and would be happy. By contrast, if AlphaZero sees three possibilities with expected scores 0.99, 0.05 and 0.04, it is less optimistic, with an expected score of 0.36 points.

Whether this is crucial with large numbers of lines is unclear, but it could be.
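
A few lines of Python make the point concrete. The pawns-to-probability conversion below is the same rough logistic rule of thumb as in the earlier sketch (in pawns rather than centipawns); it is not either engine's real conversion, and its outputs will not match the 0.99/0.05/0.04 figures above, which I made up independently. The quiet second line is invented purely to show that averaging can order two candidate lines differently on the two scales.

def win_prob(pawns, scale=4.0):
    # Rule-of-thumb logistic mapping from pawns ahead to expected score.
    return 1.0 / (1.0 + 10 ** (-pawns / scale))

line_a = [19.0, -2.0, -2.0]   # the wild line from the example above
line_b = [1.0, 1.0, 1.0]      # a quiet line: a solid extra pawn everywhere

avg = lambda xs: sum(xs) / len(xs)
print(avg(line_a), avg(line_b))                # 5.0 vs 1.0: the wild line "wins"
print(avg([win_prob(x) for x in line_a]),
      avg([win_prob(x) for x in line_b]))      # ~0.49 vs ~0.64: the quiet line wins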

SmyslovFan
Elroch wrote:

It is a bit frustrating when I have to explain the fact that doing something that can be done by a mouse does not mean you are a mouse. Robots can solve mazes too, you know. However, someone who demanded that a task require a human to do it would have to exclude maze-solving, as mice do it. They would also have to exclude being able to identify a field mouse from 100 meters away, as a hawk can do that, and a huge list of other things that animals can do.

Anyhow, the original version of my abbreviated definition is from Artificial Intelligence: a Modern Approach, one of the most widely used textbooks on artificial intelligence. Take a peek at the preface here.

Chapter 1 looks at several different approaches to AI and synthesises them into a single whole. Here it refers to behaviour which requires intelligence when done by humans, which avoids the problem I mentioned of accidentally excluding things that can be done by non-humans. We might say it requires intelligence in a human to recognise something in the environment, even if an animal could also do it.

Thank you for citing your source. 

I stand by my objection that AI is not the study of AI, and accept that I am criticizing the authors of a text on AI. The authors were writing about the study of AI, so their definition makes sense in that context. But even so, their definition isn't a clean, efficient definition. The equivalent in history is to define historiography in a textbook on historiography, but to claim that it is a definition of history. The difference is a minor one for most, but a specialist would see the error.

 

I have read the complete definition in context. The authors spend quite a bit of space defining agency, which you did not. Their definition is usable, but cumbersome in general discussions. They focus on agency for specific reasons laid out in the rest of the text. I stand by my assessment that your abbreviated definition was not usable, mostly because you did not explain what you meant by agency.

I still prefer Computer World's definition, which harkens back to the original definitions used by Turing and other pioneers of AI. That article strives to answer the question for an educated lay person, not a specialist.  But I accept that as long as agency ("agent" in the definition) is fully defined, the definition offered in the book also works. 

 

But you seem even to disagree with them about agency. You seem to want any non-human to be considered Artificial, even if it's an animal.

 

Meanwhile, you seem to not realize how fine the margins are in competitive chess. If Stockfish is operating at even 1% less than perfect efficiency, it will lose significant rating performance. Chess is a very exact game, which is why it's useful for AI to test itself by playing chess. Stockfish was significantly hampered. The creators of AlphaZero may have had their reasons, but it doesn't change that basic fact.

ponz111

Regardless if the conditions were fair or not--Looking at the 10 games published--Alpha Zero played beautiful and extremely strong chess.

I was and am just amazed at the quality of the play of Alpha Zero!

FideiDefensor

IS [TRACK STAR] A PATZER COMPARED TO [AUTOMOBILE]?

 

/absoluteretard

admkoz
Elroch wrote:

 

However, it is surprising that a different way of exploring future possibilities would be better for two evaluation routines that each compare positions on an ordinal scale.

But I have suddenly realised a good reason why this may be so. AlphaZero's evaluation is correctly probabilistic: it gives an expectation of the score. This means it makes a lot more sense to combine such evaluations by averaging, as long as you have an unbiased sample of what may be the future best line.

By contrast, Stockfish has some sort of pseudo-material evaluation of positions. It makes a lot less sense to average these, because what matters is your chance of winning, not the expected amount of material you are ahead.

 

That makes sense, but it seems like SF could be modified, without changing its essence, to work the same way.

 

I was wondering if another reason is that AZ's algorithm is trained using MCTS.  So, the fact that it will be used in that context is somehow "baked in the cake".  I.e. it returns a number saying this is the probability that you'll win from here if you use MCTS.

 

Anyway, like I said, what I really want is for google to publish the function.  I have no doubt it would be super complicated, but just like other complex functions in e.g. physics, its behavior could be studied.  You should be able to feed in extreme cases like dropping a queen and find the "terms" that represent that.  

Elroch
SmyslovFan wrote:
Elroch wrote:

It is a bit frustrating when I have to explain the fact that doing something that can be done by a mouse does not mean you are a mouse. Robots can solve mazes too, you know. However, someone who demanded that a task require a human to do it would have to exclude maze-solving, as mice do it. They would also have to exclude being able to identify a field mouse from 100 meters away, as a hawk can do that, and a huge list of other things that animals can do.

Anyhow, the original version of my abbreviated definition is from Artificial Intelligence: a Modern Approach, one of the most widely used textbooks on artificial intelligence. Take a peek at the preface here.

Chapter 1 looks at several different approaches to AI and synthesises them into a single whole. Here it refers to behaviour which requires intelligence when done by humans, which avoids the problem I mentioned of accidentally excluding things that can be done by non-humans. We might say it requires intelligence in a human to recognise something in the environment, even if an animal could also do it.

Thank you for citing your source. 

I stand by my objection that AI is not the study of AI, and accept that I am criticizing the authors of a text on AI. The authors were writing about the study of AI, so their definition makes sense in that context. But even so, their definition isn't a clean, efficient definition. The equivalent in history is to define historiography in a textbook on historiography, but to claim that it is a definition of history. The difference is a minor one for most, but a specialist would see the error.

The error appears to be yours, since historiography is the study of the study of history. The study of history is just called "history".

What matters is that people understand what they are talking about and you have not given a single real example of where the almost universal practice of omitting "the study of ..." when referring to someone's subject is misleading.

I have read the complete definition in context. The authors spend quite a bit of space defining agency, which you did not.

Yes, they wrote over 1000 pages more than me, for which they deserve credit.

Their definition is usable, but cumbersome in general discussions. They focus on agency for specific reasons laid out in the rest of the text. I stand by my assessment that your abbreviated definition was not usable, mostly because you did not explain what you meant by agency.

The only context I can think of where the notion of an agent gets in the way is the area of "hive intelligence": the intelligence of large numbers of semi-autonomous units co-operating to produce something far more intelligent in behaviour than the units.

I still prefer Computer World's definition, which harkens back to the original definitions used by Turing and other pioneers of AI. That article strives to answer the question for an educated lay person, not a specialist.  But I accept that as long as agency ("agent" in the definition) is fully defined, the definition offered in the book also works. 

Good.

But you seem even to disagree with them about agency. You seem to want any non-human to be considered Artificial, even if it's an animal.

No. I already explained this. I consider the successful emulation of animal intelligence artificial intelligence. 

Meanwhile, you seem to not realize how fine the margins are in competitive chess. If Stockfish is operating at even 1% less than perfect efficiency,

That is a meaningless statistic without definition. (Indeed for natural meanings relating to computational resources, it is wrong).

it will lose significant rating performance. Chess is a very exact game, which is why it's useful for AI to test itself by playing chess. Stockfish was significantly hampered. The creators of AlphaZero may have had their reasons, but it doesn't change that basic fact.

This does not make any sense. All chess engines come from decades of testing of chess engines, including their direct predecessors and development code. The unusual thing about AlphaZero was that it was not tested on any opponents until it was finished.

 

Elroch
admkoz wrote:
Elroch wrote:

 

However, it is surprising that a different way of exploring future possibilities would be better for two evaluation routines that each compare positions on an ordinal scale.

But I have suddenly realised a good reason why this may be so. AlphaZero's evaluation is correctly probabilistic: it gives an expectation of the score. This means it makes a lot more sense to combine such evaluations by averaging, as long as you have an unbiased sample of what may be the future best line.

By contrast, Stockfish has some sort of pseudo-material evaluation of positions. It makes a lot less sense to average these, because what matters is your chance of winning, not the expected amount of material you are ahead.

 

That makes sense, but it seems like SF could be modified, without changing its essence, to work the same way.

I thought about that, but it would be a huge thing. The evaluation routine is based on material modified by positional factors. How can you get a probability of winning without machine learning (or playing sample games to their end)? You could write code to return probabilities, but why would you believe them?

I was wondering if another reason is that AZ's algorithm is trained using MCTS.  So, the fact that it will be used in that context is somehow "baked in the cake".  I.e. it returns a number saying this is the probability that you'll win from here if you use MCTS.

Not quite sure what you mean. 

Anyway, like I said, what I really want is for google to publish the function.  I have no doubt it would be super complicated, but just like other complex functions in e.g. physics, its behavior could be studied.  You should be able to feed in extreme cases like dropping a queen and find the "terms" that represent that. 

Maybe, but it could be hellishly difficult! I suspect the network was quite big, millions of connections for sure.

admkoz

The question that I would have is whether, just because it was originally "trained" on a neural network, that would be the only way it could be evaluated. Can neural networks automatically simplify themselves to the greatest extent possible? (This is where I wish I had taken more of these classes in college.)

Godeka

@Elroch
I read your citation about the simulated games too. Now I am a little bit confused, because I was sure that there were no playouts anymore. Or I thought I was sure. Um, well … And I think it would be consistent to leave them out if you have a good evaluation.

But there is nothing to discuss; at least your citation is clear. And I found no other information in the paper about omitting the simulation phase.

 

> But, if AlphaZero only handles 80,000 nodes per second, an
> average PC would be very limited, like AlphaZero with a small
> fraction of a second.

As said before, it works very well for Go. Here are two benchmark results from Leela, on a GeForce GTX 1080 and on an i7-7700K@4.2GHz:

  • 2000 predictions in 3.34 seconds -> 598 p/s
    10000 evaluations in 4.20 seconds -> 2380 p/s
  • 2000 predictions in 16.53 seconds -> 120 p/s
    10000 evaluations in 34.45 seconds -> 290 p/s

Even with the low CPU numbers Leela plays very strongly – although not strongly enough to win against the strongest amateur and professional players.

But I doubt that AlphaZero is better than SF on a PC. I assume that NNs are not as good for chess as they are for Go, and even if they are, classic chess engines are too strong on an average PC. At least this is what I guess; I would still like to see some games.

 

Here are by the way the numbers of simulations Leela can make on my PC, maybe it is interesting for you (but note that Leela is a Go engine, not a chess engine):

  • 200000 games in 8.47 seconds -> 23612 g/s

 

> By contrast, Stockfish has some sort of pseudo-material evaluation
> of positions.

It is still material oriented, although highly optimized and taking positional factors into account. Because of that, sacrifices and positional advantages can be an issue, but it needs a strong player to reveal the weakness. That's exactly what happened in the games against AlphaZero. At least in the ten known games AZ pushes SF around and leaves it behind in bad positions.

   Your statement about the averages of probabilities and the amount of material makes sense to me.

 

@admkoz:
There was enough information to rebuild AlphaGo Zero (see Leela Zero), but the trained NNs were not published by DeepMind. I don't know if the information about AlphaZero is sufficient to rebuild it, but I am sure that the trained NNs are kept private. (I wonder why; there are no secrets in them which can be turned into money. But surely DeepMind has good reasons for that.)

 

admkoz

I also never quite get the "probability" part of it. I guess it's a question that can only be answered empirically, and empirically AZ won, so... But I can definitely think of positions where any type of probabilistic analysis would fail. Let's say I hang my queen. He can take the free Q, but if he doesn't, I have mate in 1. So if you "average" over all his moves, this'll look pretty good, since for all but one of them you win. Sadly that one is the one he'll pick, so you lose.
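
In numbers (made-up values, just restating that scenario):

# My expected score after each of the opponent's (say) 20 legal replies.
# One reply grabs the free queen and wins for him; every other reply
# walks into mate next move.
replies = [0.0] + [1.0] * 19

print(sum(replies) / len(replies))   # naive average over replies: 0.95
print(min(replies))                  # the reply he will actually play: 0.0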

 

Another thing I would really, really love to see: the first few random games.  In the 40ms of thinking time, did it look ahead, see a non-zero chance of scholar's mate if 1 e3 or e4, so play that?  Or did it just go totally random with 1 a3?  

Elroch
admkoz wrote:

The question that I would have is whether, just because it was originally "trained" on a neural network, that would be the only way it could be evaluated. Can neural networks automatically simplify themselves to the greatest extent possible? (This is where I wish I had taken more of these classes in college.)

I think there's a pretty simple answer there. No.

If you train a network with 5 layers, 10,000 nodes, and 10,000,000 parameters, it is going to end up with 5 layers, 10,000 nodes and 10,000,000 parameters.

However, I suppose one way they can be encouraged to simplify themselves to some extent is to use L1-regularisation, which tends to force some parameters to zero (so they can be omitted). But this is not really the sort of simplification you mean. It is a sort of prior preference for (somewhat) simpler, sparser models.
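
For what it's worth, here is a tiny demonstration of that L1 effect on a made-up regression problem (assuming numpy and scikit-learn are installed; nothing AlphaZero-specific): fit the same data with and without an L1 penalty and count how many weights end up exactly zero.

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_w = np.zeros(50)
true_w[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]      # only 5 of the 50 features matter
y = X @ true_w + 0.1 * rng.normal(size=200)

plain = LinearRegression().fit(X, y)          # no penalty
sparse = Lasso(alpha=0.1).fit(X, y)           # L1 penalty

print((np.abs(plain.coef_) < 1e-6).sum())     # typically 0 weights are exactly zero
print((np.abs(sparse.coef_) < 1e-6).sum())    # typically most of the 45 irrelevant ones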