I would think the "nps is smaller" because, for each position it examines, it is computing some horrendously complex value function which in effect encapsulates the results of examining millions if not billions of positions, whereas Stockfish's evaluation function is much, much simpler.
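To make that cost difference concrete, here is a minimal Python timing sketch of my own (not code from either engine): a handcrafted evaluation is a handful of arithmetic operations per position, while even a modest stand-in for a network evaluation does thousands of multiply-adds, so far fewer positions get scored per second. The features and weights below are invented purely for illustration.

```python
import random
import time

# Toy handcrafted evaluation: a few weighted features, in the spirit of a
# classical engine (the features and weights are invented for illustration).
def handcrafted_eval(features):
    material, mobility, king_safety = features
    return 1.0 * material + 0.1 * mobility + 0.3 * king_safety

# Stand-in for a network evaluation: a single dense layer of 10,000 weights.
# Real networks are vastly larger; even this shows the per-position cost gap.
WEIGHTS = [random.random() for _ in range(10_000)]

def network_like_eval(features):
    x = (list(features) * (len(WEIGHTS) // 3 + 1))[:len(WEIGHTS)]
    return sum(w * v for w, v in zip(WEIGHTS, x))

def evals_per_second(eval_fn, seconds=0.2):
    count, start = 0, time.perf_counter()
    while time.perf_counter() - start < seconds:
        eval_fn((random.random(), random.random(), random.random()))
        count += 1
    return int(count / seconds)

print("handcrafted evals/sec: ", evals_per_second(handcrafted_eval))
print("network-like evals/sec:", evals_per_second(network_like_eval))
```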
@SmyslovFan:
The paper contains the results of 1200 games in the 12 most common human openings. Note that these openings can limit playing strength, because they force positions that don't fit the engine's playing style and can be weaker than the lines the engine itself would prefer.
AlphaZero has the most losses in the B40 Sicilian Defence: as white 17 wins, 31 draws and 2 losses; as black 3 wins, 40 draws and 7 losses.
@Lyudmil_Tsvetkov
> Actually, we don't know why its nps is smaller [...]
Sure we know: Monte Carlo Tree Search without simulations, plus the network calculations (a sketch follows below). Nothing special there.
> on the 32 cores vs latest dev on 2000 core
Isn't the supported maximum 128 threads?
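To make the "Monte Carlo Tree Search without simulations, plus the network calculations" point concrete, here is a minimal Python sketch of my own (not DeepMind's code, and on a toy take-1-2-or-3-from-a-pile game rather than chess). Instead of playing random games out to the end at a leaf, the leaf is scored by a single call to a value network; `value_net` and `policy_net` are hypothetical stand-ins, and that one expensive network call per node is exactly why the nodes-per-second figure is so much lower.

```python
import math
import random

# Hypothetical stand-ins for the network heads (deterministic toy versions).
def value_net(state):
    random.seed(hash(state))
    return random.uniform(-1, 1)            # expected score for the side to move

def policy_net(state, moves):
    return {m: 1.0 / len(moves) for m in moves}   # uniform prior over moves

# Toy game: players alternately take 1-3 from a pile; taking the last one wins.
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def play(pile, m):
    return pile - m

def terminal_value(pile):
    return -1 if pile == 0 else None         # side to move has already lost

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}
        self.N, self.W, self.P = {}, {}, {}   # visit counts, total values, priors

def mcts(root_state, n_playouts=400, c_puct=1.5):
    root = Node(root_state)
    for _ in range(n_playouts):
        node, path = root, []
        # Selection: descend by a PUCT-style rule until an unexpanded node.
        while node.P:
            total = sum(node.N.values()) + 1
            move = max(node.P, key=lambda m:
                       node.W.get(m, 0) / (node.N.get(m, 0) or 1)
                       + c_puct * node.P[m] * math.sqrt(total) / (1 + node.N.get(m, 0)))
            path.append((node, move))
            node = node.children[move]
        # Expansion + evaluation: no random rollout, just one network call.
        v = terminal_value(node.state)
        if v is None:
            moves = legal_moves(node.state)
            node.P = policy_net(node.state, moves)
            for m in moves:
                node.children[m] = Node(play(node.state, m))
            v = value_net(node.state)
        # Backup: accumulate the value along the path, flipping sign each ply.
        for parent, move in reversed(path):
            v = -v
            parent.N[move] = parent.N.get(move, 0) + 1
            parent.W[move] = parent.W.get(move, 0) + v
    return max(root.N, key=root.N.get)        # most-visited move at the root

print("from a pile of 10, take:", mcts(10))
```

At the root the most-visited move is chosen, and each move's statistic is an average of network values rather than a minimax over a deep tree.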
SF excels in the Sicilian.
Well, it is not much effort to increase the number of threads and improve SMP efficiency.

Here's an interesting article that gives information different from what Elroch has presented. It also includes the input of chess experts and a member of the Stockfish team:
https://en.chessbase.com/post/alpha-zero-comparing-orang-utans-and-apples

Do draw attention to anything specific in the article which you think disagrees with what I have said!
I am not sure what you are referring to, but there is at least one possible source of misunderstanding in the article, the statement that games were played against Stockfish "in the training phase". Games against Stockfish were only used for testing the system: there was no feedback from that experience into the system itself. There were 100 games played in each of several openings, and then the main 100 game match where no openings were specified and each program was free to play what it "liked". AlphaZero did all of its learning in self-play, partly completely unguided and partly starting from specific opening positions as examples.
[I believe I have been misleading in at least one way in what I have posted. Having read that AlphaZero used a single net for both evaluation of positions and assessments of moves, I assumed that it used multiple outputs. However, I realised it is more likely that the system uses a single output - the evaluation - and simply applies this to the position reached after every legal move in order to make a first comparison of the moves.
In chess the expected score is very similar to a win probability except for the possibility of a draw, which counts as half a win and half a loss. When comparing moves, I believed AlphaZero rated moves by probability of being best, but now I see it makes a lot more sense to evaluate them by what is effectively the probability that they get to the best result (in fact simply the evaluation of the position they reach).
The exact details of the Monte Carlo Tree Search are not actually known to me, but I do know that the system plays a large number of complete games against itself, surely based on playing moves in a way closely related to their expected scores, and then averages all of the results. This is very, very different to the minimax evaluation and alpha-beta pruning used by conventional chess programs, and happens to be very similar to what I have used myself in application areas that have nothing to do with chess (or even games)].
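As a concrete version of the last two paragraphs, here is a short Python sketch under my own assumptions (it is not DeepMind's code): a single value output, interpreted as an expected score in which a draw counts as half a win, is applied to the position reached after every legal move to give a first ranking of the moves. The move generator and `value_net` are placeholders.

```python
import random

# Expected score: a win is 1, a loss 0, and a draw counts as half of each.
def expected_score(p_win, p_draw, p_loss):
    assert abs(p_win + p_draw + p_loss - 1.0) < 1e-9
    return p_win + 0.5 * p_draw

# Hypothetical stand-ins: positions are opaque strings, and the "network"
# returns an expected score in [0, 1] for the player who has just moved.
def value_net(position):
    random.seed(hash(position))          # deterministic toy evaluation
    return random.random()

def legal_moves(position):
    return ["e4", "d4", "c4", "Nf3"]     # placeholder move generator

def make_move(position, move):
    return position + " " + move         # placeholder position update

# First comparison of the moves: evaluate the position after each legal move
# with the single value output, rather than ranking the moves directly.
def rank_moves(position):
    scored = [(value_net(make_move(position, m)), m) for m in legal_moves(position)]
    return sorted(scored, reverse=True)

print("P(win)=0.4, P(draw)=0.4, P(loss)=0.2 ->", expected_score(0.4, 0.4, 0.2))
for score, move in rank_moves("startpos"):
    print(f"{move}: expected score {score:.3f}")
```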

I have no doubt that by playing around with configuration, etc, they might be able to tune SF to beat this round of AlphaZero. In fact, it seems possible that SF could be modified to use MCTS rather than alpha-beta if that is supposedly better.
But in the end, none of that would change the fundamental issue, which is that SF uses a human-created evaluation function based on advice from GMs, etc., while AlphaZero creates its own from scratch. If, with enormous effort, they did tune SF to beat this iteration of AZ, Google could just run the training program twice as long. The question would be whether SF could ever be tuned to beat an AZ that is as good as it ever gets using its training method.
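For contrast with the averaging search sketched earlier, this is roughly the minimax-with-alpha-beta scheme that conventional engines are built around, again on the toy take-1-2-or-3 pile game and greatly simplified (the depth-zero evaluation is a placeholder): the score of a position is the value of the best move against best resistance, not an average over playouts.

```python
# Minimal negamax with alpha-beta pruning on the toy pile game: players
# alternately take 1-3 from a pile, and whoever takes the last one wins.
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def negamax(pile, depth, alpha=-2, beta=2):
    if pile == 0:
        return -1                        # side to move has already lost
    if depth == 0:
        return 0                         # placeholder static evaluation
    best = -2
    for m in legal_moves(pile):
        score = -negamax(pile - m, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                # cut-off: this line is already refuted
            break
    return best

# Leaving a multiple of 4 loses for the opponent, so from a pile of 10 the
# search should pick "take 2".
best = max(legal_moves(10), key=lambda m: -negamax(10 - m, depth=12))
print("best move from a pile of 10: take", best)
```

Here a single good reply is enough to reject a move outright, which is the essential difference from averaging the results of many playouts.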

Ok, key differences include
- how the article defines artificial intelligence compared to the definition you gave. From what I can tell, your definition is not a useful definition. The chessbase article suggests that Artificial Intelligence is mostly about the ability to adapt and "learn". This is a critical component of the definition of AI that I have found in computer science literature since the 1980s.
- The representation of Stockfish as significantly different and weaker in this match compared to your claim that the differences are minor. "Unfortunately, the comparison with Stockfish is misleading. The Stockfish program ran on a parallel hardware which is — if one understands Tord Romstad correctly — only of limited use to the program. It is not clear precisely how the hardware employed ought to be compared. The match was played without opening book and without endgame tablebases, which both are integral components of a program like Stockfish. The chosen time control is totally unusual, even nonsense, in chess — particularly in computer chess." (from the conclusion of the article cited above)
- the article's main thesis is that while the Alpha Zero program is a revolutionary step, the test itself was poorly designed, and not a fair measure of Alpha Zero's chess playing ability compared to a fully functioning Stockfish. From a practical perspective, this experiment will have no real effect on chess for the near future unless someone designs a program for affordable computer systems that uses a similar learning process. But that is not something currently being planned.
In other words, the article differs from what you have presented in its tone, in its definition of artificial intelligence, and on your claim that Stockfish was operating at nearly its normal strength.

> But in the end, none of that would change the fundamental issue, which is that SF uses a human-created evaluation function based on advice from GMs, etc., while AlphaZero creates its own from scratch. If, with enormous effort, they did tune SF to beat this iteration of AZ, Google could just run the training program twice as long. The question would be whether SF could ever be tuned to beat an AZ that is as good as it ever gets using its training method.
Agreed.
Looking at the way AlphaZero pushed Stockfish around the board, I'd say that there are some noticeable flaws in our (human) understanding of chess and how we evaluate a position.
I'm no engine expert, but I'm guessing that the standard way of evaluating chess (using centipawns) is probably not how AlphaZero does it.
Stockfish is a better calculator than AlphaZero. It sees farther, calculates deeper. But it still fell into inferior positions against AlphaZero. Which means that it isn't about computing or calculating ability—it's about AZ having learned to value things that SF simply doesn't have in its programming.
Which means AZ has, apparently, discovered concepts and principles that human masters haven't yet figured out. Or at least, it's learned to put a greater emphasis on things that we tend to minimize, and lesser emphasis on things that we conventionally value more.
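One way to see the difference in units: a conventional engine reports centipawns, while a learned value is more like an expected score between 0 and 1. A logistic curve of the kind used in the Elo expected-score formula illustrates how the two relate; the scale constant of 400 below is my illustrative assumption, not a figure published for either engine.

```python
def centipawns_to_expected_score(cp, scale=400.0):
    """Map a centipawn evaluation to an expected score in (0, 1).

    The logistic form and the scale of 400 mirror the Elo expected-score
    formula; the right constant for any given engine is an assumption here.
    """
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))

for cp in (0, 50, 100, 300, 900):
    print(f"{cp:+5d} cp -> expected score {centipawns_to_expected_score(cp):.2f}")
```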

For reference, here is Elroch's definition of Artificial Intelligence. Try parsing it to see whether it makes any sense at all:
Consider it done.
Artificial intelligence is defined by computer scientists as the study of agents that observe an environment and interact with it in order to achieve some sort of goals.
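For what it is worth, the agent/environment wording corresponds to a very standard loop in that literature. A trivial Python sketch of my own (nothing to do with any real AI system) shows the structure the definition points at: something that observes an environment and acts on it in pursuit of a goal.

```python
class LineWorld:
    """Environment: a position on a line; the goal state is position 5."""
    def __init__(self):
        self.position = 0
    def observe(self):
        return self.position
    def act(self, action):              # action is -1 or +1
        self.position += action
    def goal_reached(self):
        return self.position == 5

class Agent:
    """Agent: chooses actions from observations so as to reach the goal."""
    def choose_action(self, observation):
        return 1 if observation < 5 else -1

env, agent = LineWorld(), Agent()
steps = 0
while not env.goal_reached():
    obs = env.observe()                 # observe the environment
    env.act(agent.choose_action(obs))   # interact with it to pursue the goal
    steps += 1
print(f"goal reached in {steps} steps")
```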

Elsewhere, Elroch appears to understand AI the way Computer World defined it. But his definition is not a usable definition. For example, an agent is not defined, and in fact could be an organic being, not artificial at all. Also, artificial intelligence isn't the study of applied intelligence. These are semantic issues, but when talking about a definition, semantics are important.
Here's the link to Computer World's definition, which is far more useful, but also longer:
https://www.computerworld.com/article/2906336/.../what-is-artificial-intelligence.html

My definition was for a human like me who would have an idea what an agent is.
All the definitions in any dictionary contain undefined terms. It can hardly have escaped you that those terms are defined elsewhere, and that this does not invalidate a definition! Indeed, a definition that is entirely self-contained, making no reference to other terms, is impossible.
Sorry, but you are simply wrong: Artificial Intelligence is both a branch of computer science and the range of topics studied by that branch. That is how the phrase is commonly used, and common usage is the ultimate arbiter.

I repeat my request: Please cite your source.
Added: I really don't understand why you are doubling down on a demonstrably poor definition when you know you could give a better definition given a second try. And I have not found that definition anywhere in any of the literature I've read on artificial intelligence. I don't have a clue where you came up with it.

Google provides a narrower definition:
"the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages."
I don't particularly like this, because it accidentally excludes the entire range of tasks that might be achievable by a non-human animal, for example maze-solving (or even seeking and catching food).

No, Elroch, the point of artificial intelligence is that it does exclude natural intelligence, whether it is human, animal, or alien.

It is a bit frustrating to have to explain that the fact that something can be done by a mouse does not mean that doing it makes you a mouse. Robots can solve mazes too, you know. However, someone who demanded that a task require a human to do it would have to exclude maze-solving, as mice do it. They would also have to exclude identifying a field mouse from 100 meters away, as a hawk can do that, and a huge list of other things that animals can do.
Anyhow, the original version of my abbreviated definition is from Artificial Intelligence: A Modern Approach, one of the most widely used textbooks on artificial intelligence. Take a peek at the preface here.
Chapter 1 looks at several different approaches to AI and synthesises them into a single whole. Here it refers to behaviour which requires intelligence when done by humans, which avoids the problem I mentioned of accidentally excluding things that can be done by non-humans. We might say it requires intelligence in a human to recognise something in the environment, even if an animal could also do it.
When it comes to weightlifting we are nothing vs a hydraulic machine. When it comes to calculating millions of patterns we are nothing versus the computer. Still proud to be a human - we make the machines!
A very good point!

I like that. I have often (slightly ironically) compared chess-playing computers to cars in athletics.

So, does that mean that there exists a function F(position) = Value which does not depend on 20-deep evaluations of the possible moves from the position? If so, it would be kind of awesome if Google was to publish what that function was at the current state of AZ. I am sure it would not be human readable - and just as sure that humans would be able to pick key things out of it.
There is, and it is a number that you get out of the AlphaZero neural network when you provide a position as input. My guess is that by just doing the evaluation for the position after each legal move it may be capable of master-level play (the data implies only that it is below 2800 without doing analysis). Unfortunately, this network consists of literally millions of parameters used in calculations at considerable depth, so it is not easy for humans to unravel. Our brains are even worse, at up to 10^15 connections, though.
Maybe not easy, but a very worthwhile exercise. And, if the function could be executed 80K times per second on the hardware that AlphaZero actually used to play, then at least it's not as insane as if it required the hardware that AlphaZero used to train.
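To give a feel for the scale involved, here is a toy dense value network in Python/NumPy; the board encoding (8x8x12 piece planes) and the layer sizes are my own assumptions, and the real network is a far larger convolutional tower. Even this small sketch has close to two million parameters, which is why reading human principles straight out of the function would be hard.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed encoding: 12 piece planes over the 8x8 board, flattened to 768 inputs.
layer_sizes = [8 * 8 * 12, 1024, 1024, 1]   # input -> two hidden layers -> value

weights = [rng.standard_normal((a, b)) * 0.01
           for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(b) for b in layer_sizes[1:]]

def value(board_planes):
    """Forward pass: board encoding in, a single evaluation-like number out."""
    x = board_planes.reshape(-1)
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(x @ w + b)
    return float(np.tanh(x @ weights[-1] + biases[-1])[0])

n_params = sum(w.size for w in weights) + sum(b.size for b in biases)
print(f"parameters in even this toy network: {n_params:,}")
print("value of a random 'position':", value(rng.standard_normal((8, 8, 12))))
```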
@SmyslovFan:
The paper contains the results of 1200 games in the 12 most common human openings. Note that these openings can limit playing strength, because they force positions that don't fit the engine's playing style and can be weaker than the lines the engine itself would prefer.
AlphaZero has the most losses in the B40 Sicilian Defence: as white 17 wins, 31 draws and 2 losses; as black 3 wins, 40 draws and 7 losses.
You have only just made me notice that AlphaZero had most trouble where my white repertoire meets my black repertoire. It would have less trouble if I was on the other side!
@Lyudmil_Tsvetkov
> Actually, we don't know why its nps is smaller [...]
Sure we know: Monte Carlo Tree Search without simulations + network calculations. Nothing special there.
> on the 32 cores vs latest dev on 2000 core
Isn't the supported maximum 128 threads?
I have never heard for sure of Stockfish running on any more powerful hardware. Although I found a reference to a claim of a Stockfish forked version running on a 4096 CPU cluster, the details are very hazy and I can't confirm it was genuine (not to mention it was supposed to be a fork, not the main version). Code that is designed for a single machine may need a lot of modification to run on a more loosely coupled cluster: communication can be a real bottleneck.
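As a rough illustration of why adding cores does not simply multiply speed, Amdahl's law gives the best-case speed-up when some fraction of the work (synchronisation, shared data, communication) stays serial. The serial fractions below are invented for illustration, not measured Stockfish numbers.

```python
# Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n), where s is the fraction
# of the work that cannot be parallelised.
def amdahl_speedup(n_cores, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

for s in (0.01, 0.05):                       # invented serial fractions
    for n in (32, 128, 2000):
        print(f"serial {s:.0%}, {n:5d} cores -> speedup {amdahl_speedup(n, s):6.1f}x")
```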