Also, I don't really understand: after a sufficient amount of learning, shouldn't it play the exact same opening every time? I'm having a hard time understanding how it could prefer 1.d4 sometimes and 1.c4 other times, etc.
I'm guessing that there are 4 or 5 first moves with a roughly equal chance of winning, and it randomly selects among them. I don't think there is a single "best" first move.
The actual processor time spent thinking about the moves would have varied slightly, even if the wall-clock time is exactly one minute per move; the programs are somewhat at the mercy of the operating system's process scheduling. Is this difference enough to matter in general? I have no idea.
Another part of it relates to how AlphaZero works. Here's how I understand it:
Basically, AlphaZero uses a neural network to statically evaluate positions and determine which moves look best. Then it picks a move, plays it internally, and repeats the process. This continues until AlphaZero reaches the end of a game or hits a cutoff on the number of moves (I don't have the published paper handy and don't remember the exact limit; strictly speaking, the search doesn't play games all the way out but instead scores the position it stops at with the network's value output, though the bookkeeping works the same way). AlphaZero will then have a vague idea of whether the first move it picked leads to a win, draw, or loss. It goes back to the current position and repeats the process of playing through a game again. Each time it tries a move, its estimate of the result (win, draw, or loss) becomes a little more certain. This happens as many times as AlphaZero can within the allowed search time.
Here's the important part: when AlphaZero is internally playing through games, it picks each move using a randomized procedure that favors moves with better evaluations. The moves do not have an equal probability of selection; moves with a more favorable evaluation have a better chance of being searched, but every move has at least some chance. AlphaZero also tracks the moves it has already tried, so it can reduce the probability of selecting a move it has already tried several times. This forces the program to search down different move paths. After a little while, AlphaZero will likely have played through several moves many times; it knows the win/draw/loss stats and uses this information to decide on the final move to actually play. This method of searching through positions is called Monte Carlo tree search, in case you want to look it up.
I can't think of anything else off-hand that would make the engines pick different moves. If Stockfish had its opening book on I would say it just picked a different opening from it. But the opening book was turned off so that can't be it.
I didn't realize Stockfish had some random elements built into its move selection, but apparently it does!
This is from Stockfish's source code (in the search.cpp file; function "pick_best"):
// Choose best move. For each move score we add two terms, both dependent on
// weakness. One is deterministic and bigger for weaker levels, and one is
// random. Then we choose the move with the resulting highest score.
for (size_t i = 0; i < multiPV; ++i)
{
    // This is our magic formula
    int push = (  weakness * int(topScore - rootMoves[i].score)
                + delta * (rng.rand<unsigned>() % weakness)) / 128;

    if (rootMoves[i].score + push > maxScore)
    {
        maxScore = rootMoves[i].score + push;
        best = rootMoves[i].pv[0];
    }
}
Why don't the games repeat? For instance, every time A0 is white, doesn't it start with the same opening move? And doesn't Stockfish always respond accordingly, and so on? How is there variation between games?