@admkoz:
The games are not played completely at random. They are played randomly from a specific position to the end. Suppose you have K-Q vs K and move the queen to a square adjacent to the opponent’s king. Play 800 random games from there and you get a number of wins, losses and draws.
Repeat this, but this time move the king instead of the queen, or move the queen to a square not adjacent to the opponent’s king. After playing 800 random games to the end, you should get more wins and draws than before, so the selected move must be better.
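To make the idea concrete, here is a minimal sketch of comparing candidate moves by random playouts. It is not chess and not what any particular engine does: as a stand-in it uses a tiny Nim game (take 1 to 3 stones, whoever takes the last stone wins), because a full K-Q vs K move generator would not fit in a short post. The function names and the Nim rules are my own assumptions; only the 800 playouts per move come from the description above.

```python
import random

def random_playout(stones, my_turn, rng):
    """Play uniformly random moves to the end; return True if 'we' win.

    Toy stand-in game (assumption, not chess): take 1-3 stones,
    whoever takes the last stone wins.
    """
    while stones > 0:
        take = rng.randint(1, min(3, stones))
        stones -= take
        if stones == 0:
            return my_turn  # the side that just moved took the last stone
        my_turn = not my_turn
    # The pile was already empty, so our candidate move took the last stone.
    return not my_turn

def evaluate_move(stones, take, playouts=800, seed=0):
    """Fraction of random playouts we win after making the move 'take'."""
    rng = random.Random(seed)
    wins = sum(random_playout(stones - take, False, rng) for _ in range(playouts))
    return wins / playouts

def best_move(stones, playouts=800, seed=0):
    """Score every legal move by random playouts and pick the highest score."""
    scores = {
        take: evaluate_move(stones, take, playouts, seed)
        for take in range(1, min(3, stones) + 1)
    }
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    move, scores = best_move(5)
    print("playout scores:", scores)
    print("selected move: take", move)
```

With 5 stones, taking 1 (leaving 4) is the move that wins under perfect play, and it also comes out ahead under purely random playouts, which is the point: the statistics of random games can separate better moves from worse ones without any hand-written rules.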
Searching for a mate would mean using brute force, which only works in the late endgame. You would need rules to know when brute force can be used, or rules to select moves, prune branches, and evaluate positions. That doesn’t make sense for a NN that is supposed to learn without human input, and human input is likely to weaken the network, even if it might learn faster in the beginning.
@Legeco:
It makes the same mistake again and again, maybe for some 100,000 games. The weights of a NN are adjusted slowly. It is also possible that the NN learns that it is good to play knights to the edges or into corners because it had success with that. It can take a long time until it recognises that there are better moves.
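A toy illustration of why slow weight adjustment means a bad habit can persist for many games. This is only a sketch with a single hypothetical weight and a made-up learning rate, not how a real chess network is trained: the weight starts out "believing" a move is good (+1), every game gives evidence that it is bad (target -1), and each update only moves the weight a small step towards the target.

```python
def steps_to_flip(weight=1.0, target=-1.0, learning_rate=0.001):
    """Count updates until a single weight changes sign.

    All values are illustrative assumptions: +1 means 'this move is good',
    -1 means 'this move is bad', and each step nudges the weight towards
    the target by a fraction learning_rate of the remaining gap.
    """
    steps = 0
    while weight > 0:
        weight += learning_rate * (target - weight)
        steps += 1
    return steps

if __name__ == "__main__":
    for lr in (0.1, 0.01, 0.001):
        print(f"learning rate {lr}: {steps_to_flip(learning_rate=lr)} updates to change its mind")
```

The smaller the step, the more updates it takes before the preference flips, and in a real network the evidence is far noisier than in this sketch.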
I just cannot believe that that strategy would come up with a world-champion AI in 44 million games. You'd play 100K games and you'd barely know how to mate with K-Q vs K.
I've gone blind reading this thread! Which is a good thing, I now can't read "solve chess" anymore.