Beyond AlphaZero, where is neural network reinforcement learning now?


Play a million games with yourself like AlphaZero. Be happy with small improvements rather than grand attacks. Welcome opposite-coloured Bishop endings and opposite-side castling. Safeguard your King against checks even at the cost of a tempo. Play prophylactic moves.

That could happen only if you were one of the world's top 10 memory competitors, who can:
1. Memorize 5,000-10,000 positions in an hour.
2. Spend a good amount of time searching 800 positions before making every move.
3. Live for 1 million years, because you need millions of games to improve.
The way Leela learns is that she has to play a certain number of games in a batch, like 30,000 games. Then she keeps a record of whether 1. e4 e5 or 1. e4 e6 has the better winning chance. If it is the former, she chooses that line and goes deeper: 1. e4 e5 2. d4 or 2. Nf3, etc. Then she memorizes all those positions along with their winning-chance statistics.



As a reinforcement learning student, I'd say everything you've said is pretty far off how AZ works (e.g. it doesn't change its strategy every game; it has a predefined net with only slight variations). You really need to read up on how AlphaZero actually works, it's pretty fascinating. Reinforcement learning doesn't 'try every option'. This project picks a move based on the policy net's probability distribution over moves, and it learns from the outcome of the game, training the value net to recognize when it is winning or losing. It is 'learning': if it discovers a new strategy, future neural networks are trained via supervised learning to have this strategy, whilst shifting the weights to try new options. The fact that it has discovered strategies humans use and then discarded them for some we haven't even tried shows that it's learning.
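
To make the two ideas in the post above concrete, here is a rough Python sketch, not the project's actual code. It shows (1) picking a move by sampling from a policy distribution and (2) labelling every position of a finished game with the final result, which is the kind of target a value net would be trained on. The policy numbers and the helper names `sample_move` and `value_targets` are made up for illustration, and the sketch leaves out the tree search that AlphaZero runs before each move.

```python
import random


def sample_move(policy, rng):
    """Pick one move at random, weighted by the policy probabilities."""
    moves = list(policy)
    weights = [policy[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def value_targets(num_positions, outcome):
    """Label each position with the final result, seen from the side to move.

    outcome is +1 if the first player won, -1 if they lost, 0 for a draw.
    The side to move alternates every ply, so the sign flips each ply.
    """
    return [outcome if ply % 2 == 0 else -outcome for ply in range(num_positions)]


# Hypothetical policy output for the starting position (illustrative numbers).
policy = {"e4": 0.45, "d4": 0.35, "Nf3": 0.15, "c4": 0.05}

rng = random.Random(0)
move = sample_move(policy, rng)       # likelier moves are chosen more often
targets = value_targets(4, outcome=1)  # a 4-ply game won by the first player
```

The point of the sampling step is that no move with non-zero probability is ever ruled out, so the net keeps exploring without 'trying every option'.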

At first, I was about to discuss his posts, but later I realized that he is probably an 8-12 year old kid. So I just leave them alone, whatever he posts.
