Beyond AlphaZero, where is neural-network reinforcement learning now?

redghost101
It changes its strategy every game, depending on the results of the last one.
DerekDHarvey

Play a million games with yourself like AlphaZero. Be happy with small improvements rather than grand attacks. Welcome opposite-coloured bishop endings and opposite-side castling. Safeguard your king against checks, even at the cost of a tempo. Play prophylactic moves. 

drmrboss
Estrinian wrote:

Play a million games with yourself like AlphaZero. Be happy with small improvements rather than grand attacks. Welcome opposite-coloured bishop endings and opposite-side castling. Safeguard your king against checks, even at the cost of a tempo. Play prophylactic moves. 

 

That can only happen if you are one of the world's top 10 memory contestants, someone who can:

1. memorize 5,000-10,000 positions in an hour;

2. spend a good amount of time searching 800 positions before every move;

3. live for a million years, because you need millions of games to improve.

 

The way Leela learns is this: she has to play a certain number of games in a batch, say 30,000 games. Then she keeps a record of whether 1. e4 e5 or 1. e4 e6 has the better winning chance. If it is the former, she chooses that line and goes deeper: 1. e4 e5 2. d4 or 2. Nf3, etc. Then she memorizes all those positions along with their winning-chance statistics.
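A toy Python sketch of the batch bookkeeping described above, using made-up results: after 1. e4, tally Black's average score for each reply and pick the best one to explore further. This only illustrates the poster's simplified model; the game data and the `win_rates` helper are hypothetical, and Leela itself, like AlphaZero, learns through a neural network combined with tree search rather than a table of opening statistics.

```python
from collections import defaultdict

# Hypothetical batch: Black's reply to 1. e4 and Black's score in that game
# (1.0 = Black won, 0.5 = draw, 0.0 = Black lost).
batch = [("e5", 0.5), ("e5", 1.0), ("e6", 0.0),
         ("e5", 0.5), ("e6", 0.5), ("e6", 0.0)]

def win_rates(games):
    """Average score for each reply across a batch of games."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for reply, score in games:
        totals[reply] += score
        counts[reply] += 1
    return {reply: totals[reply] / counts[reply] for reply in totals}

rates = win_rates(batch)
best = max(rates, key=rates.get)  # the line to "go down deeper" on
print(rates, best)
```

With these invented numbers, 1...e5 averages 2/3 for Black and 1...e6 only 1/6, so e5 is the line that would be explored next.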

redghost101
A computer can do that because it can play a billion games in parallel.
redghost101
Million, whatever
redghost101
Well, as an answer to your question, neural networks don't want to replicate humans, because they know strategies that most humans don't. A network will just play against itself to find a strategy that wins. It repeats that strategy and makes sure it wins perfectly with it. Then the other colour tries to find a strategy against it, and when it does, the first side starts looking for a strategy to defeat Black's.
redghost101
This repeats, with the neural network learning more and more until it can beat anyone by finding the right outputs from its inputs.
redghost101
So yeah.
redghost101
Using these many strategies, both colours know how to reply to almost everything
harveyluke
redghost101 wrote:
Using these many strategies, both colours know how to reply to almost everything

As a reinforcement learning student, I can tell you that everything you've said is pretty far off from how AZ works (e.g. it doesn't change its strategy every game; it has a predefined net with only slight variations). You really need to read up on how AlphaZero actually works; it's pretty fascinating. Reinforcement learning doesn't "try every option": this project picks a move based on the policy net's probability distribution over moves, and it learns from the outcome of the game, training the value net to recognize when it is winning or losing. It is "learning" in the sense that when it discovers a new strategy, future neural networks are trained via supervised learning to have this strategy while shifting the weights to try new options. The fact that it has discovered strategies humans use and then discarded them for some we haven't even tried shows that it's learning.
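A minimal Python sketch of the two ideas in that post, under simplifying assumptions: a move is sampled in proportion to a (hypothetical) policy-net distribution, and the value net's training target is the final game result seen from the side to move, scored with a squared error. The move probabilities and the helper names are illustrative only; the sketch leaves out the tree search and the policy half of the real training loss, and in AlphaZero itself self-play moves are picked from the search's visit counts rather than the raw policy output.

```python
import random

# Hypothetical policy-net output for one position: move -> probability.
policy = {"e4": 0.45, "d4": 0.35, "Nf3": 0.15, "b4": 0.05}

def sample_move(probs, rng):
    """Pick one move at random, weighted by the policy's probabilities."""
    moves = list(probs)
    weights = [probs[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

def value_target(outcome, side_to_move):
    """Training target z for the value net, from the side to move's view.

    outcome: +1 if White won, -1 if Black won, 0 for a draw.
    """
    return outcome if side_to_move == "white" else -outcome

def value_loss(v, z):
    """Squared error between the net's prediction v and the real result z."""
    return (z - v) ** 2

rng = random.Random(0)
move = sample_move(policy, rng)                    # usually e4 or d4
z = value_target(outcome=1, side_to_move="black")  # White won, so -1 for Black
loss = value_loss(v=0.2, z=z)                      # net thought Black was slightly better
print(move, z, round(loss, 2))
```

Sampling instead of always taking the top move is what keeps the games varied; the value net only gets a signal once the game is over, which is why so many games are needed.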

drmrboss
harveyluke wrote:
redghost101 wrote:
Using these many strategies, both colours know how to reply to almost everything

As a reinforcement learning student, I can tell you that everything you've said is pretty far off from how AZ works (e.g. it doesn't change its strategy every game; it has a predefined net with only slight variations). You really need to read up on how AlphaZero actually works; it's pretty fascinating. Reinforcement learning doesn't "try every option": this project picks a move based on the policy net's probability distribution over moves, and it learns from the outcome of the game, training the value net to recognize when it is winning or losing. It is "learning" in the sense that when it discovers a new strategy, future neural networks are trained via supervised learning to have this strategy while shifting the weights to try new options. The fact that it has discovered strategies humans use and then discarded them for some we haven't even tried shows that it's learning.

At first, I was about to discuss his posts, but then I realized he is probably an 8- to 12-year-old kid, so I'll just leave whatever he posts alone. 

NikkiLikeChikki
As of last March, Leela has played something like 300 million games against herself. If I played one game every second, it would only take me 13,700 years to play this many.
NikkiLikeChikki
Wait. I did the math wrong. Anyway, I’d have to take naps.
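For the record, here is the corrected arithmetic as a quick Python check, assuming one game per second around the clock and a 365.25-day year:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds

games = 300_000_000   # "something like 300 million" self-play games
seconds_per_game = 1  # one game per second, no naps

years = games * seconds_per_game / SECONDS_PER_YEAR
print(f"{years:.1f} years")  # → 9.5 years
```

So it is closer to nine and a half years than 13,700. Dividing by only 60 and 365 (300,000,000 / 60 / 365 ≈ 13,699) skips the minutes-to-hours and hours-to-days steps, which may be where the original figure came from.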
x-9140319185

I wonder if this can be applied to complex strategy games, or at least to some of their mechanics. Stellaris would be the ultimate achievement.

x-9140319185

If a neural network got to that point, I wonder what it would value. As there is no set goal to Stellaris other than to please the player, I would assume it would have to do with certain empire characteristics, such as exterminating all life or conquering the galaxy.