Beyond Alpha Zero, where is neural networks reinforcement learning now? - Chess Forums - Page 2

redghost101

Jun 17, 2020

0

#21

It changes its strategy every game, depending on the results of the last one

DerekDHarvey

Jun 17, 2020

0

#22

Play a million games with yourself like AlphaZero. Be happy with small improvements rather than grand attacks. Welcome opposite-coloured bishop endings and opposite-side castling. Safeguard your King against checks even at the cost of a tempo. Play prophylactic moves.

drmrboss

Jun 17, 2020

0

#23

Estrinian wrote:

Play a million games with yourself like AlphaZero. Be happy with small improvements rather than grand attacks. Welcome opposite-coloured bishop endings and opposite-side castling. Safeguard your King against checks even at the cost of a tempo. Play prophylactic moves.

That could happen only if you were one of the world's top 10 memory contestants, someone who can:

1. Memorize 5,000-10,000 positions in an hour.

2. Spend a good while searching 800 positions before making every move.

3. Live 1 million years, because you need millions of games to improve.

The way Leela learns is that she has to play a certain number of games in a batch, like 30,000 games. Then she keeps a record of whether 1. e4 e5 or 1. e4 e6 has the better winning chance. If it is the former, she chooses that line and goes deeper: 1. e4 e5 2. d4 or 2. Nf3, etc. Then she memorizes all those positions along with their winning-chance statistics.
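The bookkeeping described above can be sketched as a toy tally: count results per opening line over a batch and pick the line with the better average score. This is only an illustration of the description in this post. The function name `best_line` and the sample batch are made up, and real Leela training updates the weights of a neural network from self-play games rather than storing a table of positions.

```python
from collections import defaultdict

def best_line(results):
    """Return (best opening line, average score per line).

    results: iterable of (line, score) pairs, where score is
    1.0 for a win, 0.5 for a draw and 0.0 for a loss.
    """
    totals = defaultdict(lambda: [0.0, 0])  # line -> [sum of scores, games]
    for line, score in results:
        totals[line][0] += score
        totals[line][1] += 1
    averages = {line: s / n for line, (s, n) in totals.items()}
    return max(averages, key=averages.get), averages

# Hypothetical mini-batch; a real batch would hold thousands of games.
batch = [
    ("1. e4 e5", 1.0),
    ("1. e4 e5", 0.5),
    ("1. e4 e6", 0.0),
    ("1. e4 e6", 0.5),
]
line, averages = best_line(batch)
print(line, averages)
```

The same tally could then be repeated one move deeper (2. d4 versus 2. Nf3) on the games that followed the chosen line.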

redghost101

Jun 18, 2020

0

#24

A computer can do that because it can play a billion games in parallel

redghost101

Jun 18, 2020

0

#25

Million, whatever

redghost101

Jun 22, 2020

0

#26

Well, as an answer to your question, neural networks don't try to replicate humans, because they know strategies that most humans don't. The network just plays against itself to find a strategy that wins. It repeats that strategy and makes sure it wins reliably with it. Then the other colour tries to find a strategy against it, and when it does, the first side looks for a strategy to defeat Black's.

redghost101

Jun 22, 2020

0

#27

This repeats, with the neural network learning more and more, until it can beat anyone by finding the right outputs for its inputs

redghost101

Jun 22, 2020

0

#28

So yeah

redghost101

Jul 10, 2020

0

#29

Using these many strategies, both colours know how to reply to almost everything

harveyluke

Sep 10, 2020

0

#30

redghost101 wrote:

Using these many strategies, both colours know how to reply to almost everything

As a reinforcement learning student, everything you've said is pretty far off how AZ works (e.g. it doesn't change its strategy every game; it has a predefined net with only slight variations). You really need to read up on how AlphaZero actually works; it's pretty fascinating. Reinforcement learning doesn't 'try every option'. This project picks a move based on the policy net's probability distribution over moves, and it learns from the outcome of the game, training the value net to recognize when it is winning or losing. It is 'learning': if it discovers a new strategy, future neural networks are trained via supervised learning to have this strategy whilst shifting the weights to try new options. The fact that it has discovered strategies humans use and then discarded them for some we haven't even tried shows that it's learning.
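A minimal sketch of the two ideas in this post, assuming the move probabilities are already given: sample a move in proportion to its probability instead of trying every option, then label each position in the finished game with the final result as a training target for the value net. The names `sample_move` and `value_targets` and the probabilities are hypothetical. In the published AlphaZero setup the distribution used during self-play comes from search visit counts guided by the network; the sketch simply takes a distribution as input.

```python
import random

def sample_move(policy, rng):
    """Pick one move at random, weighted by its policy probability."""
    moves = list(policy)
    weights = [policy[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

def value_targets(num_positions, outcome):
    """Label each position with the game result from the side to move.

    outcome is +1, 0 or -1 from the first player's point of view,
    so the sign flips on every ply.
    """
    return [outcome if ply % 2 == 0 else -outcome
            for ply in range(num_positions)]

rng = random.Random(0)  # seeded only to make the sketch repeatable
policy = {"e4": 0.6, "d4": 0.3, "Nf3": 0.1}  # made-up probabilities
print(sample_move(policy, rng))
print(value_targets(4, 1))
```

Sampling means the likelier moves are played most often while the unlikely ones still get tried occasionally, which is where the exploration comes from.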

drmrboss

Sep 10, 2020

0

#31

harveyluke wrote:

redghost101 wrote:

Using these many strategies, both colours know how to reply to almost everything

As a reinforcement learning student, everything you've said is pretty far off how AZ works (e.g. it doesn't change its strategy every game; it has a predefined net with only slight variations). You really need to read up on how AlphaZero actually works; it's pretty fascinating. Reinforcement learning doesn't 'try every option'. This project picks a move based on the policy net's probability distribution over moves, and it learns from the outcome of the game, training the value net to recognize when it is winning or losing. It is 'learning': if it discovers a new strategy, future neural networks are trained via supervised learning to have this strategy whilst shifting the weights to try new options. The fact that it has discovered strategies humans use and then discarded them for some we haven't even tried shows that it's learning.

At first, I was about to discuss his posts, but later I realized that he is probably an 8-12 year old kid. So I just leave his posts alone, whatever he posts.

NikkiLikeChikki

Sep 10, 2020

0

#32

As of last March, Leela has played something like 300 million games against herself. If I played one game every second, it would only take me 13,700 years to play this many.

NikkiLikeChikki

Sep 10, 2020

0

#33

Wait. I did the math wrong. Anyway, I’d have to take naps.
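For reference, a quick check of the arithmetic, assuming one game per second with no breaks and a 365.25-day year (the function name `years_to_play` is just for illustration):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds

def years_to_play(games, seconds_per_game=1.0):
    """Years needed to play `games` games back to back."""
    return games * seconds_per_game / SECONDS_PER_YEAR

print(round(years_to_play(300_000_000), 1))  # about 9.5
```

So 300 million one-second games come to roughly nine and a half years of nonstop play, before naps.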

x-9140319185

Sep 10, 2020

0

#34

I wonder if this can be applied to complex strategy games or at least some of the mechanics of it. Stellaris would be the ultimate achievement.

x-9140319185

Sep 10, 2020

0

#35

If a neural network got to that point, I wonder what it would value. As there is no set goal to Stellaris other than to please the player, I would assume it would have to do with certain empire characteristics, such as exterminating all life or conquering the galaxy.