Chess will never be solved, here's why

playerafar
MARattigan wrote:

@playerafar

Either of your positions in #2566 could be either player to move.

The tablebases do contain illegal positions and the number of these is unknown, but in most cases appears to be negligible, probably smaller than the number of omitted positions with castling rights.

Determining legality is an as yet unsolved problem. 

This would obviously represent a potential source of error with the extrapolation in the graph I posted, but as @Elroch already pointed out, that would be the least of my problems in that respect.

 

@MARattigan
Yes.  You're right.  With KPK lined up on the d-file - the pawn could have taken for check and then Kd8.
So either player could be to move - so the x2 stands.

I'll delete my invalid posts therefore.
So what that means is that the messiness of how total positions are arrived at in the KPK tables doesn't come from stalemates -
it comes from checks that couldn't have happened.
Like you said.

Now - if we simply trust the tablebases and don't question the illegals versus the ratio of wins to draws ...
then that leaves the irksome fact that there seems to be no trend in the ratio of wins as the (odd) number of pieces on the board changes.
To make a pun
It's Odd, that!

And does the player who is to move have an edge in the stats?
I don't know if that would help much - if it did/does. 
I would think that would even out ..

haiaku
Elroch wrote:
haiaku wrote:
Elroch wrote:
haiaku wrote:

Yes, but Q-learning needs a table to store the Q(state, action) values. For chess, though, we cannot provide this table beforehand

Nor in any discrete Q-learning problem. It is the action values that get learnt.

Yeah, I meant that for chess we don't even know what would be the size of that (huge) matrix, therefore it would be inefficient to use it in a program.

Well, this is what some of the DeepMind work has done, and it is close to state of the art.

A way to think of it is that the network derives a way to positionally evaluate a general position with high level abstraction using millions of parameters.

Ok, but what I meant is that the NN is not a simple two-dimensional array (something up to 10³⁷ × the average number of legal moves) that stores the Q(state, action) values, like in simple Q-learning, otherwise there would be no difference. The NN is much smaller and more efficient, but it is nonetheless an approximator, so they modified the temporal difference equation to solve the problem of instability.
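
To make that concrete, this is everything a tabular Q-learner stores and updates (a minimal sketch in Python; the state/action encoding is left abstract, and for chess the table would need up to ~10³⁷ rows, which is the whole problem):

  import random
  from collections import defaultdict

  # Tabular Q-learning: one stored value per (state, action) pair.
  ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # learning rate, discount, exploration

  Q = defaultdict(float)                   # Q[(state, action)] -> value, default 0.0

  def choose_action(state, actions):
      # epsilon-greedy over the stored action values
      if random.random() < EPSILON:
          return random.choice(actions)
      return max(actions, key=lambda a: Q[(state, a)])

  def update(state, action, reward, next_state, next_actions):
      # one TD backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
      best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
      Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

The defaultdict keeps the table sparse, but for chess even the visited slice of the state space explodes, hence the network as approximator.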

tygxc

#2581

"Again what does your calculation give for these games?"
++ Nothing at all for your artificially constructed games: I try to extract information from various sources: ICCF WC tournaments, AlphaZero autoplay, TCEC games, human grandmaster games, 7-men endgame table base...
The calculation in case applies to large, strong tournaments and derives the error rate E from the rate D of decisive games.

"Why do you apply it games where you can't possibly know where the errors are?"
++ I apply to real games played by ICCF grandmasters at 5 days/move with engine assistance. That is 136 games * 39 moves/game * 2 players * 5 days/move/player = 53040 days = 145 years of engine time already spent.

"I've listed the errors in these games  by checking with the Syzygy tablebase."
++ Your examples are not representative. KRPP vs. KRP is representative. Your desktop running for a short time is no match for a 10^9 nodes/s cloud engine running for 60 h/move.

"The fact that you carry on to "calculate" that SF14 will find errors in its top four moves in only 1 in 100,000,000,000,000,000,000 positions and I've given you four already ought to tell you something." ++ You have calculated neither on a 10^9 nodes/s cloud engine nor for 60 hours/move. Your positions are not representative. Try KRPP vs. KRP and I predict even your desktop with short calculation time will find the table base correct move as its top 1 move.

playerafar


"Try KRPP vs. KRP and I predict even your desktop with short calculation time will find the table base correct move as its top 1 move."
@tygxc said 'table base' there.

That's 'table base'.   Not 'games with top four moves'.

Elroch
haiaku wrote:
Elroch wrote:
haiaku wrote:
Elroch wrote:
haiaku wrote:

Yes, but Q-learning needs a table to store the Q(state, action) values. For chess, though, we cannot provide this table beforehand

Nor in any discrete Q-learning problem. It is the action values that get learnt.

Yeah, I meant that for chess we don't even know what would be the size of that (huge) matrix, therefore it would be inefficient to use it in a program.

Well, this is what some of the DeepMind work has done, and it is close to state of the art.

A way to think of it is that the network derives a way to positionally evaluate a general position with high level abstraction using millions of parameters.

Ok, but what I meant is that the NN is not a simple two-dimensional array (something up to 10³⁷ × the average number of legal moves) that stores the Q(state, action) values, like in simple Q-learning, otherwise there would be no difference. The NN is much smaller and more efficient, but it is nonetheless an approximator, so they modified the temporal difference equation to solve the problem of instability.

Deep reinforcement learning always approximates and the question of stability is a general practical one, more empirically than theoretically based. Using a discount factor is virtually universal for general reasons.

MARattigan
tygxc wrote:

#2581

"Again what does your calculation give for these games?"
++ Nothing at all for your artificially constructed games:

Nothing at all, because you decline to perform the calculation.

I'll do it for you, following your recipe here. (Not implying I endorse your recipe.)

Let us do some simple high school math
Let D represent the rate of decisive games
Let E represent the error rate per game
D = E + E³ + E⁵ + E⁷ + ... = E / (1 - E²)
Hence
E² + E/D - 1 = 0
Hence
E = √(1 + 1/(2D)²) - 1/(2D)
Let us now apply this to these games

D = 1/10 = 0.1, E = 0.099, E² = 0.0098, E³ = 0.0009
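
(If anyone wants to reproduce those numbers, a few throwaway lines of Python following the recipe:)

  from math import sqrt

  def error_rate(D):
      # the recipe solved for E: from D = E/(1 - E^2) we get
      # E^2 + E/D - 1 = 0, hence E = sqrt(1 + 1/(2D)^2) - 1/(2D)
      return sqrt(1 + 1 / (2 * D) ** 2) - 1 / (2 * D)

  D = 0.1                      # 1 decisive game in 10
  E = error_rate(D)
  print(E, E ** 2, E ** 3)     # ~0.0990 0.0098 0.00097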

So the data show that 99% of the draws in these games are ideal games with optimal moves that thus are part of the weak solution of chess.

Except exactly 0% of both the wins and the draws are ideal games if you check with Syzygy. These are SF14 v SF14.

I try to extract information from various sources: ICCF WC tournaments, AlphaZero autoplay, TCEC games, human grandmaster games, 7-men endgame table base...
The calculation in case applies to large, strong tournaments and derives the error rate E from the rate D of decisive games.

In short, you base your predictions about what SF14 will do on anything but what SF14 does.

"Why do you apply it games where you can't possibly know where the errors are?"
++ I apply to real games played by ICCF grandmasters at 5 days/move with engine assistance. That is 136 games * 39 moves/game * 2 players * 5 days/move/player = 53040 days = 145 years of engine time already spent.

Yes, I already know you apply it to games where you can't possibly know where the errors are.

The question was - why?

"I've listed the errors in these games  by checking with the Syzygy tablebase."
++ Your examples are not representative. KRPP vs. KRP is representative. Your desktop running for a short time is no match for a 10^9 nodes/s cloud engine running for 60 h/move.

As I already pointed out, the error rate per move (ply) at 1 sec/move in the above games was 0.9%.

The error rate at 2048 sec/move was 4.8%.

You have no reason at all to believe that my desktop running at 1 sec/move wouldn't beat the pants off your 10^9 nodes/s cloud engine running for 60 h/move.

"The fact that you carry on to "calculate" that SF14 will find errors in its top four moves in only 1 in 100,000,000,000,000,000,000 positions and I've given you four already ought to tell you something." ++ You have calculated neither on a 10^9 nodes/s cloud engine nor for 60 hours/move. Your positions are not representative. Try KRPP vs. KRP and I predict even your desktop with short calculation time will find the table base correct move as its top 1 move.

It won't in the KRPP vs. KRP position I already posted for you. What's the fixation with KRPPvKRP?

But as you have been told, if you're to do a takeback you have to try all alternatives anyway (or show they've already been covered) to prove anything. The crap about top four moves is irrelevant. Why are you still talking about it?

And my posts here and here (which, of course, you ignored) show pretty incontrovertibly that at 60 hours a move a lower bound (an absurdly low lower bound) on the time your process would take to find a solution (assuming it were fixed to actually find a solution) would be 4.5 x 10¹⁰ years.

So why are you still even talking about 60 hours a move?

In fact why are you still talking about any of it?

You could bring that lower bound down to a little over 200 years if you use instead 1 millisecond a move, but the lower bound is nowhere near the actual number, so you'd still be looking at the heat death of the universe.
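
(The bound scales linearly with the time per move: 4.5 × 10¹⁰ years × (10⁻³ s / 2.16 × 10⁵ s) ≈ 208 years, 60 hours being 2.16 × 10⁵ seconds.)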

And anyway, who's going to fund you three supercomputers for even 200 years, never mind the seven maids with seven mops?

playerafar

"so you'd still be looking at the heat death of the universe."
Uh oh!
Could universal entropy occur if there are other big bangs elsewhere?
What if there are future big bangs that would upset such entropy?
Could our 'big bang' be so remote from the others - that it would still suffer an Entropy end?
Maybe.
'Entropy' applies to 'closed systems'. 
So maybe yes and it would amount to 'closed'.

playerafar

But again - @tygxc appears to have slipped up - again -
when he said 'table bases' in reference to his 'four moves'.  

playerafar


"so you'd still be looking at the heat death of the universe."
Could a universe that is infinite in size and mass and age and in numbers of future big bangs wherever - with more 'cosmic eggs' - 
ever undergo 'heat death'?
'It'. If it's 'multiple universes' and 'them', that contradicts 'universe'.
Anyway - regarding localities it would depend on whether 'the others' could impact 'this one'.
Maybe some scientists try to argue that if 'the others' are so distant that they cannot affect 'this one' - then 'it' (the local big bang) 'may as well' be the only one.
That's not very scientific though.
That's like the claim that if nobody is around to hear a tree fall then it makes no sound. That's unscientific. Invalid. Mildly provocative.

haiaku
Elroch wrote:

Deep reinforcement learning always approximates and the question of stability is a general practical one, more empirically than theoretically based. Using a discount factor is virtually universal for general reasons.

Mmmhh...

  "Two years ago we introduced the first widely successful algorithm for deep reinforcement learning. The key idea was to use deep neural networks to represent the Q-network, and to train this Q-network to predict total reward. Previous attempts to combine RL with neural networks had largely failed due to unstable learning. To address these instabilities, our Deep Q-Networks (DQN) algorithm stores all of the agent's experiences and then randomly samples
and replays these experiences to provide diverse and decorrelated training data." ¹

In fact, in that original work for Atari 2600 games, they used the Bellman equation, as can be seen in the formulae (not included here for the sake of brevity):

  "The network is trained with a variant of the Q-learning algorithm, with stochastic gradient descent to update the weights. To alleviate the problems of correlated data and non-stationary distributions, we use an experience replay mechanism which randomly samples previous transitions, and thereby smooths the training distribution over many past behaviors. [ . . . ]
  The basic idea behind many reinforcement learning algorithms is to estimate the action-value function, by using the Bellman equation as an iterative update [ . . . ]. Such value iteration algorithms converge to the optimal action-value function [ . . . ]. In practice, this basic approach is totally impractical, because the action-value function is estimated separately for each sequence, without any generalisation. Instead, it is common to use a function approximator to estimate the action-value function [ . . . ]. In the reinforcement learning community [ . . . ] sometimes a non-linear function approximator is used [ . . . ], such as a neural network. We refer to a neural network function approximator with weights 𝜽 as a Q-network. A Q-network can be trained by minimising a sequence of loss functions Lᵢ(𝜽) that changes at each iteration i. [ . . . ] In contrast [ . . . ] we utilize a technique known as experience replay where we store the agent's experiences. During the inner loop of the algorithm, we apply Q-learning updates, or minibatch updates, to samples of experience [ . . . ]" ²
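
Stripped to the bone, that replay mechanism is just this (my sketch, not DeepMind's code; q_net here stands for whatever network returns the action values for a state):

  import random
  from collections import deque

  replay_buffer = deque(maxlen=1_000_000)    # pool of stored transitions

  def store(s, a, r, s_next, done):
      replay_buffer.append((s, a, r, s_next, done))

  def td_targets(batch, q_net, gamma=0.99):
      # Bellman targets y = r + gamma * max_a' Q(s', a'); the Q-network is
      # then regressed toward y by stochastic gradient descent
      return [(s, a, r if done else r + gamma * max(q_net(s_next)))
              for s, a, r, s_next, done in batch]

  # one training step: a random, hence decorrelated, minibatch
  if len(replay_buffer) >= 32:
      batch = random.sample(replay_buffer, 32)   # feed td_targets(batch, q_net)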

The method closely follows Leemon Baird's approach³; however, in AlphaGo Zero and AlphaZero they did not use the Bellman equation, they minimize a cross-entropy loss instead (because they use Monte Carlo tree search, not deep Q-learning):

   "Each edge (s, a) in the search tree stores a prior probability P(s, a), a visit count N(s, a), and an action-value Q(s, a). Each simulation starts from the root state and iteratively select moves that maximize an upper confidence bound Q(s, a)+U(s, a), where U(s, a)P(s, a)/(1+N(s, a)), until a leaf node s' is encountered. [ . . . ] Specifically, the parameters 𝜽 are adjusted by gradient descent on a loss function l that sums over mean-squared error and cross-entropy losses, respectively, [ . . . ]       l = (zv)² − 𝝅 log p + c||𝜽||² , where c is a parameter [ . . . ]"⁴

That upper bound U is what makes the Lc0 developers call the search PUCT rather than MCTS.
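
In code, the selection rule in that quote comes out roughly as (a sketch; the c_puct constant and the √ΣN factor are as in the AlphaGo Zero paper, not Lc0's actual source):

  from dataclasses import dataclass
  from math import sqrt

  @dataclass
  class Edge:
      P: float          # prior probability from the network
      N: int = 0        # visit count
      Q: float = 0.0    # mean action value

  def puct_select(edges, c_puct=1.5):
      # pick argmax of Q(s,a) + U(s,a), with
      # U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))
      total = sum(e.N for e in edges)
      return max(edges, key=lambda e: e.Q + c_puct * e.P * sqrt(total) / (1 + e.N))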

AFAIK, deep Q-learning can be more efficient when rewards are immediate, but in chess this is not the case. Anyway, I doubt it would get much closer to the true value of a node eventually, and if it did, it would be slower to train... There is no free lunch.

 

¹ https://www.deepmind.com/blog/deep-reinforcement-learning

² https://arxiv.org/pdf/1312.5602.pdf

³ www.leemon.com/papers/1995b.pdf

⁴ https://www.researchgate.net/publication/320473480_Mastering_the_game_of_Go_without_human_knowledge

tygxc

#2591

"I'll do it for you, following your recipe here."
++ My recipe is about games in a sufficiently large tournament of e.g. 136 or 210 games.
A game is a sequence of legal moves that starts from the initial position.
You cannot conclude anything from 10 sequences starting from an artificially constructed starting position.

"In short, you base your predictions about what SF14 will do on anything but what SF14 does."
++ No, I base my conclusion on a world championship finals with human ICCF grandmasters assisted by engines and thinking at 5 days/move/player.

"Yes, I already know you apply it to games where you can't possibly know where the errors are."
++ I do not need to know where the errors in the decisive games are, I only need to know that 99% of the draws contain no errors and thus are ideal games with optimal moves and thus are part of the weak solution of chess. We already have hundreds of such ideal games with optimal moves in ICCF world championship finals and they represent hundreds of years of calculation on multicore engines.

"As I already pointed out, the error rate per move (ply) at 1 sec/move in the above games was 0.9%. The error rate at 2048 sec/move was 4.8%."
++ That is an anomaly that results from the fact that you start with an artificially constructed position instead of the initial position.

"You have no reason at all to believe that my desktop running at 1 sec/move wouldn't beat the pants off your 10^9 nodes/s cloud engine running for 60 h/move."
++ Yes, I have all reason to believe that a 10^9 nodes/s cloud engine beats the pants off your desktop at 1 s/move. Your suggestion is ridiculous. You could just as well claim that you would beat the pants off me if you play at 1 s/move bullet speed and I play 5 days/move correspondence.

"It won't in the KRPP vs. KRP position I already posted for you."
++ In that position in its real form with the 50-moves counter and the 3-fold repetition counter reset, i.e. right after a pawn move or a capture, my desktop finds the table base exact move as its top 1 move.

"What's the fixation with KRPPvKRP?"
++ KRPP vs. KRP is the most representative as it counts 7 men and as it is the endgame that occurs most in real play.

"But as you have been told, if you're to do a takeback you have to try all alternatives anyway (or show they've already been covered) to prove anything."
++ Trying 4 alternatives for white is enough proof: it leaves 1 error in 10^20 positions and there are not that many legal, sensible, reachable, and relevant positions.

"Why are you still talking about it?"
++ Because it is relevant. Weakly solved means that for the initial position a strategy has been determined to achieve the game-theoretic value against any opposition. Yes, any opposition does include the bad moves too, but if the game-theoretic value i.e. a draw can be achieved against the top 4 white moves, then it can be achieved against the worse moves too. If black can achieve the draw against 1 e4, 1 d4, 1 c4, and 1 Nf3, then there is no doubt that black can achieve the draw against 1 a4 as well.

"And my posts here and here show pretty incontrovertibly that at 60 hours a move a lower bound on the time your process would take to find a solution would be 4.5 x 10¹⁰ years."
++ That is not "incontrovertibly", that is ridiculous.

"why are you still even talking about 60 hours a move?"
++ Maybe on the 10^9 nodes/s cloud engine it can be shorter than 60 h/move. The 60 h/move was derived from the AlphaZero autoplay paper. First, in that paper they probably used slower hardware. Also, AlphaZero is known to use a more complicated evaluation function than e.g. Stockfish, thus reaching fewer nodes/s. So for the 10^9 nodes/s cloud engine the time per move can probably be shorter than 60 h/move.

"In fact why are you still talking about any of it?"
++ Because it is relevant.

"who's going to fund you three supercomputers"
++ That is the key question. It cannot be done without modern computers. The most feasible approach is probably to start with one ECO code e.g. C67 and one 26-men tabiya of that. Once that is done it will be clear how many legal, sensible, reachable, and relevant positions had to be searched to weakly solve that 1 tabiya of that 1 ECO code.

tygxc

#2593
"@tygxc appears to have slipped up when he said 'table bases' in reference to his 'four moves'."
++ No, that is what I said all the time:
"Give me five years, good assistants and modern computers, and I will trace all variations from the opening towards tablebases and 'close' chess." - GM Sveshnikov
From the opening towards tablebases.
The 4 white moves are the route from the opening i.e. 26-men tabiya towards the 7-men endgame table base.
The 7-men endgame table base gives the exact evaluation draw/win/loss at the end of the calculation.

playerafar

 

@tygxc (again) ...  now maybe you have an explanation ...
I'm not saying you don't  - nor that you do.
"The fact that you carry on to "calculate" that SF14 will find errors in its top four moves in only 1 in 100,000,000,000,000,000,000 positions and I've given you four already ought to tell you something." ++ You have calculated neither on a 10^9 nodes/s cloud engine nor for 60 hours/move. Your positions are not representative. Try KRPP vs. KRP and I predict even your desktop with short calculation time will find the table base correct move as its top 1 move."

Again - 'table base' versus 'top four moves'.

playerafar

From @haiaku 's post - hey I like this term 'cross-entropy' (yes - it has nothing to do with the 'universal entropy' mentioned earlier) -
but here's the passage:
"they minimize cross-entropy instead (because they use MC, not deep Q-learning):"
'cross-entropy' ...  neat.  
I'm not claiming I understand it.  It stood out though.

tygxc

#2599
"Again - 'table base' versus 'top four moves'."
++ It is not versus, it is top four moves until the table base is reached.

haiaku
playerafar wrote:

From @haiaku 's post - hey I like this term 'cross-entropy' (yes - has nothing to do with 'universal entropy' mentioned earlier).

Yes, it's about entropy in information theory, whose formulation is basically the same as that of the Gibbs entropy. Cross-entropy can be used as a measure of the difference between two probability distributions.
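
In code it's a one-liner (toy numbers, just to show the measure):

  from math import log

  def cross_entropy(p, q):
      # H(p, q) = -sum_x p(x) * log q(x); equal to the entropy of p when q == p,
      # larger otherwise (the gap is the Kullback-Leibler divergence)
      return -sum(pi * log(qi) for pi, qi in zip(p, q) if pi > 0)

  p = [0.5, 0.5]    # "true" distribution (for AlphaZero: normalised visit counts)
  q = [0.9, 0.1]    # model's distribution (the network's policy output)
  print(cross_entropy(p, p), cross_entropy(p, q))    # 0.693... vs 1.204...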

haiaku
tygxc wrote:

Maybe on the 10^9 nodes/s cloud engine it can be shorter than 60 h/move. The 60 h/move was derived from the AlphaZero autoplay paper. First, in that paper they probably used slower hardware. Also, AlphaZero is known to use a more complicated evaluation function than e.g. Stockfish, thus reaching fewer nodes/s. So for the 10^9 nodes/s cloud engine the time per move can probably be shorter than 60 h/move.

No. At 60000 nodes/s and 1/10 of the time SF had at its disposal (it ran at 6 × 10⁷ nodes/s), A0 was close to its opponent, so for that SF to reach the same result in 60 hours it should run at 10 × 6 × 10⁷ nodes/s. Even if current SF with newer hardware were 1000 times faster, the number of nodes in the search tree after 5 years would be too far from 10¹⁷. Check your own previous calculations.

playerafar
haiaku wrote:
playerafar wrote:

From @haiaku 's post - hey I like this term 'cross-entropy' (yes - has nothing to do with 'universal entropy' mentioned earlier).

Yes, it's about entropy in information theory, whose formulation is basically the same as that of the Gibbs entropy. Cross-entropy can be used as a measure of the difference between two probability distributions.

Nice.  I admit I haven't seen that one before.
'Probably' worth some time by me to find out more about that on the net.

playerafar
tygxc wrote:

#2599
"Again - 'table base' versus 'top four moves'."
++ It is not versus, it is top four moves until the table base is reached.

The way it was worded did not suggest that -

Again:
"The fact that you carry on to "calculate" that SF14 will find errors in its top four moves in only 1 in 100,000,000,000,000,000,000 positions and I've given you four already ought to tell you something." ++ You have calculated neither on a 10^9 nodes/s cloud engine nor for 60 hours/move. Your positions are not representative. Try KRPP vs. KRP and I predict even your desktop with short calculation time will find the table base correct move as its top 1 move."

I suggest you carefully qualify the use of the ++ there.

A claim of a one in a hundred billion billion error rate for your four moves - followed closely by a mention of a desktop and 'table base'.
Read it.

Suggestion 2:
Word the 'explanation' so that almost anyone just entering the conversation for the first time - would find it quite crystal clear.
Or qualify which is your text and which isn't ...
Idea:  may as well do it right.

tygxc

#2604
"LCZero considers around 1000x fewer nodes per second than Stockfish, but it applies a much stronger evaluation method. In other words, it prioritizes quality over quantity. And despite the difference in the algorithms of Stockfish and Lc0, they will show almost identical results in a given position. Note how Lc0 has reached 2x less depth and 1000x less NPS."
https://chessify.me/blog/nps-what-are-the-nodes-per-second-in-chess-engine-analysis 

I could not find the nodes/s used in the AlphaZero autoplay paper.
https://arxiv.org/pdf/2009.04374.pdf

I used figure 2 of that paper to derive 60 h/move for 1 error in 10^20 positions.

So presumably 60 h/move is too much: the same 1 error in 10^20 positions for the 4 top white moves can be reached with less time on a 10^9 nodes/s cloud engine. I have to think how much smaller it could be.

"Stockfish operates with the so-called "thin" nodes (little evaluation for a much bigger number of nodes), while Leela Chess Zero operates with "thick" nodes (better evaluation for a smaller number of nodes)." - from above reference.
To weakly solve chess I would prefer thin nodes of Stockfish. The aim is to calculate as deeply as possible so as to hit the 7-men endgame table base.

I also should revise the 60 h/move so that the number of nodes in the solution tree is about the same as the number of positions searched per node, as it turned out in the solution of checkers, i.e. for each node of the solution tree a tree of about the same size is pruned away.

The main question remains the number of legal, sensible, reachable, and relevant positions.
That is the number of nanoseconds on the 10^9 nodes/s cloud engine, by whatever method.
legal 10^44, sensible 10^38 - 10^32, reachable more than the square root of sensible, relevant about 10% of reachable.
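(To put a scale on 'nanoseconds': if, say, the relevant count came out at 10¹⁸ positions, that would be 10¹⁸ ns = 10⁹ s ≈ 32 years on that one engine.)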

The most practical way would be to solve 1 tabiya of 1 ECO code e.g. C67.
That would give exactly how many positions had to be searched.