Chess will never be solved, here's why

MARattigan
btickler wrote:
MARattigan wrote:

@tygxc

#1858 "Tickler's post is wrong on other counts as adhering to 10^44...found legal by Tromp..."

I have to concede @btickler didn't get it quite right in context.

Tromp's number is currently the best one.  Any number that discards "excess" (as determined by human players' notions of chess) promotions, or arbitrates "reasonable positions" based on human-derived valuations of chess is garbage.  If anyone can prove illegality of more positions, go for it.  Until then, 10^44.5 is the number.  Before Tromp's study, it was 10^46.7.

Sorry - disagree. Tromp's numbers are for the basic rules game. I think @tygxc wants the competition rules game.

See #1869.

But I agree that discarding positions on the grounds of "reasonableness" in either case is just feeble minded.

DiogenesDue

The checkmate in 545 moves is a good illustration of how far away humans and engines are from proving anything.  After 30 minutes on this already simplified 7 man position, Stockfish 14 has only reached a depth of 38 ply and evaluates this forced mate for black as a 0.27 centipawn advantage for white.

And you want to use Stockfish to bridge from 32 pieces to the 7 man tablebase?

MARattigan
btickler wrote:

The checkmate in 545 moves is a good illustration of how far away humans and engines are from proving anything.  After 30 minutes on this already simplified 7 man position, Stockfish 14 has only reached a depth of 38 ply and evaluates this forced mate for black as a 0.27 centipawn advantage for white.

And you want to use Stockfish to bridge from 32 pieces to the 7 man tablebase?

If SF14 plays it against itself, then by the time it reaches 38 ply I would guess the probability of it being in a theoretically White-winning position, a theoretically Black-winning position or a draw would be around 1/4, 1/4 and 1/2 respectively, but the actual outcome in any case would be a draw with high probability.

Elroch

Why would you believe that? Seems highly likely to be a draw to me. More so than another engine with slightly differing biases.

DiogenesDue
MARattigan wrote:

Sorry - disagree. Tromp's numbers are for the basic rules game. I think @tygxc wants the competition rules game.

See #1869.

But I agree that discarding positions on the grounds of "reasonableness" in either case is just feeble minded.

There's little point to solving competition rules chess, a game that only a few million play actively, while basic chess is played by orders of magnitude more people.

Insofar as it is possible, the solutions being attempted need to address basic chess as it has existed since the adoption of the mad queen, en passant, etc., and if they can also piggyback competition rules, 960, etc., then great, pile them on.  But the reverse makes no sense.  FIDE changed the rules only a few years ago (2014, the 75-move rule), and it could change them again at any time in a way that affects the attempted solutions.  Creating any large-scale proof for something with fleeting rulesets like that is foolhardy...and while this discussion is moot anyway because chess cannot be solved using current technology, if we are allowing hypotheses to be put forth, I feel they should be at least somewhat anchored in something solid...

Elroch

Competition rules chess is a less interesting game to a game theorist I feel. The 50 move rule is a very ugly practicality.

MARattigan
tygxc wrote (here):

#27

...

"Are you then saying you have solved chess and proved AlphaZero's evaluation function doesn't suffer from minimax pathology?"
++ No: chess has not yet been solved and no: AlphaZero evaluation does suffer as otherwise it could not win a game against itself. ...

You obviously didn't read the link I gave. 

Minimax pathology is the term for the phenomenon of incorrect evaluations becoming more frequent as the search depth increases. AZ could win a game against itself whether or not its evaluations resulted in minimax pathology (either because that is the correct result or because its leaf evaluation function is flawed). If the combination of inaccurate relative leaf evaluations and the game tree results in minimax pathology, the effect would be that it blunders full or half points more frequently the deeper it searches, that is, if you give it more time or a faster processor or both.
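For a feel of what that means, here is a minimal Python sketch - an illustration only, not Nau's exact model and nothing that SF or AZ actually does. It builds random win/loss game trees, backs up the exact value, then backs up values from a leaf evaluator that errs with a fixed probability, and measures how often the root is misjudged at each depth. Whether that error rate shrinks or grows with depth depends on the tree model, which is exactly the pathology question.

```python
import random

def random_tree(depth, branching, p_win=0.5):
    """A random game tree; a leaf is True iff it is a win for the player to move."""
    if depth == 0:
        return random.random() < p_win
    return [random_tree(depth - 1, branching, p_win) for _ in range(branching)]

def search(node, noise=0.0):
    """Negamax on win/loss values; leaves are misjudged with probability `noise`."""
    if isinstance(node, bool):
        return (not node) if random.random() < noise else node
    # the side to move wins iff some move leads to a loss for the opponent
    return any(not search(child, noise) for child in node)

def root_error_rate(depth, branching=2, noise=0.1, trials=2000):
    """How often the noisy search misjudges the root value at this depth."""
    wrong = 0
    for _ in range(trials):
        tree = random_tree(depth, branching)
        if search(tree) != search(tree, noise):
            wrong += 1
    return wrong / trials

for d in range(1, 9):
    print(f"depth {d}: root error rate {root_error_rate(d):.3f}")
```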

I did point that out here but you ignored it.

The outcome at any rate could be that if you give SF14 six hours a move on a supercomputer, you turn it into the chess-playing equivalent of a demented monkey desperately hitting 10^9 keys a second in a futile effort to reproduce the complete works of Shakespeare before its lease runs out.

Here is a link to the original paper by D. S. Nau. The paper I originally referred to was one of numerous papers investigating the problem.

I don't have access to a supercomputer, but I thought I'd see if I could demonstrate the problem on my desktop.

Because SF is designed for the competition rules game I expected that the leaf evaluations in the basic rules game might be inaccurate enough to show some evidence on my desktop. Some authors on the topic have suggested that the problem is reduced or eliminated in game trees where sibling nodes have close real evaluations, so I chose a White-to-win KNNKP position which I know has many zugzwang positions in the basic rules game, where that wouldn't be true. (It's a mate in 85 in the basic rules game and, with ply count 0, a mate in 108 in the competition rules game.)

I ran it in Arena, SF14 v SF14, no tablebase access, with fixed times of 1, 2, 4, 8, 16, ..., 2048 seconds per move (the last about 35 mins/move). Then I went through the games on the https://syzygy-tables.info/ site and marked the blunders under both basic rules and competition rules.

Key to the comments: "W" means a win in both games, "CRD" means a win in the basic game but a draw in the competition game and "D" means a draw in both games. "BRblunder" indicates a blunder in the basic game but not in the competition game, "CRblunder" vice versa and "blunder" a blunder in both games.
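(For anyone who wants to repeat the exercise without the clicking, the marking can be automated roughly like this with the python-chess library and a local copy of the Syzygy tables. A sketch only: the paths and the PGN file name are placeholders, and the WDL-based test below treats cursed wins as competition-rules draws, which is cruder than the DTZ probing the site does.)

```python
# Rough sketch, not my actual tooling: flag moves in the self-play games whose
# tablebase value changes, under basic rules (WDL sign) and competition rules
# (cursed wins / blessed losses counted as draws).
import chess
import chess.pgn
import chess.syzygy

def values(tb, board):
    """(basic-rules value, competition-rules value) from White's point of view."""
    wdl = tb.probe_wdl(board)            # -2..2 from the side to move's point of view
    if board.turn == chess.BLACK:
        wdl = -wdl
    basic = (wdl > 0) - (wdl < 0)        # +-1 still counts as a win/loss without the 50-move rule
    comp = (wdl == 2) - (wdl == -2)      # with the 50-move rule it is only a draw
    return basic, comp

with chess.syzygy.open_tablebase("./syzygy") as tb, open("games.pgn") as pgn:
    while (game := chess.pgn.read_game(pgn)) is not None:
        board = game.board()
        for move in game.mainline_moves():
            before = values(tb, board)
            board.push(move)
            after = values(tb, board)
            if after != before:
                tag = ("blunder" if before[0] != after[0] and before[1] != after[1]
                       else "BRblunder" if before[0] != after[0] else "CRblunder")
                print(board.fullmove_number, move.uci(), tag)
```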

[Embedded game viewers omitted: SF14 v SF14 at 1, 2, 4, 8, 16, 32, 64, 128, 256, 1024 and 2048 seconds per move.]

The results are somewhat inconclusive. It's unlikely that the same blunder figures would result if the exercise were repeated.

I ran the figures through the Wolfram linear regression widget more to get a pretty plot than anything else. (When your plot looks like an eruption of chicken pox, you need to treat the regression analysis with caution.) What it certainly doesn't show, though, is a steady decrease in the blunder rate with increased think time.

It's interesting to note that of the 57 blunders in the games, only 4 were blunders under both basic rules and competition rules. (Unfortunately I seem to have lost t=512, which had the most blunders, somewhere between making the graphs and posting, so that count is for only 10 of the 11 games.)

Also notable is the phenomenon that when players of equal strength are out of their depth in a mate (as SF14 clearly is here), the result is usually a draw. SF14 drew 10 of the 11 games; the only win occurred at t=1024 secs (from an already drawn position).

MARattigan
Elroch wrote:

Why would you believe that? Seems highly likely to be a draw to me. More so than another engine with slightly differing biases.

I was talking about the real evaluation rather than SF's. It's pretty much a random walk at each blunder, and the draw is in the middle.

But you could be right. I'm still a bit foxed by why the outcomes are usually draws.

Robinabruce

We'll keep playing regardless 😌

playerafar


Rather - I think the discussion is 'in gear' and progress is being made.
I read through the recent posts by @btickler and @MARattigan - (Martin)
and then also looked at some related posts on page 1 and 2 of this forum.
The original post was January 16th - and we will hit 2000 posts - and many of them are good.
I can't put all the 'most salient' posts in one post - some of them are here.

Salient:
"There is no 50 move rule, 75 move rule, 3 fold repetition rule or 5 fold repetition rule in the FIDE basic rules of chess. Those rules now apply only to games covered by competition rules."
Martin posted that way back in mid-January ...
I re-post it now - so that people know what is being referred to by 'competition rules' and 'basic rules'.  
Does referring to the 50 move rule - or to its absence - in computer solving projects actually help or hurt with the issue of 'solvability'?
It's obviously relevant.
But point: it's not going to help at all with 'uniqueness' of positions.
Whichever way you go on it.
Nor is the three-fold repetition rule nor 5-fold nor even the number of perpetual checks.
"Whaat?  Perpetual check has Got to help !!  It means Draw !!"
In that context yes - perpetual check being available contributes to uniqueness and solving.  Is Critical to it ...
but not the number of times it's played and the position repeated - that just adds to a task that's already more than difficult enough and more than big enough.
Summary - only perpetual check being 'available' contributes to uniqueness and solving there.  The others and their issues don't.  They hurt, not help.
And they detract too.

On the other hand - castling and en passant do contribute to uniqueness.
Either one can make a position unique.
It's 'unfortunate' maybe that in the Wikipedia article they use the phrase 'same ambiguity'.
They shouldn't.
And in theory - neither one should complicate the tablebases much.
If there has to be 'worry' about how any endgame tablebase position 'got there', then the whole task gets orders of magnitude more 'suspect'.
Then you're talking 'games' instead of 'positions'.
There's no need to Hurt the Task !
Apparently the three tablebases - Lomonosov, Syzygy and Nalimov - all actually did 'handle' en passant and factor it in.
It's not surprising - en passant would only be possible in an extremely low percentage of positions anyway.
You could almost both multiply and divide by two !!
Case 1 - the double pawn move was made the move before ...
Case 2 - it wasn't.
Result: two unique positions !   That's not going to be a problem at all for the computers.
But !:
New Case 1 - it's the move of the side with the fifth-rank pawn.
New Case 2 - it isn't.  So no en passant possible.
That has to be reconciled with what the previous move might have been - not with what it was.  It's hopeless if you have to specify previous move possibilities.
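A tiny python-chess illustration of that two-case split (my example, not anything from the tablebase generators): the same piece placement is two distinct positions, with different legal moves, depending on whether the en passant capture is currently available.

```python
# Same pieces, two positions: en passant on d6 available in one, not the other.
import chess

no_ep = chess.Board("4k3/8/8/3pP3/8/8/8/4K3 w - - 0 1")   # the d5 pawn did not just arrive
ep    = chess.Board("4k3/8/8/3pP3/8/8/8/4K3 w - d6 0 1")   # ...d7-d5 was the last move

capture = chess.Move.from_uci("e5d6")
print(capture in no_ep.legal_moves)   # False - no en passant right
print(capture in ep.legal_moves)      # True  - exd6 e.p. is legal
print(no_ep.fen() == ep.fen())        # False - the two positions are encoded differently
```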

playerafar


But the castling 'glitch' is much, much more suspect.
Why On Earth would the tablebases have a 'problem' with castling ??

Why On Earth would they 'Taint the Task' by factoring it out of the tablebases  ??

Like with en passant - castling would only be positionally possible in an extremely low percentage of 7-piece and lesser tablebase positions !
So what on earth would they need to factor it out for ??
This is what I came up with:
Research Team Conversation:
Researcher:  "Manager - I've got a way to factor castling into the algorithms for our 7-piece task.  I emailed it to you.  I want to show it to the research team."
Manager:  "I thought we already discussed this.  You know time is at a premium.  The algorithms are already struggling as it is.  I've got financiers to satisfy.
You really think we've got the time to go back through all the code and try to rewrite the whole thing?  Even if we theoretically did - it would still slow down the solving too much.  The computers have plenty of work to do even with the large amount of code we've got in there already !"
Researcher:  "I don't understand why it wasn't factored in at the beginning.  In neither Lomonosov nor Syzygy nor Nalimov !!  Why didn't they just include it at the start so they don't have to try to untangle the Spaghetti?"
Researcher:  "Castling possibility can make a position unique - how can you possibly legitimately skip it and claim you've 'solved' it?  It would be like the feeble-minded 'reasonableness of moves'."
Manager:  "Those are all good points - but the answer is No."
Manager:  "I'll let you do this though.  You have my permission to go and bug the professionals on the team about it - for the next ten minutes.   And hopefully they'll ream you out with the explanation that you - a programmer - should have thought of yourself.
But hey - you've got your shot.  I'm being generous.  Now please - close the door on the way out of my office."

playerafar
YouEvenLiftBro wrote:

Chess? Of course it is soluble. By a machine. With enough time.  Even NP problems can be cracked, eventually. This isn't like a 'you need a computer with more atoms than there are in the universe to forecast the universe' kind of conundrum. There are only 64 squares.

My guess is that the perfect solution is that it is drawn. White's first mover advantage is not enough for victory. But I look forward to Stockfish 28 disproving my theory.

Love? That is harder to solve. But the world could do with more of it.

Yes, there are only 64 squares, but in chess there are up to 13 states for each of those squares.
And then it gets beyond addition and multiplication -
it gets into Exponentiation.  Gigantic numbers.  And factorials.
And big series of numbers with processions of gigantic terms.
It's chess, not checkers.
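To put rough numbers on that - back-of-the-envelope only, ignoring every rule of chess - 13 states on each of 64 squares gives 13^64, around 10^71, against the roughly 10^44.5 legal positions from Tromp quoted earlier in the thread.

```python
# Naive "13 states per square" bound versus Tromp's legal-position count cited above.
from math import log10

print(f"13^64 is about 10^{log10(13 ** 64):.1f}")   # about 10^71.3
print("Tromp's count of legal positions is about 10^44.5")
```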

playerafar


"Dude, I have a degree in pure maths and mathematical physics with university medals. I get the bigness of the problem."
Far out Man !  Gnarly !

So therefore you have an opportunity to qualify your 
 "of course it is soluble" 
(yes - obviously you meant 'solvable'.  Hahahahahah)

But a lot of the last 1800 posts have been about whether it's 'feasible'.
Not whether you could do it with a computer the size of Staten Island and 10 trillion in financing in 100 trillion years.  
One person here has been claiming it's 'weakly solvable' in five years - with a few million dollars in financing.
But his claims seem increasingly debunked now.

playerafar

I/we don't have an 'endeavour' about that.   Good Buddy.
Its getting talked about.
You know.  Forum.  Discussion.  That stuff.  

tygxc

#1865

"Losing chess is an even poorer guide to difficulty than checkers because it has a winning strategy. I explained why this can be expected to provide a lower exponent.

Checkers is more directional than chess (it's like it only has pawns!) which is also likely to provide a lower exponent than chess."

++Checkers has two kinds of men, like pawns and overpowered bishops. It is a simpler game than chess: 32 squares, 24 men, 2 types of pieces. Checkers is a draw. Checkers is not chess.
Losing Chess has the same number of squares, the same number of men and the same types of pieces. Losing Chess is a win for white. Losing Chess has been weakly solved by considering 900,000,000 positions.
Thus solving chess requires considering more than 10^9 positions.

"So don't be confident of needing to evaluate fewer than (10^44)^(2/3) positions, say."
++I am willing to agree with the 2/3, but not with the 10^44.
Even 10^37 is much too high.
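(For concreteness - my arithmetic, not a claim by either poster - here is what the 2/3 exponent gives for the two counts mentioned above.)

```python
# Elroch's 2/3 exponent applied to the position counts mentioned in the post.
for base_exp in (44, 37):
    print(f"(10^{base_exp})^(2/3) is about 10^{base_exp * 2 / 3:.1f}")
# (10^44)^(2/3) is about 10^29.3
# (10^37)^(2/3) is about 10^24.7
```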

Serepheon

I am a maths teacher but I really don't wanna be in this discussion.

Serepheon

@tygxc started from the 1st comment and is still there

tygxc

#1870
"Go ahead and dig up some conversion rate for boolean and/or integer operations if you like...I don't have any need of doing more homework."
If you really know something about computers, then you need no homework for that.
In terms of time to execute:

boolean operation < integer addition < integer multiplication < floating point multiplication < floating point division

At its core a computer consists of boolean gates only. Integer addition is done by several boolean operations. Integer multiplication is done by several integer additions. Floating point multiplication is done by several integer multiplications, and floating point division is done by several floating point multiplications.

Typical problems like forecasting the weather with Navier-Stokes equations require many floating point operations. That is why FLOPS is the right metric for those common problems.

Solving chess requires not a single floating point operation.
It needs only boolean operations and a few operations on small integers.
That is why nodes/s is the right metric to assess the feasibility of solving chess.
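To illustrate the kind of operation meant - my sketch, not tygxc's figures and not any engine's actual code - on a bitboard representation, generating every knight attack at once is nothing but shifts, ANDs and ORs on a 64-bit integer.

```python
# Bitboard knight attacks: pure 64-bit boolean work, no floating point anywhere.
FILE_A = 0x0101010101010101
FILE_B = FILE_A << 1
FILE_G = FILE_A << 6
FILE_H = FILE_A << 7
MASK64 = (1 << 64) - 1

def knight_attacks(knights: int) -> int:
    """Squares attacked by all knights on the `knights` bitboard (a1 = bit 0)."""
    l1 = (knights >> 1) & ~FILE_H
    l2 = (knights >> 2) & ~(FILE_G | FILE_H)
    r1 = (knights << 1) & ~FILE_A & MASK64
    r2 = (knights << 2) & ~(FILE_A | FILE_B) & MASK64
    h1 = l1 | r1
    h2 = l2 | r2
    return ((h1 << 16) | (h1 >> 16) | (h2 << 8) | (h2 >> 8)) & MASK64

print(hex(knight_attacks(1 << 6)))   # knight on g1 -> 0xa01000 (e2, f3 and h3)
```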

10^9 nodes/s equals 1 nanosecond per position to consider
It does not mean that 1 core considers 1 position in 1 nanosecond.
It means that N cores consider N positions in N nanoseconds.

playerafar

Nodes is not the right metric.  It looks worse and worse.
And we're seeing more and more evidence of  that in the postings.
And continued reticence on boolean operations per second.

From @btickler earlier -
"P.S. You gave up the ghost when you later said positions are "considered, not solved", really."
He is mentioning a massive Concession there by somebody.
I don't think @tygxc is going to make an additional concession by admitting the speeds of the computers in booleans/second or in other units.
He is apparently not going to talk about the speeds of the computers as being intensely insufficient to do the 'weak solving' in even 5 million years let alone 5 years. 
So we'll continue to see the 5 year claim it seems.
Even though he's conceded that positions are 'considered, not solved' regarding 'nodes' - where 'nodes' are very contrived.

But @btickler and @MARattigan have now apparently done the real forum work here already !
Their posts throughout the 1800 posts so far since January 16th - the opening of the forum - seem to answer the question of 'here's why'.
And other questions too. 
There is no obligation to keep debunking already-refuted and partially-conceded claims.

Serepheon

Very Interesting