Chess will never be solved, here's why

tygxc

#2079
I used the definition by Prof. van den Herik and I gave a reference to his paper.
Personally I prefer a simpler, more practical definition, but then I get reproached for using my own definitions instead of the generally accepted ones, which is what you do here.
You even find fault with his generally accepted definition.
I am glad I am not the only one being critiqued.
You should take your criticism to Prof. van den Herik this time.

tygxc

#2077

"How did they work out the error rate?"
++ They did not: they gave the draw rate at 1 s/move and at 1 min/move. Games that do not draw are sure to contain 1 error. Further, I assumed 40 moves, hence 80 ply per game. Then I extrapolated from 1 s/move and 1 min/move to 1 h/move and to 60 h/move.

"were the games played under basic rules with the addition of a double repetition rule"
++ That does not matter at all. The result would be exactly the same.

MARattigan
tygxc wrote:

#2079
I used the definition by Prof. van den Herik and I gave a reference to his paper.
Personally I prefer a simpler, more practical definition, but then I get reproached for using my own definitions instead of the generally accepted ones, which is what you do here.
You even find fault with his generally accepted definition.
I am glad I am not the only one being critiqued.
You should take your criticism to Prof. van den Herik this time.

With respect to Prof. van den Herik, the definition is flawed.

I think I showed that in #2072 - can you find any fault with it?

You'd do better to use the definition I gave in the last para. of #2079. I think everyone could agree to that one.

Elroch
tygxc wrote:

#2077

"How did they work out the error rate?"
++ They did not: they gave the draw rate at 1 s/move and at 1 min/move. Games that do not draw are sure to contain 1 error. Further, I assumed 40 moves, hence 80 ply per game. Then I extrapolated from 1 s/move and 1 min/move to 1 h/move and to 60 h/move in an entirely unreliable and almost certainly wrong way.

[snip]

 

MARattigan
tygxc wrote:

#2077

"How did they work out the error rate?"
++ They did not: they gave the draw rate at 1 s/move and at 1 min/move. Games that do not draw are sure to contain 1 error. Further, I assumed 40 moves, hence 80 ply per game. Then I extrapolated from 1 s/move and 1 min/move to 1 h/move and to 60 h/move.

Games that draw are sure to contain any number of errors that sum to -1/2 point for White if the starting position is a win for White, any number of errors that sum to 0 points if the starting position is a draw, or any number of errors that sum to +1/2 point for White if the starting position is a win for Black.

The only thing it tells you about the number of errors is that it's at most the number of ply that were played in the game. 
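This point can be made concrete with a toy simulation (a sketch only: the "hidden compensating pair" model and its parameters are illustrative assumptions of mine, not anything measured in the thread). Two runs produce the same draw rate while containing very different numbers of errors:

```python
import random

def simulate(n_games, p_decisive, max_hidden_pairs, seed=0):
    """Toy model of self-play from a theoretically drawn start.

    A game is decisive only when it contains one uncompensated error;
    'hidden' compensating error pairs cancel out, leave the result a
    draw, and so are invisible in the draw rate.
    """
    rng = random.Random(seed)
    draws = 0
    errors = 0
    for _ in range(n_games):
        decisive = rng.random() < p_decisive
        hidden = 2 * rng.randint(0, max_hidden_pairs)  # cancelling pairs
        errors += hidden + (1 if decisive else 0)
        draws += 0 if decisive else 1
    return draws / n_games, errors / n_games

# Same ~88.2% draw rate, very different true error counts:
draw_a, err_a = simulate(100_000, 0.118, 0)  # "decisive => exactly 1 error"
draw_b, err_b = simulate(100_000, 0.118, 3)  # plus up to 3 hidden pairs/game
```

The draw rate pins down only the uncompensated errors; the hidden ones can be anything.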

"were the games played under basic rules with the addition of a double repetition rule"
++ That does not matter at all. The result would be exactly the same.

If you look at my SF14 v SF14 examples here, they contain a total of 30 errors in the basic rules game and 27 errors in the competition rules game. Only 4 of those are errors in both games.

That strongly suggests it does matter at all.

 

tygxc

#2084
I work on the generally accepted hypothesis that chess is a draw. Hence each decisive game must contain 1 error. As the error rate is that low, the occurrence of 2 or more errors is even rarer and can be neglected.
Your examples KNN vs. KP only highlight the failure of the evaluation function. KNN vs. KP is +5. KNN vs. K is +6, but is a draw. The evaluation function prefers a +6 draw over a +5 win.
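The "can be neglected" claim can be quantified under an explicit model. If a draw is taken to mean zero errors and errors are assumed to occur independently (a Poisson assumption that is an editorial addition, not from the posts), the 88.2% draw rate at 1 s/move implies:

```python
import math

draw_rate = 0.882                      # SF14 self-play draw rate at 1 s/move
lam = -math.log(draw_rate)             # Poisson rate: P(0 errors) = exp(-lam)
p_ge1 = 1 - math.exp(-lam)             # P(at least 1 error) = 1 - 0.882
p_ge2 = 1 - math.exp(-lam) * (1 + lam) # P(at least 2 errors)
share = p_ge2 / p_ge1                  # multi-error share of decisive games
# share comes out around 6%: smallish, but "negligible" only if one also
# accepts that errors never cancel, which is exactly what is disputed above.
```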

MARattigan
tygxc wrote:

#2084
I work on the generally accepted hypothesis that chess is a draw. Hence each decisive game must contain 1 error. As the error rate is that low, the occurrence of 2 or more errors is even rarer and can be neglected.
Your examples KNN vs. KP only highlight the failure of the evaluation function. KNN vs. KP is +5. KNN vs. K is +6, but is a draw. The evaluation function prefers a +6 draw over a +5 win.

As the error rate is what low?

You think SF14 can't handle a couple of knights and a pawn, but when it gets to 4 rooks, 4 knights, 4 bishops, 2 queens and 16 pawns that should be ok then?

(And when SF14 converted from KNN vs. KP to KNN vs. K, in most cases it wasn't a +5 win in the competition rules game for which SF14 is designed. It was already a draw under competition rules, so taking the pawn to ensure no Black win and leave the remote possibility of a KNN vs. K win for White was the most sensible option.)

tygxc

#2086

"As the error rate is what low?"
++ At 1 s/move: 88.2% draw = 11.8% error / game = 1 error / 679 positions
At 1 min/move: 97.7% draw = 2.3% error / game = 1 error / 3478 positions
Extrapolating: at 60 h/move: 1 error / 10^5 positions
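The arithmetic behind these figures can be written out in a few lines. The 80-ply game length is the assumption stated earlier in the thread, and the extrapolation implicitly assumes the error rate falls by the same factor for every 60-fold increase in thinking time (a log-linear assumption; this sketch reconstructs the calculation, it does not validate it):

```python
PLIES = 80                       # assumed game length: 40 moves = 80 ply
err_1s   = 0.118 / PLIES         # error rate per position at 1 s/move
err_1min = 0.023 / PLIES         # error rate per position at 1 min/move
# 1/err_1s  is ~678 positions per error (the post rounds to 679)
# 1/err_1min is ~3478 positions per error

# Log-linear extrapolation: same improvement factor for each further
# 60x increase in thinking time (1 min -> 1 h -> 60 h).
factor  = err_1s / err_1min      # ~5.13 per 60x time
err_1h  = err_1min / factor
err_60h = err_1h / factor        # ~1 error per 9e4 positions, i.e. ~10^5
```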

"You think SF14 can't handle a couple of knights and a pawn, but when it gets to 4 rooks, 4 knights, 4 bishops, 2 queens and 16 pawns that should be ok then?"
++ It is not Stockfish, it is its evaluation function that is flawed.
If Stockfish can just calculate to the 7-man endgame tablebase, then its result is exact.
KNN vs. KP highlights the failure of the evaluation function, not of Stockfish.
We know that the evaluation function is unsuitable for KNN vs. KP.
Stockfish itself is overwhelmed by positions with more than 26 men, when chess is most difficult. That is why the good assistants should prepare 26-man tabiya as starting points.
If you want to verify that the tablebase exact move is within the top 4 Stockfish candidate moves, then KRPP vs. KRP may be better.

playerafar

Such conversations can often degenerate into 
"but but this website evidence for which I have provided the links and excerpts is peer-reviewed !!"
Then it might become one person's linked website against another's - both with the so-called 'peer review'. 
Kind of like the articles then have sunshine coming out of them.  
And during and subsequently - the party whose positions are much the more dubious refuses to budge one millimeter !!

KAMPA1986

 

tygxc

#2088
When a scientist submits a paper to a scientific journal, the editor forwards the manuscript anonymously to 3 other scientists working in the same field. Those 3 peers then review the manuscript and advise: accept the paper for publication, reject the paper, or accept it after some changes. The editor then decides to publish or not based on these 3 reviews. That is why peer-reviewed papers have more credibility.

playerafar

@tygxc

I deleted my previous post.  It was too one-sided.
I'll add a suggestion or two.
Don't budge one millimeter on anything you think you're right about.
Stick to your guns.  You will anyway !
Nobody can compel you to accept or give in on anything you don't want to, nor to agree with anything you think is wrong.
And good thing !
It would be horrible if there were such a thing as 'thought police' in this world !

Another way to say this: 
You're doing a good job of not getting pushed around.
It's admirable!  Really.  A flip side to my previous post.
A lot of things are like that.
Two-sided.  Or multi-sided.

mpaetz
Optimissed wrote:
tygxc wrote:

#2088
When a scientist submits a paper to a scientific journal, the editor forwards the manuscript anonymously to 3 other scientists working in the same field. Those 3 peers then review the manuscript and advise: accept the paper for publication, reject the paper, or accept it after some changes. The editor then decides to publish or not based on these 3 reviews. That is why peer-reviewed papers have more credibility.

But not complete credibility because three scientists, working in a related field, wouldn't be working in the same field because if they were, the new work wouldn't be genuine research. So they won't be experts in the exact area of research and therefore cannot be expected to pick up on all errors. All they might do is check for mathematical, logical or procedural errors.

But certainly more credibility than a single editor who may know little or nothing of the specific topic can provide. More to the point, the publication of the paper is only the first step in the process of the thesis's acceptance. Once anyone who is interested can see the author's work, evaluations and criticisms will follow.

playerafar

What's going to happen when one peer-reviewed article collides with another peer-reviewed article?
Maybe the general subject of such collisions can be googled.

playerafar

At first attempt - I didn't find much -
but I found this - which suggests it's 'good' when scientists disagree -

https://www.climatedepot.com/2021/09/10/scientists-fight-back-against-facebooks-alleged-independent-fact-checkers-on-climate-climate-feedback-is-effectively-spreading-the-very-misinformation-that-you-purport-to-be-trying/

playerafar

This next one might be a better example.
Still not ideal - but maybe a lot 'closer'.  
Dr. Joseph Mercola - perhaps the biggest quack in all of human history -
even he has been supported by 'peer review' ...

http://marktaliano.net/peer-reviewed-manuscript-concludes-that-cdc-massively-inflates-covid-19-case-and-death-numbers-with-creative-statistics/

How much does one have to secretly pay 'peer reviewers' to get them to favorably 'peer-review' you?
And as I read further about peer reviews - there are even the so-called 'anonymous' 'peer reviews'.

DiogenesDue

As Copernicus would attest, disagreement in the scientific community is always needed.

MARattigan

And as Galileo's toenails would attest, the same is true of the community at large.

MARattigan
tygxc wrote:

#2086

"As the error rate is what low?"
++ At 1 s/move: 88.2% draw = 11.8% error / game = 1 error / 679 positions
At 1 min/move: 97.7% draw = 2.3% error / game = 1 error / 3478 positions
Extrapolating: at 60 h/move: 1 error / 10^5 positions

On yer bike. Read @Elroch's post here. He puts it mildly.

And you've no idea how many errors were made.

What you really mean is:

At 1 s/move: 88.2% draw = unknown errors / game = unknown errors / 679 positions
At 1 min/move: 97.7% draw = unknown errors / game = unknown errors / 3478 positions
Extrapolating: at 60 h/move: 1 error / however many I want positions

"You think SF14 can't handle a couple of knight's and a pawn, but when it gets to 4 rooks, 4 knights, 4 bishops, 2 queens and 16 pawns that should be ok then?"
++ It is not Stockfish, it is its evaluation function that is flawed.

Stockfish is its evaluation function.

If you took that away it would probably play worse than White Knight on an Acorn once they were out of their opening books.

If Stockfish can just calculate to the 7-man endgame tablebase, then its result is exact.

But it can't calculate to the 7-man endgame tablebase any more than I can. I did offer you a deal on that, but you didn't take me up on it.

KNN vs. KP highlights the failure of the evaluation function, not of Stockfish.

See what I already said.

We know that the evaluation function is unsuitable for KNN vs. KP.

It was a White to win position in KNN vs. KP.

Troitzky did a very full and accurate analysis of these before the end of the Second World War for the basic rules game, and his work can now be checked against the Nalimov tables.

A full weak solution is also available for the competition rules game on the syzygy-tables.info site, so Stockfish's evaluations here should be far more accurate than those you will encounter in the vast majority of positions.

The evaluation function is specially tailored for individual endgames.

Stockfish itself is overwhelmed by positions with more than 26 men, when chess is most difficult. That is why the good assistants should prepare 26-man tabiya as starting points.

Where are you going to find assistants that are less overwhelmed than Stockfish with more than 26 men? Would you not need such assistants to make the tabiya more reliable than Stockfish from the start? You still have to prove the tabiya evaluations correct and handle the non-tabiya positions in any case, so what's the point of that step?

If you want to verify that the tablebase exact move is within the top 4 Stockfish candidate moves, then KRPP vs. KRP may be better.

I'll look. Have you tried it yourself?

But there isn't a lot of point. The position I posted where Stockfish got all four wrong was the first blunder in a SF14 v SF4 KQ vs. KNN match and came after about a dozen moves, so your estimate of how often that might occur could be off by a factor of around 10^4 if you're lucky.

Even if you weren't you'd still not finish up with a proof of anything.

 

MARattigan

@tygxc

I tried your KRPP vs. KRP in a deepish mate.

On move 46 it blundered out of a drawn position (which it had earlier blundered into) back into a win under your 50-move-rule-free new game (but not under its own competition rules game).

I tried kibitzing the position at that point and the Syzygy-recommended Rh8+ (the only move to win under your new rules) came in at 4th place. It blundered back into a draw on the next ply, so the next few positions before Arena chopped the game were not interesting.

So instead I tried adding 9 to the ply count of the position at move 46 and kibitzing that. Result below.

In fact if the ply count reaches 100, SF14 will always evaluate it at 0.00, so the top 4 choices will always be random. 

That means you have your demented monkey at that point without needing to use 60 hours on your supercomputer to provoke minimax pathology.

I think you might find the same with all strong engines.

Edit: Just realised this is wrong. It's only the leaf evaluations that give 0.00. If it finds either a mate or a move that resets the ply count, the evaluation will be nonzero. But I think it will still produce a high blunder rate in positions where there is no quick mate.