Chess will never be solved, here's why

#2088
When a scientist submits a paper to a scientific journal, the editor forwards the manuscript anonymously to 3 other scientists working in the same field. Those 3 peers then review the manuscript and advise: accept the paper for publication, reject the paper, or accept it after some changes. The editor then decides to publish or not based on these 3 reviews. That is why peer-reviewed papers have more credibility.

@tygxc
I deleted my previous post. It was too one-sided.
I'll add a suggestion or two.
Don't budge one millimeter on anything you think you're right about.
Stick to your guns. You will anyway!
Nobody can compel you to accept or give in on anything you don't want to, or to agree with anything you think is wrong.
And a good thing too!
It would be horrible if there were such a thing as 'thought police' in this world!
Another way to say this:
You're doing a good job of not getting pushed around.
It's admirable! Really. A flip side to my previous post.
A lot of things are like that.
Two-sided. Or multi-sided.

#2088
When a scientist submits a paper to a scientific journal, the editor forwards the manuscript anonymously to 3 other scientists working in the same field. Those 3 peers then review the manuscript and advise: accept the paper for publication, reject the paper, or accept it after some changes. The editor then decides to publish or not based on these 3 reviews. That is why peer-reviewed papers have more credibility.
But not complete credibility, because three scientists working in a related field wouldn't be working in exactly the same field; if they were, the new work wouldn't be genuine research. So they won't be experts in the exact area of research and therefore cannot be expected to pick up on all errors. All they might do is check for mathematical, logical or procedural errors.

#2088
When a scientist submits a paper to a scientific journal, the editor forwards the manuscript anonymously to 3 other scientists working in the same field. Those 3 peers then review the manuscript and advise: accept the paper for publication, reject the paper, or accept it after some changes. The editor then decides to publish or not based on these 3 reviews. That is why peer-reviewed papers have more credibility.
But not complete credibility, because three scientists working in a related field wouldn't be working in exactly the same field; if they were, the new work wouldn't be genuine research. So they won't be experts in the exact area of research and therefore cannot be expected to pick up on all errors. All they might do is check for mathematical, logical or procedural errors.
But certainly more credibility than a single editor who may know little or nothing of the specific topic can provide. More to the point, the publication of the paper is only the first step in the process of the thesis' acceptance. Once anyone who is interested can see the author's work, evaluations and criticisms will follow.

What's going to happen when one peer-reviewed article collides with another peer-reviewed article?
Maybe the general subject of such collisions can be googled.

@mpaetz Yes, of course, although the way people bandy about "peer review", as if it provides ultimate ratification, is a tad irresponsible.

At first attempt - I didn't find much -
but I found this - which suggests it's 'good' when scientists disagree -
https://www.climatedepot.com/2021/09/10/scientists-fight-back-against-facebooks-alleged-independent-fact-checkers-on-climate-climate-feedback-is-effectively-spreading-the-very-misinformation-that-you-purport-to-be-trying/

This next one might be a better example.
Still not ideal - but maybe a lot 'closer'.
Dr. Joseph Mercola - perhaps the biggest quack in all of human history -
even he - has been supported by 'peer review' ...
http://marktaliano.net/peer-reviewed-manuscript-concludes-that-cdc-massively-inflates-covid-19-case-and-death-numbers-with-creative-statistics/
How much does one have to secretly pay 'peer reviewers' to get them to favorably 'peer-review' you?
And as I read further about peer reviews - there are even so-called 'anonymous' peer reviews.
#2086
"As the error rate is what low?"
++ At 1 s/move: 88.2% draw = 11.8% error / game = 1 error / 679 positions
At 1 min/move: 97.7% draw = 2.3% error / game = 1 error / 3478 positions
Extrapolating: at 60 h/move: 1 error / 10^5 positions
On yer bike. Read @Elroch's post here. He puts it mildly.
And you've no idea how many errors were made.
What you really mean is:
At 1 s/move: 88.2% draw = unknown errors / game = unknown errors / 679 positions
At 1 min/move: 97.7% draw = unknown errors / game = unknown errors / 3478 positions
Extrapolating: at 60 h/move: 1 error / however many I want positions
"You think SF14 can't handle a couple of knight's and a pawn, but when it gets to 4 rooks, 4 knights, 4 bishops, 2 queens and 16 pawns that should be ok then?"
++ It is not Stockfish, it is its evaluation function that is flawed.
Stockfish is its evaluation function.
If you took that away it would probably play worse than White Knight on an Acorn once they were out of their opening books.
If Stockfish can just calculate to the 7-man endgame tablebase, then its result is exact.
But it can't calculate to the 7-man endgame tablebase any more than I can. I did offer you a deal on that, but you didn't take me up on it.
KNN vs. KP highlights the failure of the evaluation function, not of Stockfish.
See what I already said.
We know that the evaluation function is unsuitable for KNN vs. KP.
It was a White to win position in KNN vs. KP.
Troitzky did a very full and accurate analysis of these before the end of the Second World War for the basic rules game, and his work can now be checked against the Nalimov tables.
A full weak solution is also available for the competition rules game on the syzygy-tables.info site, so Stockfish's evaluations there should be far more accurate than you will encounter in the vast majority of positions.
The evaluation function is specially tailored for individual endgames.
Stockfish itself is overwhelmed by positions with more than 26 men, when chess is most difficult. That is why the good assistants should prepare 26-man tabiya as starting points.
Where are you going to find assistants that are less overwhelmed than Stockfish with more than 26 men? Would you not need such to make the tabiya more reliable than Stockfish from the start? You still have to prove the tabiya evaluations correct and do the non-tabiya positions in any case, so what's the point of that step?
If you want to verify that the table base exact move is within the top 4 Stockfish candidate moves, then KRPP vs. KRP may be better.
I'll look. Have you tried it yourself?
But there isn't a lot of point. The position I posted where Stockfish got all four wrong was the first blunder in a SF14 v SF4 KQ vs. KNN match and came after about a dozen moves, so your estimate of how often it might occur could be out by a factor of around 10^4 if you're lucky.
Even if you weren't you'd still not finish up with a proof of anything.
@tygxc
I tried your KRPP vs. KRP in a deepish mate.
On move 46 it blundered out of a drawn position that it had blundered into, back into a win under your 50-move-rule-free new game (but not under its own competition rules game).
I tried kibitzing the position at that point and the Syzygy-recommended Rh8+ (the only move to win under your new rules) came in at 4th place. It blundered back into a draw on the next ply, so the next few positions before Arena chopped the game were not interesting.
So instead I tried adding 9 to the ply count of the position at move 46 and kibitzing that. Result below.
In fact if the ply count reaches 100, SF14 will always evaluate it at 0.00, so the top 4 choices will always be random.
That means you have your demented monkey at that point without needing to use 60 hours on your supercomputer to provoke minimax pathology.
I think you might find the same with all strong engines.
Edit: Just realised this is wrong. It's only the leaf evaluations that give 0.00. If it finds either a mate or move that resets the ply count the evaluation will be nonzero. But I think it will still produce a high blunder rate in positions where there is no quick mate.
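For anyone who wants to try this kind of "is the tablebase move in the engine's top 4" check themselves, here is a rough sketch of how it could be scripted with python-chess. To be clear, this is not the setup used above: the position, the Stockfish and Syzygy paths, the time limit, and the choice to define "tablebase-best" by WDL are all placeholders and assumptions.

```python
import chess
import chess.engine
import chess.syzygy

# Placeholders -- not the position or paths from the posts above.
FEN = "7k/8/8/5K2/3NN3/7p/8/8 w - - 0 1"   # an arbitrary 5-man KNN vs. KP example
ENGINE_PATH = "/usr/local/bin/stockfish"    # placeholder path to a Stockfish binary
SYZYGY_PATH = "/path/to/syzygy"             # placeholder path to downloaded tables

board = chess.Board(FEN)
tb = chess.syzygy.open_tablebase(SYZYGY_PATH)
engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)

def wdl_after(move):
    # WDL of the position after `move`, from the mover's point of view.
    # probe_wdl reports for the side to move (the opponent after the push),
    # hence the minus sign.
    board.push(move)
    try:
        return -tb.probe_wdl(board)
    finally:
        board.pop()

# One way to define the tablebase-best moves: those achieving the best WDL outcome.
best_wdl = max(wdl_after(m) for m in board.legal_moves)
tb_best = {m for m in board.legal_moves if wdl_after(m) == best_wdl}

# The engine's top 4 candidate moves at a fixed time per position.
infos = engine.analyse(board, chess.engine.Limit(time=10), multipv=4)
top4 = {info["pv"][0] for info in infos}

print("tablebase-best moves:", [board.san(m) for m in tb_best])
print("engine top 4:", [board.san(m) for m in top4])
print("tablebase move in engine top 4:", bool(tb_best & top4))

engine.quit()
tb.close()
```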
#2098
"How much does one have to secretly pay 'peer reviewers' to get them to favorably 'peer-review' you ?"
++ That is not possible. Only the editor knows to which peers he anonymously sends the manuscript for review; the author does not know. The reviewers only see the paper; they do not know who the author is. After the review the author still does not know who reviewed his paper, though some reviewers leave hidden hints in their comments.
If I were to submit a paper "On the number of sensible chess positions", then the editor presumably would send it to Labelle, Tromp, and Gourion for a review.
If I were to submit a paper "Chess is solved", then the editor presumably would send it to van den Herik, Allis, Allen, Walker, Schaeffer, and Gasser for a review.

This next one might be a better example.
Still not ideal - but maybe a lot 'closer'.
Dr. Joseph Mercola - perhaps the biggest quack in all of human history -
even he - has been supported by 'peer review' ...
http://marktaliano.net/peer-reviewed-manuscript-concludes-that-cdc-massively-inflates-covid-19-case-and-death-numbers-with-creative-statistics/
How much does one have to secretly pay 'peer reviewers' to get them to favorably 'peer-review' you?
And as I read further about peer reviews - there are even so-called 'anonymous' peer reviews.
I mean... there are other metrics involved.
In which journal was it published? How many people have cited it? How many papers has this person published? How many citations does he have?
And even with Jan Hendrik Schön, science is self-correcting. That's the whole point. If you want blind trust in authority, visit a local church.

#2086
"As the error rate is what low?"
++ At 1 s/move: 88.2% draw = 11.8% error / game = 1 error / 679 positions
At 1 min/move: 97.7% draw = 2.3% error / game = 1 error / 3478 positions
Extrapolating: at 60 h/move: 1 error / 10^5 positions
What's the argument for why error rate is a linear function of time (from 1s/move all the way to 60h/move no less)?
Intuitively, thinking time has diminishing returns. The deeper the search, the more opportunity for error, particularly when the software is not tuned or tested for months-long games.
Also, this could only be a minimum or relative error rate, correct? Without knowing one player's error rate it should be impossible to estimate the other's based purely on draw rate.
#2106
"why error rate is a linear function of time"
++ I did not assume a linear dependence.
On the contrary I assumed logarithmic dependence:
at 1 s / move: 1 error / 679 positions
at 1 min / move: 1 error / 3478 positions
at 1 h / move: 1 error / 3478 * 3478 / 679 positions = 1 error / 17815 positions
at 60 h / move: 1 error / 17815 * 17815 / 3478 positions = 1 error / 91251 positions
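Spelled out, the ratio rule behind these four figures is: each x60 increase in time per move multiplies the positions-per-error figure by the same factor as the previous x60 step. A tiny sketch (Python, purely illustrative; it only reproduces the arithmetic above):

```python
# n_next = n_now**2 / n_prev: each x60 time step scales "positions per error"
# by the same factor as the previous x60 step.
def extrapolate_positions_per_error(n_prev, n_now, steps):
    series = [n_prev, n_now]
    for _ in range(steps):
        series.append(series[-1] ** 2 // series[-2])
    return series

# 1 s/move -> 679, 1 min/move -> 3478, then 1 h/move and 60 h/move:
print(extrapolate_positions_per_error(679, 3478, 2))
# [679, 3478, 17815, 91251]
```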
"this could only be a minimum error rate "
++ As the error rate is that low, the occurrence of two or more errors can be neglected.
P(2 errors) = P(2 errors | 1 error) * P(1 error) ~= P(1 error)^2 << P(1 error)
"this could only be a relative error rate"
++ By the generally accepted hypothesis that chess is a draw, each decisive game must contain at least 1 absolute error: a move that turns a drawn position into a lost one.
#2104
Yes, that is correct.
Every editor of a mathematics journal receives some proofs of the Riemann Hypothesis.
Most get stopped by the editor himself or by the peer reviewers.
Some make it to publication, but then some reader points out a flaw and the paper has to be officially retracted, which reflects badly on the author, on the reviewers, and on the editor.
The famous paper on cold nuclear fusion was also published and had to be retracted.
On the other hand the PhD dissertation of Einstein was originally rejected.
Einstein published his most famous paper on relativity in the relatively (no pun) obscure journal Annalen der Physik to bypass incompetent reviewers and editors of the bigger journals.

#2106
"why error rate is a linear function of time"
++ I did not assume a linear dependence.
On the contrary I assumed logarithmic dependence:
at 1 s/move: 1 error/679 positions
at 1 min/move: 1 error / 3478 positions
at 1 h/move: 1 error / 3478 * 3478 / 679 positions = 1 error / 17815 positions
at 60 h/move: 1 error / 17815 * 17815 / 3478 positions = 1 error / 91251 positions
"this could only be a minimum error rate "
++ As the error rate is that low, the occurrence of two or more errors can be neglected.
P(2 errors) = P(2 errors | 1 error) * P(1 error) ~= P(1 error)^2 << P(1 error)
"this could only be a relative error rate"
++ By the generally accepted hypothesis that chess is a draw, each decisive game must contain at least 1 absolute error: a move that turns a drawn position into a lost one.
Eh, I don't know why I said that. Yeah, the rule you used is: when the input is x60, the output is roughly x5... that's obviously not linear... but I still don't see the logic for why that works, other than that you had 2 data points and are just playing with ratios.
As for draws, oh I see, you're saying draws that happen after zero errors are much more likely than draws that happen after 2, so we can just ignore games with multiple errors.
Eh... isn't this making a lot of assumptions? For example, let's say the error rate is not in the single digits for either engine, and the winner routinely commits several fewer errors. Why is this scenario unlikely? Current SF is something like 300 points stronger (links below) than the one that played AZ (it was SF8 that played AZ). Is it really sensible that the error rate was so low 5 years ago, when today it doesn't take 100s of games to determine which engine is best for e.g. CCRL 40/40?
https://www.chessprogramming.org/images/0/04/SfElo.png
https://ccrl.chessdom.com/ccrl/4040/
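For context on that 300-point figure, the standard Elo expected-score formula says what such a gap means per game, which is why it doesn't take many games to separate engines that far apart. A quick sketch:

```python
# Standard Elo expected-score formula: E = 1 / (1 + 10**(-diff/400)).
def expected_score(elo_diff):
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

print(round(expected_score(300), 3))  # ~0.849, i.e. roughly 85% per game for a 300-point gap
```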