Chess will never be solved, here's why

Avatar of tygxc

#2032
That is not lazy: 2034 posts...
The essence is this quote by the late GM Sveshnikov:
"Give me five years, good assistants and modern computers, and I will trace all variations from the opening towards tablebases and 'close' chess."
give me = somebody has to pay for it
five years = 5 years
good assistants = to prepare starting positions to launch the search
modern computers = cloud engines of 10^9 nodes / second
all variations = the relevant ECO codes
the opening = the starting positions prepared by the good assistants
towards = starting from the opening and ending at the table base
tablebases = 7-men endgame table base
'close' chess = weakly solve chess, as already done for Checkers (8 * 8 draughts) and Losing Chess

Avatar of Elroch
tygxc wrote:

++ I determine the 4 top candidates with the Stockfish evaluation function or a simplified version of it. I reckon the optimal move is among them by extrapolation: 1 error in 10^5 positions for the top 1 move [snip]

Here is one of the more blatantly obvious of your errors. 

You think that Stockfish makes no more than one losing error in every 100,000 moves (see quote).  AlphaZero beat (an earlier version of) Stockfish 155 times in 1000 games.  According to you (who would surely have said the same thing about that version of Stockfish, which was not much more than 100 Elo points weaker), those games had an average of at least 155 * 100000 / 1000 moves = 15,500 moves.

My recollection is that this is not so, by a factor of more than 100.

You would be foolish to think the incremental improvements in Stockfish have provided that factor of more than 100 (in fact it would have to be a great deal more in practice).
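
To make the arithmetic explicit, here is a minimal sketch of that lower-bound calculation (illustration only; the 155, 1000 and 100,000 figures are the ones quoted above):

```python
# Lower bound implied by a claimed rate of 1 losing error per 100,000 moves.
losses = 155                # games Stockfish lost to AlphaZero in the match
games = 1000                # games in the match
moves_per_error = 100_000   # claimed: at most 1 losing error per 10^5 moves

# Every loss requires at least one losing error, so the 1000 games must
# contain at least 155 errors, hence at least 155 * 100,000 moves in total.
min_total_moves = losses * moves_per_error
print(min_total_moves / games)   # 15500.0 -> at least 15,500 moves per game
```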

Avatar of haiaku
tygxc wrote:

I am not elusive. I guess I do not understand your question. Do you want to know the time to find 4 candidate moves in 1 node of the solution tree? What do you mean by the phrase 'at any increment and depth'? What precisely do you ask and even for the 3rd time? I am willing to think about it and come up with an answer.

"At any increment of depth", not "and depth". To answer your question, I simply follow your reasoning:

tygxc wrote:

"So, you check only one candidate move for Black... and that should be enough to find her optimal strategy?"
++ Yes. Black tries to draw, white tries to win.
If a game ends in a draw, then black has succeeded and none of black's moves need to be changed, but on the contrary white has failed, so improvements for white must be investigated. If all (reasonable) alternatives for white also lead to draws, then it is proven that the game is weakly solved to a draw and that all black moves were in retrospect optimal: fit to reach the draw.
If on the contrary the game would end in a white win, then none of the white moves need changing and it would be necessary to look at all reasonable alternatives for black. If all those also lead to a white win, then chess would be solved as a win for white and in retrospect all white moves were optimal: fit to win.

Let's say that after Black's last move the endgame is reached and it's a draw. The other 3 candidates for White (and Black's reply to each of them) must be checked. Let's suppose that all of them lead to a draw. So the parent node of those 4 candidates (which is a Black move) at depth -2 (in plies) from the endgame leads to a draw. The engine should now check the other options for White at depth -3 down to the endgame; if Black draws in all these lines, the engine should check the candidates at depth -5, and so on.
If White wins in any of these lines, it would require no less time, because the engine would have to check White's strategy against all the reasonable alternatives for Black (you say that), not just one. So, in the hypothesis that the game ends in a draw, how many hours would be required (at 60 hours per position) to generate all these candidates between the starting point (which I understood is not the initial position) and the endgame? Is it clearer now?
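
To see why the answer matters, a toy count (illustration only: it assumes 4 candidates at every White decision point and one Black reply per candidate, all the way to the endgame, at the quoted 60 hours per position; the depths are made up):

```python
# Toy estimate only: a tree with 4 White candidates at every White move and
# one Black reply per candidate, each examined position costing 60 hours.
hours_per_position = 60
candidates_per_white_move = 4

def positions_to_check(white_moves_to_endgame: int) -> int:
    """Positions in a tree that branches 4 ways at every White move and
    1 way at every Black reply, down to the endgame."""
    total = 0
    lines = 1
    for _ in range(white_moves_to_endgame):
        lines *= candidates_per_white_move   # 4 White candidates
        total += lines * 2                   # each candidate + Black's reply
    return total

for depth in (5, 10, 20):
    n = positions_to_check(depth)
    print(depth, n, round(n * hours_per_position / (24 * 365)),
          "years of engine time")
```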

tygxc wrote:

Please bear in mind that I have not solved chess, so I cannot tell you how I have done it. I have to think about how it can be done. I even refine my own thinking.

Well, at least we both agree that you have to.

Avatar of MARattigan
AmAwminem wrote:

I'm too lazy to read this

But apparently not to write it.

Avatar of Elroch

So I was correct in my interpretation of @tygxc's claim? It is reasonable to assume a misinterpretation would have received a response.

Avatar of MARattigan
Elroch wrote:

So I was correct in my interpretation of @tygxc's claim? It is reasonable to assume a misinterpretation would have received a response.

Correct, I think, only because you said, "at least 155 * 100000 / 1000 moves = 15,500 moves".

You have to remember AZ was probably making compensatory blunders.

When I tried it out in a situation where I could check the blunders against perfect play via the tablebases, here, SF14 gave me 30 blunders in 497 moves in the basic rules game (which would be the game most akin to what @tygxc now proposes).

That's a little more than 1 blunder in 17 moves. Since there were only five men on the board in a well analysed endgame where SF14's evaluation function should be more accurate than average,  I think it's fair to say that blunder rate is also an underestimate. 

But I think you're naive to expect any response from @tygxc other than to restate his original assertion. 

Avatar of haiaku
Elroch wrote:

So I was correct in my interpretation of @tygxc's claim? It is reasonable to assume a misinterpretation would have received a response.

He could say, totally neglecting pathology in game trees, that those errors occurred because SF didn't spend 60 hours every move to determine those 4 top candidates in the first place.

Avatar of MARattigan

@tygxc

#2022

"The Troitzky line concept is meaningless with the 50 move rule in effect."
++ The Troitzky line is very useful even in practice: when the blocked pawn is too far, there is no win.

There are wins with the pawn anywhere on its 7th. rank. How far does it have to go?

What I meant was you can't rely on blocking the pawn behind the Troitzky line winning in the competition rules game.

Avatar of Elroch

I thought of that objection, but it fails because merely spending 1000 times more time would be unlikely to improve results by more than 200 points. To see that this is generous, observe that with AlphaZero a bit more than 100 points stronger than Stockfish 8, it was necessary to give a 30:1 time handicap to make Stockfish the favourite. 30:1 to cover 100 points would be 900:1 for 200 points (but the empirical data also suggests the returns get steadily lower as you provide more time). In truth, @tygxc is assuming incorrectly that he knows how much time Stockfish needs to achieve perfection. Far more likely is that this is the amount of time needed to blunder P% of the time, where P is small but non-zero.
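
A back-of-envelope version of that scaling (a sketch only; it takes the 30:1-per-100-Elo figure from the post above and assumes it compounds geometrically):

```python
# Rough Elo-from-extra-time model: if a 30:1 time handicap offsets roughly
# 100 Elo points and the relation compounds, a time ratio R buys about
# 100 * log(R) / log(30) Elo. Illustrative only; in practice the returns
# diminish as more time is added.
import math

def elo_gain_from_time_ratio(time_ratio: float,
                             ratio_per_100_elo: float = 30.0) -> float:
    return 100.0 * math.log(time_ratio) / math.log(ratio_per_100_elo)

print(elo_gain_from_time_ratio(30))    # ~100 Elo
print(elo_gain_from_time_ratio(900))   # ~200 Elo
print(elo_gain_from_time_ratio(1000))  # ~203 Elo, so 1000x time ~ 200 points
```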

Avatar of haiaku

I think I can foresee a counter-objection to that, in @tygxc's style, but I will leave it to him.

Avatar of MARattigan

@Elroch

Why do you say the blunder rate will become small as the think time increases? 

SF is not thinking. It's following an algorithm with an imperfect evaluation function. How do you prove its blunder rate will uniformly decrease with increased depth if the depth doesn't reach the minimum forced mate depth for some candidate moves?

When I tried it out, it had nearly 16 times the blunder rate at four and a half minutes of think time as it did at 1 second's think time from the same position under basic rules (though only 1.75 times the rate in the competition rules game, for which the evaluation function is designed).

I used to have a version of Rybka running on office surplus kit around 2005. It had an 'e' on the end of its version number meaning they'd gone to special lengths to strengthen its play in various endgames without using a tablebase. It could do KNNKP mates in 50. The stockfishes crap out at a maximum of between 30 and 35 moves depending on the pawn position and SF version.

The evaluation function is, I believe, far more important than the machine speed.  You can't make a silk purse out of a sow's ear, as my father used to say.

Avatar of Elroch

I don't understand that empirical claim if it is assumed to be general. More computing time should typically be better if used sensibly. The reason is simply that you replace the bare evaluation function by an evaluation based on some exploration of moves that are likely to be good. Generally this increases the accuracy of the evaluation.

(A minor point is that traditional chess engines use a silly scale for evaluation, based on pseudomaterial. Obviously better is to use outcome probabilities. In this case the future exploration should reduce the cross entropy loss in general, according to all common sense.)
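
For anyone unfamiliar with what an outcome-probability evaluation would look like, a minimal sketch (the probabilities and the win/draw/loss encoding are made up for illustration; this is not Stockfish's actual output):

```python
# Illustrative only: score a position as (win, draw, loss) probabilities and
# measure evaluation quality by cross-entropy against the eventual outcome.
import math

def cross_entropy(predicted_wdl, actual_outcome_index) -> float:
    """Negative log probability assigned to the outcome that occurred.
    predicted_wdl: (P(win), P(draw), P(loss)); actual_outcome_index: 0/1/2."""
    return -math.log(predicted_wdl[actual_outcome_index])

bare_eval   = (0.30, 0.50, 0.20)   # evaluation without any search (made up)
deeper_eval = (0.15, 0.70, 0.15)   # evaluation after some exploration (made up)
actual = 1                          # the game was in fact a draw

print(cross_entropy(bare_eval, actual))    # ~0.69
print(cross_entropy(deeper_eval, actual))  # ~0.36, i.e. a better evaluation
```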

Note that the fact that using the bare evaluation function to pick moves is inferior to using a little time to explore future possibilities, together with the fact that the improvement should be monotone in the amount of time for small amounts of time, seems to directly imply that more time is always better. My loose reasoning is that what adding more time does is add more exploration to little explored nodes (especially leaf nodes). Of course it also includes exploring low ranking candidates for nodes nearer to the root, but again it is difficult to see how this could ever not be an improvement.

If your claim were true in general (rather than, as I believe, only in extremely long mating endgames), then when Stockfish played with 4 minutes a move it would do better to use 1 second.

But the truth is that giving Stockfish more time helped it against AlphaZero (not perfect, but a pretty good detector of blunders).

[Note that it could be that having a very long mate might be associated with some odd characteristics of the optimal line which made it very untypical in the view of Stockfish. I am not sure exactly how, but some way that makes more time detrimental!]

Avatar of dogs4evr

GUYS JUST STOP

Avatar of MARattigan
Elroch wrote:

I don't understand that empirical claim if it is assumed to be general. More computing time should typically be better if used sensibly. The reason is simply that you replace the bare evaluation function by an evaluation based on some exploration of moves that are likely to be good. Generally this increases the accuracy of the evaluation.

This was the paper that originally questioned that assumption, since when there have been rather a lot more.

My empirical claim is not so much a claim of fact, rather a claim of possibility. I tried out only one position in an endgame where the "real" evaluations of sibling nodes in terms of WDL are not very uniform. (Some authors have suggested that such uniformity minimises the pathology.) 

The process was a bit of a pain, so I didn't pursue it further but the result appeared to be in line with the effects of minimax pathology.

(A minor point is that traditional chess engines use a silly scale for evaluation, based on pseudomaterial. Obviously better is to use outcome probabilities. In this case the future exploration should reduce the cross entropy loss in general, according to all common sense.)

The scale is unimportant; any order isomorphism would give identical results. But the order of specific evaluations is very important: if SF's evaluations were reversed in sign, for example, it would become an excellent helpmate partner.

Imperfect evaluations can, I think, cause winning lines to be pruned or vacillation that could lead to a draw under the 50 move or triple repetition rule. (The latter is overridden when SF thinks it's necessary - see e.g. the first few moves of the 8 sec and 256 sec games here.)

Certainly, what I said about Rybka is true and it must have been searching to a lower depth.

But the truth is that giving Stockfish more time helped it against AlphaZero (not perfect, but a pretty good detector of blunders).

There you have to be a bit careful. SF is obviously superb against players with limited look-ahead / understanding (there seems to be a trade-off between the two). Its evaluations are designed for that. Blunders at one level of play may not necessarily be blunders at a different level (and obviously not in basic rules and competition rules games).

Perfect play may be very different from practical play.

[Note that it could be that having a very long mate might be associated with some odd characteristics of the optimal line which made it very untypical in the view of Stockfish. I am not sure exactly how, but some way that makes more time detrimental!]

 

Avatar of MARattigan
Optimissed wrote:
Elroch wrote:

More computing time should typically be better if used sensibly.

He's saying that SF isn't programmed well enough to benefit from more time. The reason would be that the accuracy/depth profile being used by SF is insufficiently good to benefit. ...

I think it's supposed to be more than that. 

The alpha-beta pruning process could amplify the imperfections in the evaluation function as the result bubbles back up to the root: the deeper the search, the worse.
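
For anyone who wants to see the mechanism, a small self-contained experiment (a toy in the spirit of, but not reproducing, Nau's constructions; the branching factor, noise model and independent random leaf values are all assumptions): noisy leaf evaluations are backed up by negamax and the root value is compared with the noise-free result.

```python
# Toy illustration of how evaluation noise propagates through minimax/negamax
# backup. With independent random leaf values (as here) the minimax-pathology
# literature says backed-up errors need not shrink with depth; with strongly
# correlated leaf values they usually do. Illustration only, not a proof.
import random

def negamax(leaves, noise_rate, rng):
    """Back up a complete binary tree whose leaves are +1/-1 (win/loss for
    the side to move); each leaf is misread with probability noise_rate."""
    values = [v if rng.random() >= noise_rate else -v for v in leaves]
    while len(values) > 1:
        values = [max(-values[i], -values[i + 1])
                  for i in range(0, len(values), 2)]
    return values[0]

def root_error_rate(depth, noise_rate=0.1, trials=2000, seed=0):
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        leaves = [rng.choice([-1, 1]) for _ in range(2 ** depth)]
        if negamax(leaves, 0.0, rng) != negamax(leaves, noise_rate, rng):
            errors += 1
    return errors / trials

for depth in (2, 4, 6, 8):
    print(depth, root_error_rate(depth))
```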

 

Avatar of Elroch

What is at least intuitively feasible is that this is a sort of bias-variance trade-off. A bare evaluation is wrong (it misses stuff that happens later), but it has low variance - there is no error due to selecting future analysis lines. Analysis of future lines provides you with a sample from which you might, for example, extract the critical line by min-max and evaluate its terminus.

This would increase the variance of your estimate of value because of the incompleteness of the sample from future lines which will presumably have terminal values which have scatter (and a mean which on average would be similar to the bare evaluation).   If the analysis was thorough (think tablebase), it would always be an improved evaluation, but when it is a sample, it has randomness in the value provided. We have seen this in the curiosity of Stockfish believing the evaluations of promotion to rook or queen are different, despite also believing the best response is to capture the promoted piece.

IMHO, this problem can only occur as a result of the suboptimality of the algorithms used by Stockfish. Where it would go wrong is when the analysis of a node is inadequate, and the random selection of future lines happens to contain a misleading fluke. The way this could be corrected is for the evaluation algorithm to give only slight credit to such an analysis. Instead of using the min-max value (which would later turn out to be based on believing the wrong move is the best one), the algorithm would use some weighted combination of the bare evaluation and the evaluation of the deeper lines. With sufficient weight on the bare evaluation, the variance would be reduced, because halving the weight for the deeper evaluation reduces the variance due to it by a factor of 4. Some weight would still achieve a better evaluation than either the bare one or the min-max one.
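
A minimal sketch of that weighting idea (my own illustration; neither the weights nor the noise model are anything Stockfish is known to use):

```python
# If the deep-search estimate carries independent noise with variance s**2,
# then giving it weight w contributes w**2 * s**2 to the variance of the
# blend, so halving w quarters that term (the "factor of 4" above).
def blended_eval(bare: float, deep: float, w_deep: float) -> float:
    """Weighted combination of a bare (static) evaluation and a deeper,
    search-based evaluation, both expressed on the same scale."""
    return (1.0 - w_deep) * bare + w_deep * deep

def variance_from_deep(sigma_deep: float, w_deep: float) -> float:
    return (w_deep ** 2) * (sigma_deep ** 2)

print(variance_from_deep(1.0, 1.0))   # 1.0
print(variance_from_deep(1.0, 0.5))   # 0.25 -> a factor of 4 smaller
```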

It is possible that Stockfish already does something like this - I really don't know. (Another possibility is that the evaluations of all nodes are incorporated in a way related to how likely they are to be critical).

If so, it could simply be that it was doing it in a way which is on average good but deals badly with the sorts of position you analysed (one of which was a very deep tablebase mate, I believe?)

Avatar of MARattigan

@Elroch

I chose the endgame because of the number of zugzwangs. I didn't think I'd see any effect on my desktop otherwise. So, yes, it was chosen to provoke SF into dealing badly with it.

On the other hand I think the majority of @tygxc's processing time would eventually be spent in such positions. The mate is not very long even for KNNKP in the basic rules game at 85 moves, but I thought with @tygxc's Schrödinger's 50 move rule, I should keep it possible in the competition rules game. Three trillion moves - now that would be long.

By the way, the minimax pathology is not specific to chess. Nau demonstrated it in some specially constructed games where it was possible to prove the effect. It's apparently a potential problem with the algorithm in general as far as I know. But I'm no expert - I've only just started looking at this stuff.

Avatar of Elroch

Are you aware of the algorithms used by Stockfish? I would find it much easier to develop an improvement to min-max with an evaluation which is probability based (essentially with the aim of finding the best compromise between bias and variance).

For anyone who has no idea what all this "bias-variance" stuff is, it comes from machine learning algorithms, but I believe I am correct in saying that the concept applies to the ad hoc evaluation algorithms used to generate a tree of variations and then derive an evaluation from that tree. Such an algorithm can be viewed as a model which tries to estimate the true value with optimal play and which uses the empirical data which comprises (some subset of) the legal continuations (plus another ad hoc model which provides bare evaluations of positions).

If you look at an article on the bias-variance trade-off, it's worth noting that chess has no irreducible error because it is a game of perfect information with a definite result.
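
For reference, the standard decomposition being alluded to, with v the true game-theoretic value of a position and v-hat the engine's estimate of it (textbook form, not quoted from any post):

```latex
% Bias-variance decomposition of the expected squared error of an estimate
% \hat{v} of the true game-theoretic value v of a position. For chess the
% target v is deterministic, so the usual irreducible-noise term is zero.
\mathbb{E}\!\left[(\hat{v}-v)^2\right]
  \;=\; \underbrace{\left(\mathbb{E}[\hat{v}]-v\right)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}\!\left[\left(\hat{v}-\mathbb{E}[\hat{v}]\right)^2\right]}_{\text{variance}}
  \;+\; \underbrace{\sigma_\varepsilon^2}_{\substack{\text{irreducible error}\\(=0\ \text{for chess})}}
```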

Avatar of playerafar


from @Elroch
 "In truth, @tygxc is assuming incorrectly that he knows how much time Stockfish needs to achieve perfection. Far more likely is that this is the amount of time needed to blunder P% of the time, where P is small but non-zero."

That looks well said.  Key.

We also got this - from @MARattigan
"SF is not thinking. It's following an algorithm with an imperfect evaluation function."
Also key.  With the central word there being 'imperfect'.

If one compares the two assertions -
they appear to be in agreement with each other. 
'blunder' and 'imperfect'.

I imagine many of the supercomputer or super-engine 'blunders and imperfections' are actually inferior positional moves - rather than tactical blunders.

Avatar of ifemo

60 hours or 72 hours of chess @ifem`s is right