Stockfish Resigns In "Winning" Position!

Firebrandx

I think the problem with the OP's post is a misunderstanding of the computer's evaluation. When I plug the position into Stockfish 10 on my Ryzen 7 2700X, the score has nothing to do with the computer 'thinking it is winning'. It is a positional/material score count (which, by the way, comes out at +1.59 with zero contempt and stays there). In this case, it is crediting White with the space advantage. You can tell the computer doesn't think anything more than that, because the evaluation sticks to the same score no matter how deep you let it search (within human limits).
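
To make the 'score count' idea concrete, here is a toy sketch of a hand-crafted evaluation: a weighted sum of static features. The feature names and weights are invented for illustration and are not Stockfish 10's actual terms or values. The point is that in a locked position the features barely change from leaf to leaf, so the search keeps returning roughly the same number however deep it goes.

```python
from dataclasses import dataclass

# Hypothetical weights in centipawns per unit of each feature.
# Illustrative assumptions only, NOT Stockfish's real parameters.
SPACE_WEIGHT = 12     # per unit of space advantage
MOBILITY_WEIGHT = 4   # per extra legal piece move


@dataclass
class Features:
    """Static features of a position, each as White minus Black."""
    material_cp: int  # material balance, already in centipawns
    space: int
    mobility: int


def static_eval(f: Features) -> int:
    """Return a centipawn score from White's point of view."""
    return f.material_cp + SPACE_WEIGHT * f.space + MOBILITY_WEIGHT * f.mobility


# In a locked position every leaf the search reaches looks much the same,
# so the backed-up score hardly moves as depth increases.
locked_leaves = [Features(0, 10, 10), Features(0, 10, 10), Features(0, 10, 9)]
print([static_eval(f) for f in locked_leaves])  # [160, 160, 156]
```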

Now consider what happens if we refuse to play a6. At ply 40 with no contempt, SF10 evaluates the position at +0.36 and dropping, because Black can open the queenside and activate his heavy pieces to equalize. So can you really blame the computer here? It KNOWS that allowing Black to open the queenside leads to an easily seen drop in eval to a draw. On the flip side, the positional score never drops after a6 until a threefold repetition becomes inevitable.

Conclusion: Stockfish didn't miss anything. The game is a draw if a6 isn't played. The game is a MUCH longer draw if it is played. Therefore, it plays a6.
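
A score-maximising search reasons roughly like this: it compares the scores its candidate lines reach and keeps the biggest. It has no concept of 'a longer draw' versus 'a shorter draw'. In the sketch below, the +1.59 and +0.36 figures come from the post above; the other numbers are made up to stand for 'stays flat' and 'keeps dropping'.

```python
# Hypothetical backed-up scores (centipawns, White's view) for two candidate
# plans at successively deeper searches. Only 159 and 36 come from the post;
# the remaining values are invented for illustration.
lines = {
    "a6": [159, 159, 159],   # queenside stays locked: score never drops
    "no a6": [36, 20, 0],    # Black opens the queenside: score drifts to 0
}


def pick_move(lines: dict[str, list[int]], depth_index: int = -1) -> str:
    """Return the candidate whose score is highest at the given depth,
    as a score-maximising search would."""
    return max(lines, key=lambda move: lines[move][depth_index])


print(pick_move(lines))  # a6
```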

Phoenyx75

Interesting. I agree that Stockfish should play a6 if it gives the opponent more chances to mess up, but I also think that if it knows the position is a draw with perfect play, it should show that in the evaluation. The bottom line in chess isn't to have space; it's to checkmate the king, after all.

Vicariously-I

If the computer saw that the position was a draw with best play, it would evaluate it as equal. Since it evaluates the position as +1.59 (according to your engine), it must assess the position as a win for White. I've heard masters say that even a +1.4 advantage is supposed to be winning.
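
For what it's worth, one common way to read a centipawn score is to squash it through a logistic curve into an expected score between 0 and 1. The sketch below does that; the scale constant k is an assumption chosen for illustration, not something Stockfish 10 reports, and real calibrations depend on engine version, rating and time control. Under this kind of mapping, a number like +1.59 reads as a degree of advantage rather than a verdict.

```python
import math


def expected_score(cp: float, k: float = 0.00368208) -> float:
    """Map a centipawn evaluation (White's view) to an expected score for
    White in [0, 1] with a logistic curve. The constant k is an assumed
    scale for illustration, not a Stockfish parameter."""
    return 1.0 / (1.0 + math.exp(-k * cp))


for cp in (0, 140, 159):
    print(cp, round(expected_score(cp), 2))
```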

Phoenyx75

Yeah, I agree. My guess is this is just a bug (among others, it seems) that the people behind Stockfish haven't fixed yet.

Firebrandx

Negative. That's not how computer eval scores work, ESPECIALLY when the eval gets stuck on the same score no matter how deep the search goes. Only novices think eval score = winning. You need experience with both human chess and engine chess to tell the difference in a given position. Like I said before, the computer does not 'think it is winning' unless it spots a forced mating sequence. Everything short of that has to be interpreted correctly by the human.

And before you start arguing with me, consider that experienced ICCF players and centaur chess players know exactly what I'm talking about, AND they know I'm spot-on correct.
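
The distinction between an evaluation and a found mate shows up directly in the UCI output that engines such as Stockfish send to a GUI: an `info` line reports either `score cp <x>` (centipawns from the engine's point of view) or `score mate <y>` (mate in y moves, negative if the engine is getting mated). Here's a small parser sketch; the helper names are made up, and it ignores the `lowerbound`/`upperbound` flags.

```python
from typing import Optional, Tuple


def parse_score(info_line: str) -> Optional[Tuple[str, int]]:
    """Return ('cp', x) or ('mate', y) from a UCI 'info' line, or None."""
    tokens = info_line.split()
    if "score" not in tokens:
        return None
    i = tokens.index("score")
    if i + 2 >= len(tokens) or tokens[i + 1] not in ("cp", "mate"):
        return None
    return tokens[i + 1], int(tokens[i + 2])


def describe(info_line: str) -> str:
    """Spell out what the score does (and does not) claim."""
    score = parse_score(info_line)
    if score is None:
        return "no score"
    kind, value = score
    if kind == "mate":
        who = "side to move" if value > 0 else "opponent"
        return f"forced mate in {abs(value)} for the {who}"
    # A cp score is only an estimate; it is not a claim of a forced win.
    return f"estimate of {value / 100:+.2f} pawns for the side to move"


print(describe("info depth 40 score cp 159 nodes 1000"))
print(describe("info depth 30 score mate -3 nodes 5000"))
```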

Phoenyx75

Ok, I can believe that Stockfish is functioning the way it's supposed to. But between you and me, don't you think it'd make more sense for it to give an evaluation of 0? After all, with perfect play, this is a draw. Are you saying Stockfish doesn't realize that? Or are you saying that it realizes it but still thinks it's best to give itself a nice big positive evaluation?

ArtNJ

An engine like Stockfish, as opposed to Leela or AlphaZero, can't give this a zero, because the things it rewards when evaluating a position persist throughout its search depth: mobility, control of the center, piece activity, space and other things that *normally* provide lasting advantages (not saying all of them are at issue here). You seem to be arguing that because the evaluation doesn't change as Stockfish goes deeper, it should recognize this as a draw. And indeed, that is normally a pretty good marker of a drawn position. But is a position always drawn if Stockfish's eval doesn't change from depth X to depth Y? How else would it spot a draw, absent the traditional methods like a threefold repetition within its horizon, tablebases, etc.? It's not Leela or AlphaZero, after all.
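
The 'traditional methods' point can be sketched like this: the search only returns a draw score when it actually recognises a draw (here, a threefold repetition), and otherwise falls back on the static evaluation. The position keys are hypothetical strings standing in for the hashes a real engine uses, and contempt is simplified to a score from the engine's own point of view; none of this is Stockfish's actual code.

```python
from collections import Counter


def is_threefold(history: list[str]) -> bool:
    """True if the current position (the last key) has occurred 3+ times.
    Keys are hypothetical position identifiers, e.g. FEN minus move counters."""
    return bool(history) and Counter(history)[history[-1]] >= 3


def leaf_score(history: list[str], static_eval_cp: int, contempt_cp: int = 0) -> int:
    """Score a leaf from the engine's point of view (a simplification).

    A recognised draw gets the draw score: 0 at zero contempt, slightly
    negative with positive contempt. Anything else gets the static eval,
    which is why a locked but drawn position can still read +1.59."""
    if is_threefold(history):
        return -contempt_cp
    return static_eval_cp


print(leaf_score(["A", "B", "A", "B", "A"], 159))  # 0: repetition seen
print(leaf_score(["A", "B", "C", "D"], 159))       # 159: no draw recognised
```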

Vicariously-I

I admit that I don't really know how engines evaluate positions. I only know what I've heard from other people, which may not be accurate. I've never really looked into it on my own. You guys seem to know more about it than I do so I'll take your word for it. 

Phoenyx75

Alright, like Vicariously, I think I'll take your word for it as well. Initially I thought that of course Stockfish should not have played a6, but after Firebrand said "Stockfish didn't miss anything. The game is a draw if a6 isn't played. The game is a MUCH longer draw if it is played. Therefore, it plays a6," that pretty much knocked that one out. So I resign this argument :-p.

ArtNJ

I'm no expert. Like you guys, I've noticed that an unchanging score as the engine goes deeper usually means a draw. I just don't know if it always means a draw. I assume this is not actually a reliable method, or Stockfish would use it in choosing moves and would choose the move that is also favorable but not a draw, or not as clearly a draw. Someone who plays in "centaur" (machine-assisted) tournaments could tell us better; I'm just saying that if there were a reliable method for detecting a drawn position, Stockfish would certainly use it, since it's in constant competition for the top spot. And being able to detect a draw through another method would be very useful. Although, I suppose, Stockfish could select the drawn line if the developers' thought process was something like, "we have the #1 engine; Stockfish is more likely to win if we select the top eval, even if it might be a theoretical draw."
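
Here is a minimal sketch of the 'unchanging score' heuristic discussed above. The window and band thresholds are made up, and the second example shows the catch: a steady +3.00 gets flagged exactly like a steady +1.59, so flatness alone can't separate a fortress from a lasting advantage.

```python
def looks_stuck(scores_by_depth: list[int], window: int = 10, band_cp: int = 5) -> bool:
    """Heuristic only: True if the last `window` iterative-deepening scores
    lie within `band_cp` centipawns of each other. Window and band are
    arbitrary, hypothetical thresholds."""
    if len(scores_by_depth) < window:
        return False
    recent = scores_by_depth[-window:]
    return max(recent) - min(recent) <= band_cp


print(looks_stuck([159] * 12))              # True: flat, maybe a fortress
print(looks_stuck([300] * 12))              # True: flat, but could be a real win
print(looks_stuck(list(range(60, 0, -5))))  # False: still drifting
```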

Phoenyx75

I think we'd clearly want to talk to the Stockfish developers about this; they would probably have the best idea of what's going on here. I suspect they wouldn't overrate their positions just because Stockfish wins most of the competitions it's in these days, barring the ever-elusive AlphaZero (ever elusive because we can't just have it play Stockfish whenever we want; it only runs when Google wants it to, and that's not often :-p).