Can Engines properly assess openings?

Arisktotle

The question of whether chess engines can assess openings is equivalent to the question of whether chess engines can solve chess altogether! After all, the assessment of openings depends on the assessment of the ensuing middlegames, and that in turn on the assessment of the endgames. It is unthinkable for an engine to pick the perfect opening when it can't play the perfect endgame.

Engines haven't solved chess yet, or they would be producing only wins or only draws amongst themselves, as is inevitable in a deterministic zero-sum game. Therefore we can safely assume they don't play perfect openings either - unless they do so by chance, awaiting confirmation in decades to come.
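
In game-theoretic terms this is Zermelo-style backward induction: under perfect play every position $s$ has a single fixed value. A compact statement of that claim (standard notation, not from this thread):

$$V(s)=\begin{cases}\mathrm{result}(s)\in\{+1,0,-1\} & \text{if } s \text{ is terminal}\\ \max_{m\in\mathrm{moves}(s)} V(s\cdot m) & \text{if White is to move}\\ \min_{m\in\mathrm{moves}(s)} V(s\cdot m) & \text{if Black is to move}\end{cases}$$

Two engines that always achieved $V$ would therefore produce the same result, $V(\mathrm{start})$, in every game between them.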

tmkroll
drmrboss wrote:
tmkroll wrote:

Drmrboss, we were using the analysis on this site, people's versions of Stockfish on their home computers, and the Let's Check analysis on the ChessBase cloud, which comes from a lot of people running engines on different machines at different depths. The phrase "to a ridiculous search depth", or some such, came up regarding some of that analysis. I might be able to find one of the threads again. It was in the Nxf7 variation where Black needs to counter-intuitively move the already developed knight backward to block a check instead of developing another piece with tempo. I never saw the engine suggest that move unless you suggested it first; then it gets in the hash table or something, and the next time you run the search it looks at that move. When you do make it look at that move it pretty quickly finds the draw. That was where people were talking about very high search depths all coming up 0. No one said anything about nodes. The member who told us that 0-0-0 was actually better said the line was known as an "engine trap." IMO there was no point regardless, though, as I said, because I don't think any engine will take on f7 with the knight in the first place if you don't make it start from that position.

What do you mean?

Nxf7 was not suggested but is still a draw. Do you mean that other moves SF suggests lead to something worse than Nxf7? Show me the FEN or position and let me analyse it with my 4-core i5 CPU for 3 minutes with 6-man TB access (approx. 1 billion positions).

I won't be surprised if SF misses 1% of openings (I will forward that 1% to the Stockfish development / fishcooking discussion).

Are you saying the other moves that SF suggests lead to a loss or a worse position?

 

I meant the entire discussion took place after Nxf7 (which does draw but everyone knows is not good). The OP was claiming White was winning because SF said so. I showed that Black, in fact, could sac a piece and force a draw by perpetual check. SF agrees with that, but only if you show it one of the moves leading up to the draw. For some reason that thread seems to have been deleted (I'm not sure the OP still has an account), but the same line came up in another thread, and someone else came along there and suggested a different move later on in that line for Black which leads to an advantage. So yeah, the moves SF suggests do lead to a loss or a worse position in that line. (But you have to play the inferior Nxf7 to get into that line, and I don't think SF is going to suggest it, so it's not the best example.) There are still positions like this, material imbalances in crazy, romantic openings that engines tend to misevaluate until they calculate past the important horizon. I will see if I can find that thread again, but it seems you already have endgames here to test. Not what you were asking about, but another place where we know engines can be weak.

 

Numquam
tmkroll wrote:

There's a line in the Traxler where a lot of people on this forum could play better than Stockfish. We were debating it here a few years back. Stockfish says White is winning until it sees Black has a draw by repetition, then its evaluation goes to 0. Stockfish will take that draw, but people who read that forum would castle queenside as Black. Eventually Stockfish sees Black is better, but it takes it a very long time. There's a line in the KG where, at least five or six years back, Fritz was similar; idk about now. Of course engines will never play into either of these lines if you don't make them, because their opening books have been programmed by human players who have studied them and know they are bad.

You mean this:

 

tmkroll

Ok, I found the line. It's 1. e4 e5 2. Nf3 Nc6 3. Bc4 Nf6 4. Ng5 Bc5 5. Nxf7 Bxf2+ 6. Kf1 Qe7 7. Nxh8 d5 8. exd5 Nd4 9. c3 Bg4 10. Qa4+ Nd7 (this is the move that IME still seems difficult for engines to find) 11. Kxf2 Qh4+, and after something like Kf1 Black can force a draw by perpetual, but actually ...0-0-0 looks like it could be winning for Black. (And you were actually in the thread where I found this line mentioned, but you didn't say anything more after it was brought up; perhaps you unfollowed and didn't see it there.) Over the course of this line SF said White was winning for quite a few moves, then its evaluation moved to 0, then finally over to a considerable advantage for the second player, so it seems to be a position where an engine can misevaluate the opening. Now again, I don't know whether SF will actually choose to play Nxf7 if you turn off the opening book; somehow I doubt it will get into the Fried Liver at all. But at least in the past, if you make it play from that position (which should probably be a draw), SF has suggested White play into this bad, possibly losing line instead of drawing another way. You can check it on your strong machine and see if this still holds up.
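
For anyone who wants to re-test the line on current hardware, here is a minimal sketch using the python-chess library and a local Stockfish binary (the library, the binary path and the 60-second limit are assumptions to adjust, not something from the original posts):

import chess
import chess.engine

SAN_LINE = ("e4 e5 Nf3 Nc6 Bc4 Nf6 Ng5 Bc5 Nxf7 Bxf2+ Kf1 Qe7 "
            "Nxh8 d5 exd5 Nd4 c3 Bg4 Qa4+ Nd7").split()

board = chess.Board()
for san in SAN_LINE:
    board.push_san(san)  # reaches the position after 10...Nd7

with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:  # path is a placeholder
    info = engine.analyse(board, chess.engine.Limit(time=60))
    print(board.fen())
    print(info["score"].white(), board.variation_san(info.get("pv", [])))

Letting the search run longer is exactly the test described above: watch whether the reported score swings from a White advantage through 0 to an edge for Black.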

tmkroll

Lol, Numquam, yes, that is what I meant. Though we ended up duplicating the work, both writing it out at the same time. Thank you.

Numquam
tmkroll wrote:

Lol, Numquam, yes, that is what I meant. Though we ended up duplicating the work, both writing it out at the same time. Thank you.

No problem, figuring out what you meant is like a puzzle. The computer can find Nd7 instantly btw, but it plays c3 at low depth. It'll figure out that c3 is bad if you let it run long enough.

tmkroll

Well thanks again, Numquam. It was some years ago when I was last in the discussion about that line, and those things were not the case. Nd7 was a hard move to find, and the engine thought c3 was good before it thought it was even and then bad. Back then our computers were slower, or perhaps someone will say we were using the engines wrong (although I think some of the people there did, in fact, know how to use chess engines). The other line I was thinking of is in the Salvio Gambit of the King's Gambit, where Black plays Nc6 and lets White take the rook in the corner. If you're curious I'm sure I could relate that one back from memory, but my guess is SF can properly assess that one now too, and it's the same thing: SF is not going to play the Salvio Gambit to begin with. It's just that the question was whether engines can properly assess openings. IME they have not always been able to do that... and perhaps all my examples will be dated, but that doesn't mean some of the things we're talking about now won't be improved upon in a few more years. As long as there is room for that to continue to happen, it seems the answer to the original question is going to have to be no.

m_connors

It's my understanding that engines are pre-programmed with all types of openings. Just go to Learn - Openings right here on chess.com: many opening lines are shown and evaluated by popularity and winning percentages. No opening is "perfect", and at some point players are going to deviate to play their own game. So, within reason, all modern engines should be able to assess any opening. You just need to find the ones you're most comfortable with and can play well.

HolyCrusader5
drmrboss wrote:
HolyCrusader5 wrote:

Engines may play openings better than humans but they are unable to determine whether an opening is sound.

Why not? You can see the evaluation values before and after the moves. If the evaluation gets worse, it is unsound; if it gets better, it is sound.

Or do you want an engine to explain the moves verbally? In fact that is possible: if chess were a big business that everyone was interested in, programmers could produce verbal output / explanations of engine moves.
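
The before-and-after comparison described in the quote can be scripted; a minimal sketch, again assuming the python-chess library and a local Stockfish binary (the path, the example move 4.Ng5 and the 15-second limit are placeholders):

import chess
import chess.engine

def searched_eval(engine, board, seconds=15):
    # Searched score in centipawns from White's point of view.
    info = engine.analyse(board, chess.engine.Limit(time=seconds))
    return info["score"].white().score(mate_score=100000)

board = chess.Board()
for san in ["e4", "e5", "Nf3", "Nc6", "Bc4", "Nf6"]:
    board.push_san(san)

with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
    before = searched_eval(engine, board)  # evaluation before the move under test
    board.push_san("Ng5")                  # the move whose soundness is being questioned
    after = searched_eval(engine, board)
    print(f"before: {before} cp, after: {after} cp, swing: {after - before} cp")

Whether a score swing really proves an opening "unsound" is, of course, exactly what the rest of the thread disputes.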

 

They have their own evaluation that guides them in deciding which moves are better, and they always choose the better move after the allocated search time.

I believe the reason is that chess engines love certain advantages, like space, which leads to them hating certain openings such as the Alekhine, the Pirc, and the King's Indian.

HolyCrusader5

I think drmrboss believes that engines are superior to humans in openings because they are smarter. But hundreds of theoreticians using engine analysis to assist them can play an opening better. Chess engines like Leela can contribute a lot to theory, but they are not the basis of all theory. Engines that believe hypermodern openings are unsound believe that entirely because they like a space advantage. A "perfect engine" could possibly determine the soundness of openings, with humans fact-checking. We should keep a close eye on how AlphaZero evaluates openings. I doubt that this means that openings AlphaZero prefers to avoid are unsound.

drmrboss
HolyCrusader5 wrote:

I think drmrboss believes that engines are superior to humans in openings because they are smarter. But hundreds of theoreticians using engine analysis to assist them can play an opening better. Chess engines like Leela can contribute a lot to theory, but they are not the basis of all theory. Engines that believe hypermodern openings are unsound believe that entirely because they like a space advantage. A "perfect engine" could possibly determine the soundness of openings, with humans fact-checking. We should keep a close eye on how AlphaZero evaluates openings. I doubt that this means that openings AlphaZero prefers to avoid are unsound.

Yes, in general there is no way to compete with someone who can search positions a million times faster.

You can still cherry-pick engine mistakes in the opening, but engines are way better than humans. There is also the Stockfish evaluation table, which shows why Stockfish chooses those particular moves, etc. (if someone is really interested in more details).

https://hxim.github.io/Stockfish-Evaluation-Guide/

HolyCrusader5

Engines can play openings accurately. However, humans should not say that the Alekhine or the Pirc is unsound just because an engine says so, because that would be inaccurate. Overall, engines are great at opening theory.

nescitus

As the author of a chess engine (called Rodent, slightly above 3000 Elo on the CCRL scale) I must tell you that the evaluations of strong engines are not objective - because they are not meant to be. The goal of returning an objective evaluation is different from the goal of returning an evaluation that improves the engine's play. One simple example: Stockfish has a ridiculously high bonus for threatening to check the enemy king. Please open Stockfish in console mode (i.e. open the engine exe file directly, so that a black window appears). In that window, type the following line:

position startpos moves d2d4 g8f6 c2c4 g7g6 b1c3 d7d5 c4d5 f6d5 e2e4 d5c3 b2c3 c7c5

then press Enter, then type "eval". With Stockfish 10 you will get a score of over 1.5 pawns for White, meaning that the Grünfeld Defence is close to busted, which it obviously isn't. What the hell is going on?
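
If you would rather script this than type into the console window, a minimal sketch of the same experiment (assuming a local Stockfish binary on the PATH; note that "eval" is a Stockfish-specific command, not part of the UCI standard):

import subprocess

moves = "d2d4 g8f6 c2c4 g7g6 b1c3 d7d5 c4d5 f6d5 e2e4 d5c3 b2c3 c7c5"
commands = f"position startpos moves {moves}\neval\nquit\n"

# Feed the same commands you would type into the console window.
result = subprocess.run(["stockfish"], input=commands, capture_output=True, text=True)
print(result.stdout)  # Stockfish prints its static evaluation breakdown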

I'll tell you. Stockfish gives an insane bonus for the possibility of giving two checks to the black king (Bb5+ and Qa4+). Why does it need such an absurd bonus? The answer is in the search. If Stockfish searches, say, 30 plies, then the chances of such an eval being backpropagated right to the root of the search and returned as the final positional evaluation are abysmally small, something like 1 in 100,000,000. What if it does get backpropagated to the root? It means that the engine cannot avoid these threats, or has to make big concessions in order to defuse them. Had these threats been avoidable (and in the Grünfeld line they will disappear once Black castles short), the search would avoid them and you would not see this high value as the final score. The final score returned by modern engines is the result of a struggle between two opposing forces: the evaluation function trying to generate absurdly high scores, and the search trying to return the lowest common denominator, the score that both Black and White claim to be the best. Evaluation creates high values; search gravitates towards low scores acceptable to both sides, unless the high scores are really unavoidable.
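
A toy illustration of that tug-of-war (a generic minimax sketch, not Stockfish code): an inflated leaf bonus becomes the root score only when the defending side has no branch that sidesteps it.

def minimax(tree, leaf_eval, node, maximizing):
    kids = tree.get(node, [])
    if not kids:
        return leaf_eval[node]  # (possibly skewed) static evaluation at the leaf
    values = [minimax(tree, leaf_eval, k, not maximizing) for k in kids]
    return max(values) if maximizing else min(values)

tree = {"root": ["threat", "quiet"],        # White to move at the root
        "threat": ["defended", "ignored"],  # Black to move: defend, or walk into the bonus
        "quiet": ["normal"]}
leaf_eval = {"defended": 10, "ignored": 150, "normal": 30}

print(minimax(tree, leaf_eval, "root", True))  # 30: the +150 threat bonus is avoidable
leaf_eval["defended"] = 140                    # every defence now concedes almost as much
print(minimax(tree, leaf_eval, "root", True))  # 140: the bonus surfaces only when unavoidable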

To summarize: engine evaluations aren't objective. They are numbers created by a function whose aim is to guide the search, not to inform the user about the true merits of the position. What's more, a Stockfish with an evaluation that tried to be objective would be weaker by 200 to 400 Elo points.

Numquam
nescitus wrote:

As the author of a chess engine (called Rodent, slightly above 3000 Elo on the CCRL scale) I must tell you that the evaluations of strong engines are not objective - because they are not meant to be. The goal of returning an objective evaluation is different from the goal of returning an evaluation that improves the engine's play. One simple example: Stockfish has a ridiculously high bonus for threatening to check the enemy king. Please open Stockfish in console mode (i.e. open the engine exe file directly, so that a black window appears). In that window, type the following line:

position startpos moves d2d4 g8f6 c2c4 g7g6 b1c3 d7d5 c4d5 f6d5 e2e4 d5c3 b2c3 c7c5

then press Enter, then type "eval". With Stockfish 10 you will get a score of over 1.5 pawns for White, meaning that the Grünfeld Defence is close to busted, which it obviously isn't. What the hell is going on?

I'll tell you. Stockfish gives an insane bonus for the possibility of giving two checks to the black king (Bb5+ and Qa4+). Why does it need such an absurd bonus? The answer is in the search. If Stockfish searches, say, 30 plies, then the chances of such an eval being backpropagated right to the root of the search and returned as the final positional evaluation are abysmally small, something like 1 in 100,000,000. What if it does get backpropagated to the root? It means that the engine cannot avoid these threats, or has to make big concessions in order to defuse them. Had these threats been avoidable (and in the Grünfeld line they will disappear once Black castles short), the search would avoid them and you would not see this high value as the final score. The final score returned by modern engines is the result of a struggle between two opposing forces: the evaluation function trying to generate absurdly high scores, and the search trying to return the lowest common denominator, the score that both Black and White claim to be the best. Evaluation creates high values; search gravitates towards low scores acceptable to both sides, unless the high scores are really unavoidable.

To summarize: engine evaluations aren't objective. They are numbers created by a function whose aim is to guide the search, not to inform the user about the true merits of the position. What's more, a Stockfish with an evaluation that tried to be objective would be weaker by 200 to 400 Elo points.

The engine compensates for its poor evaluation with the ability to calculate deeply. It seems you are suggesting letting the engine evaluate a position without calculation. That is simply stupid. There is no point in doing what you suggest, because engines aren't designed to be used that way. I believe it uses the evaluation function at the end of a line, and to find new good moves while constructing a line, not at the starting position. You are even giving an explanation which supports what I am saying. The evaluation function is designed to be used at the end of a line.

dpnorman
nescitus wrote:

As the author of a chess engine (called Rodent, slightly above 3000 Elo on the CCRL scale) I must tell you that the evaluations of strong engines are not objective - because they are not meant to be. The goal of returning an objective evaluation is different from the goal of returning an evaluation that improves the engine's play. One simple example: Stockfish has a ridiculously high bonus for threatening to check the enemy king. Please open Stockfish in console mode (i.e. open the engine exe file directly, so that a black window appears). In that window, type the following line:

position startpos moves d2d4 g8f6 c2c4 g7g6 b1c3 d7d5 c4d5 f6d5 e2e4 d5c3 b2c3 c7c5

then press Enter, then type "eval". With Stockfish 10 you will get a score of over 1.5 pawns for White, meaning that the Grünfeld Defence is close to busted, which it obviously isn't. What the hell is going on?

I'll tell you. Stockfish gives an insane bonus for the possibility of giving two checks to the black king (Bb5+ and Qa4+). Why does it need such an absurd bonus? The answer is in the search. If Stockfish searches, say, 30 plies, then the chances of such an eval being backpropagated right to the root of the search and returned as the final positional evaluation are abysmally small, something like 1 in 100,000,000. What if it does get backpropagated to the root? It means that the engine cannot avoid these threats, or has to make big concessions in order to defuse them. Had these threats been avoidable (and in the Grünfeld line they will disappear once Black castles short), the search would avoid them and you would not see this high value as the final score. The final score returned by modern engines is the result of a struggle between two opposing forces: the evaluation function trying to generate absurdly high scores, and the search trying to return the lowest common denominator, the score that both Black and White claim to be the best. Evaluation creates high values; search gravitates towards low scores acceptable to both sides, unless the high scores are really unavoidable.

To summarize: engine evaluations aren't objective. They are numbers created by a function whose aim is to guide the search, not to inform the user about the true merits of the position. What's more, a Stockfish with an evaluation that tried to be objective would be weaker by 200 to 400 Elo points.

My S10 gives 0.3 in that position, which is a very reasonable evaluation, and it doesn't think Bb5+ is better than Bc4.

nighteyes1234
dpnorman wrote:
 

My S10 gives 0.3 in that position, which is a very reasonable evaluation, and it doesn't think Bb5+ is better than Bc4.

 

His example is the search at depth 1... Anyway, the point is that any engine is biased. Yes, that includes Leela.

That won't stop many people from saying, over and over again, that the best first 'independent' moves are d4, c4, Nf3, and g3... and having no clue that it's a transposition to the same position that has some small bonuses.
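
The transposition itself is easy to demonstrate; a minimal sketch with the python-chess library (an assumption; the two move orders below are just one example of many):

import chess

a, b = chess.Board(), chess.Board()
for san in ["d4", "Nf6", "c4", "e6", "Nf3", "d5", "g3"]:
    a.push_san(san)  # one move order
for san in ["Nf3", "Nf6", "c4", "e6", "g3", "d5", "d4"]:
    b.push_san(san)  # a different move order
print(a.fen() == b.fen())  # True: identical position, castling rights and side to move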

This has also been noted in the context of the role of technology in lives, from a generational standpoint.

nescitus
The engine compensates for its poor evaluation with the ability to calculate deeply.

No, it is the other way round. The engine positively requires a skewed, far-from-objective evaluation to guide its search, which, because of the way minimax works, tends to drive the score close to zero.

dpnorman
nighteyes1234 wrote:
dpnorman wrote:
 

My S10 gives 0.3 in that position, which is a very reasonable evaluation, and it doesn't think Bb5+ is better than Bc4.

 

His example is the search at depth 1... Anyway, the point is that any engine is biased. Yes, that includes Leela.

That won't stop many people from saying, over and over again, that the best first 'independent' moves are d4, c4, Nf3, and g3... and having no clue that it's a transposition to the same position that has some small bonuses.

This has also been noted in the context of the role of technology in lives, from a generational standpoint.

Engines and people alike have biases in chess evaluation. Only the Flying Spaghetti Monster can tell us the answer

Prometheus_Fuschs
dpnorman wrote:
nighteyes1234 wrote:
dpnorman wrote:
 

My S10 gives 0.3 in that position, which is a very reasonable evaluation, and it doesn't think Bb5+ is better than Bc4.

 

His example is the search at depth 1... Anyway, the point is that any engine is biased. Yes, that includes Leela.

That won't stop many people from saying, over and over again, that the best first 'independent' moves are d4, c4, Nf3, and g3... and having no clue that it's a transposition to the same position that has some small bonuses.

This has also been noted in the context of the role of technology in lives, from a generational standpoint.

Engines and people alike have biases in chess evaluation. Only the Flying Spaghetti Monster can tell us the answer

Tablebases are unbiased; other than those, we can only evaluate objectively positions with insufficient material to win and mate-in-X positions*.

 

*That is, if you can actually prove it is mate in X; engines are far better at this, though they might still fail.
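
For what it's worth, Syzygy tables can be probed directly; a minimal sketch, assuming the python-chess library and a local directory of downloaded Syzygy files (the path is a placeholder):

import chess
import chess.syzygy

board = chess.Board("4k3/8/8/8/8/8/8/4KQ2 w - - 0 1")  # a bare KQ vs K position
with chess.syzygy.open_tablebase("./syzygy") as tb:     # directory of .rtbw/.rtbz files
    print(tb.probe_wdl(board))  # +2 win, 0 draw, -2 loss, from the side to move's view
    print(tb.probe_dtz(board))  # distance to a zeroing move under the 50-move rule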

drmrboss
nescitus wrote:

As the author of a chess engine (called Rodent, slightly above 3000 Elo on the CCRL scale) I must tell you that the evaluations of strong engines are not objective - because they are not meant to be. The goal of returning an objective evaluation is different from the goal of returning an evaluation that improves the engine's play. One simple example: Stockfish has a ridiculously high bonus for threatening to check the enemy king. Please open Stockfish in console mode (i.e. open the engine exe file directly, so that a black window appears). In that window, type the following line:

position startpos moves d2d4 g8f6 c2c4 g7g6 b1c3 d7d5 c4d5 f6d5 e2e4 d5c3 b2c3 c7c5

then press Enter, then type "eval". With Stockfish 10 you will get a score of over 1.5 pawns for White, meaning that the Grünfeld Defence is close to busted, which it obviously isn't. What the hell is going on?

I'll tell you. Stockfish gives an insane bonus for the possibility of giving two checks to the black king (Bb5+ and Qa4+). Why does it need such an absurd bonus? The answer is in the search. If Stockfish searches, say, 30 plies, then the chances of such an eval being backpropagated right to the root of the search and returned as the final positional evaluation are abysmally small, something like 1 in 100,000,000. What if it does get backpropagated to the root? It means that the engine cannot avoid these threats, or has to make big concessions in order to defuse them. Had these threats been avoidable (and in the Grünfeld line they will disappear once Black castles short), the search would avoid them and you would not see this high value as the final score. The final score returned by modern engines is the result of a struggle between two opposing forces: the evaluation function trying to generate absurdly high scores, and the search trying to return the lowest common denominator, the score that both Black and White claim to be the best. Evaluation creates high values; search gravitates towards low scores acceptable to both sides, unless the high scores are really unavoidable.

To summarize: engine evaluations aren't objective. They are numbers created by a function whose aim is to guide the search, not to inform the user about the true merits of the position. What's more, a Stockfish with an evaluation that tried to be objective would be weaker by 200 to 400 Elo points.

As a member of talkchess for more than 5 years, I know where most of those evaluations come from. I don't get +1.5; instead I get +0.36 in the Grünfeld. Here is the analysis on my phone after 200 million nodes (contempt 0; the default is 24), with 5-man Syzygy tablebase access.