Can Engines properly assess openings?

Numquam
MARattigan wrote:

@Numquam

That they can be used for openings is of course true. The question is whether they can assess openings properly. A good move is only good until a refutation is found. SF's moves are clearly effective against other players. Whether they're accurate is a different question.

It's a strange coincidence that on endgames with more than 3 men that I know how to play well, Stockfish appears to be slightly crappy to crappy, whereas on endgames I don't know how to play well it appears to be sh*t hot. I suspect it's still as crappy. It's just that I'm even crappier.

I think the same is true of the human race in general for endgames with more men, including openings. It doesn't seem to be on the cards that SF would improve its accuracy in these; it's just that humans are worse.

As for KNNKP being rare, that's a function of how many people are acquainted with it. Troitsky could find only six recorded games when he analysed the ending, but a few years ago at a USCF tournament it was being played at two adjacent tables. Who could say what endgames would occur in perfect games?

Are the wins in KNNKP so unusual, or just unusual compared with the endgames people generally feel comfortable with? I would have thought the win in the second diagram in post #82 wasn't much out of the ordinary. In any case SF already starts going wrong with KBNK (in terms of making the most accurate moves).

Objectively a position is either a win for one side or a draw. A good move never makes the evaluation worse or it wouldn't be a good move. Stockfish very rarely makes a move that makes the objective evaluation worse. If multiple moves have the same evaluation, then the best move is subjective.

Winning endgames are often won because one side can promote a pawn. If that is not the case, then usually there are rooks or queens on the board. In such endgames it is more obvious whether one side is making progress towards a win, so engines can usually handle these endgames well. Endgames where you checkmate with bishops and knights are rare. I think Stockfish can checkmate with knight and bishop fine, but in the KNNKP endgame it is much harder for an engine to see whether it is making progress towards mate. The creators of Stockfish also don't have any reason to make it play such endgames well, because you can just use a tablebase. While you are correct that engines don't play openings perfectly, the reasoning is bad. The KNNKP endgame tells us nothing about the opening.

Whether Stockfish plays crappy or not is subjective. I think it is weird to say that it plays crappy when it plays better than the best human in the world. It seems you are expecting perfect play. A 32-man tablebase isn't coming any time soon. Also, why would GMs use engines for opening preparation if they play crappy?

drmrboss
HolyCrusader5 wrote:

What are you guys arguing about? I thought it was already concluded that engines cannot determine the soundness of openings because they lack objectivity like all human beings. Why are you bringing up Nalimov Tablebases?

It is your assumption, without any proof!

There is no need to prove engines' superiority. Engines have been far superior in the majority of situations for 20 years.

That is why everyone is scared of engine assistance in any phase of the game.

Do you know why the word "cheat" was invented? Do you know how people cheat?

HolyCrusader5

What is your proof?

HolyCrusader5

If an engine played itself from the starting position of the King's Indian Defence, what do you think the result would be?

Prometheus_Fuschs
drmrboss wrote:

If someone is really serious about opening preparation, there are state-of-the-art computer-prepared opening books.

1. Book by Stockfish (Brainfish book)

2. Book by Leela (Leela book). Leela is a better choice, but the book size is small.

I would be really surprised if you could nitpick flaws in more than 1% of those books. Maybe 0.1%. Who knows.

https://zipproth.de/Brainfish/download/

Cerebellum does rely on Stockfish but it doesn't use its evaluation blindly...

MARattigan
Numquam wrote:

Objectively a position is either a win for one side or a draw. A good move never makes the evaluation worse or it wouldn't be a good move.

That assumes it is a good evaluation, or am I misunderstanding something?

Stockfish very rarely makes a move that makes the objective evaluation worse.

SF will lose half and full points in endgames with more than four men, and not rarely. By any objective evaluation that makes things worse.

If multiple moves have the same evaluation, then the best move is subjective.

Winning endgames are often won because one side can promote a pawn. If that is not the case, then usually there are rooks or queens on the board. In such endgames it is more obvious whether one side is making progress towards a win, so engines can usually handle these endgames well. Endgames where you checkmate with bishops and knights are rare. I think Stockfish can checkmate with knight and bishop fine, but in the KNNKP endgame it is much harder for an engine to see whether it is making progress towards mate.

Here is a White won position with just queen, rooks and pawn. SF blows a half point on move 3 against Lomonosov (and if White is to stand any chance of winning his first two moves were forced anyway). 

 

SF mates OK in KBNK but it's already becoming inaccurate. It routinely gives me moves whichever side it plays (but usually just one).

The creators of Stockfish also don't have any reason to make it play such endgames well, because you can just use a tablebase. While you are correct that engines don't play openings perfectly, the reasoning is bad. The KNNKP endgame tells us nothing about the opening.

Forget KNNKP as above. I think it's general. My reasoning is essentially that you wouldn't expect SF to be able to run with 20+ men if it can't walk with 5 or 6. This may well be bad reasoning on the grounds you mention, and glancing at the evaluation function that @drmrboss posted it's apparent that the evaluation function changes from the opening to the endgame. Having said that, I think the endgame evaluation would usually take effect before the EGTBs come into play.

Whether Stockfish plays crappy or not is subjective.

It's only subjective because we don't have 32-man EGTBs. If we did and it turned out that SF regularly gave away half and full points within a couple of moves, then that would be crappy in comparison to the EGTB (crappy in reality). My guess, from its performance with 5 or 6 men, is that would be the case (with knobs on).

I think it is weird to say that it plays crappy when it plays better than the best human in the world.

What I mean is we're probably all incredibly crappy in real terms, but SF is less crappy than practically anybody or anything else. So its evaluations are certainly useful, but its opening assessments are probably crappy in real terms, and that might be taken to mean, in OP's terms, that it doesn't assess openings properly.

It seems you are expecting perfect play. A 32-man tablebase isn't coming any time soon. Also, why would GMs use engines for opening preparation if they play crappy?

I'm not expecting perfect play at all. I thought that's what I said.

GMs use engines for opening preparation because GMs can play crappier than the engines they use (though less crappy than the rest of us).

 

drmrboss
Optimissed wrote:

Engines are so bad at openings, they have to have a book to play a decent game. There was one called Rebel which was far better and which played more like a human.

I disagree. Please show evidence. Chess engines 20 years ago played badly, e.g. 1. Nf3 2. Nc3 in the opening. Those days are gone.

Your favourite engine, Rebel, would be a noob against Stockfish (as would Hiarcs, which some people think is very human-like). Give those engines 60x time odds against SF, 1 hour vs 1 minute per game, and SF will still beat them.

drmrboss
HolyCrusader5 wrote:

What is your proof?

Google it yourself. Learn the strength of engines yourself.

It's like asking, "Where is the proof that the earth is a sphere? I am standing on flat ground," etc.

HolyCrusader5

I think you have to understand that engines are not meant for openings. They are meant for middlegames and endgames. There are 2 reasons why they have opening books. 

1. To save time during the opening 

2. Make sure they play the opening correctly.

HolyCrusader5

Stating that an engine can play an opening correctly is like saying I can do math correctly when I have a calculator. 

MARattigan
HolyCrusader5 wrote:

Stating that an engine can play an opening correctly is like saying I can do math correctly when I have a calculator. 

And they don't work. If I check by hand they're invariably wrong.

MARattigan

@HolyCrusader5: "I think you have to understand that engines are not meant for openings. They are not meant for middlegames and endgames."

Couldn't have put it better myself.

(I see you changed it, but I preferred it before.)

drmrboss
Manatini wrote:

DrMrBoss, you're not dumb, so you should understand being better than humans in a game is not the same as being better in analysis.

Engines have been better than humans in games since around the time Kramnik lost to Fritz, but competitions like ICCF have been around precisely because only a fool would play all engine moves and expect to win a correspondence game.

If you played OTB (or ICCF) you'd learn why trusting your phone engine (or other engine) is foolish. As it is you're just an engine fan boy.

Oh well, you can point to a few circumstances where engines play inaccurately. In general those high-quality ICCF games make up probably 0.00001% of chess games, and almost 99.999% of ICCF players use engines heavily. I never said engines are 100% accurate. I know Stockfish did not play well in some pawn structures of the French and Catalan (but given enough time, Stockfish plays well again in some of those lines).

These dudes are like 1200-1500 intermediate players. Stockfish analysis at 1 sec per move (approx. 3200 rating) is far more than enough to guide them in their openings. I use approx. 30 secs per move (approx. 3500 rating) at critical points.

HolyCrusader5

I doubt anybody who cheats with engines uses it during the opening.

MARattigan

You don't appear to adapt to evidence or argument. You keep saying SF will get it right given enough time, but I've argued (post #64 here https://www.chess.com/forum/view/endgames/k2n-vs-kp?page=4) that it will never find the win in the 5-man position you yourself gave a couple of posts earlier. I also gave a position (first position in post #81) where it gets worse the more time you give it.

These positions are immeasurably simpler than the positions which occur in the opening.

OP asks whether an engine can determine that an opening is sound from the perspective of a "perfect player", which is a completely different question from whether it can come up with better moves than 1200-1500 intermediate players.

ogpu-jd
MARattigan wrote:
drmrboss wrote:
tmkroll wrote:

There's a line in the Traxler where a lot of people on this forum could play better than Stockfish. We were debating it here a few years back. Stockfish says White is winning until it sees Black has a draw by repetition, then its evaluation goes to 0. Stockfish will take that draw, but people who read that forum would castle queenside as Black. Eventually Stockfish sees Black is better, but it takes it a very long time. There's a line in the KG where, at least five or six years back, Fritz was similar; idk about now. Of course engines will never play into either of these lines if you don't make them, because their opening books have been programmed by human players who have studied them and know they are bad.

Which version of Stockfish did you use, and how many nodes did SF search for that position?

Do you mean analysis by this crappy chess.com server Stockfish? In fact, if SF searches only a few hundred nodes per move, her strength will be like 1200, but a few hundred million nodes per move will make her like 3500.

Show me the position, and I will analyse it for 3 mins and show you how strong Stockfish is. (Let me see whether SF really played badly.)

Try your SF out on this position. It's a well known win for White, but my SF can't play it for toffee.


 As mentioned in a previous thread my SF evaluates the following position, which you yourself posted, as +6.40 no matter how long I leave it running, whereas Black has a very easy draw.

This one even gets +7.34.

 

Throw in an extra piece and it does no better. It evaluates this win for White at 0.16 at depth 30 both before and after it blows it on its second move.

In fact endgames tend to get more complicated the more men there are on the board. The maximum length forced mates with perfect play (no 50 move rule) are something like 28, 43, 127, 262 and 549 for 3, 4, 5, 6 and 7 men respectively.

SF appears to play and evaluate 3 man positions perfectly (if you take +ve, 0 and -ve evaluations to mean wins draws and losses) but it already starts going awry with both evaluation and play with 4 pieces (it can't play KBNK accurately). With 5 or 6 pieces it starts losing half points.

Can you really believe that in spite of that, when it gets up to a 32 piece endgame, it starts to give accurate evaluations?

And what do the evaluations mean anyway - there's nothing in my SF documentation that tells me. Positions after all are either won for one side or drawn; there's nothing in between.  
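A minimal sketch of the sign convention mentioned above (+ve, 0 and -ve read as win, draw and loss), checked against two figures reported in this post: +6.40 for a position said to be a draw, and 0.16 for a position said to be a White win. The function and variable names are made up for illustration, and the "true" results are just the poster's claims as quoted, not tablebase lookups.

```python
# Hypothetical helper: read only the SIGN of an engine score as a verdict.
def verdict_from_eval(score: float) -> str:
    if score > 0:
        return "white wins"
    if score < 0:
        return "black wins"
    return "draw"


# (engine score reported in the thread, result claimed by the poster)
REPORTED = [
    (6.40, "draw"),        # "+6.40 ... whereas Black has a very easy draw"
    (0.16, "white wins"),  # "evaluates this win for White at 0.16"
]


def compare(cases):
    """Return (score, claimed result, does the sign reading agree?)."""
    return [(score, truth, verdict_from_eval(score) == truth)
            for score, truth in cases]


if __name__ == "__main__":
    for score, truth, agrees in compare(REPORTED):
        print(f"{score:+.2f} claimed={truth!r} sign_reading_agrees={agrees}")
```

Under this reading the +6.40 verdict disagrees with the claimed result, while 0.16 agrees by sign alone, so the sign convention by itself says nothing about what the size of the number is supposed to mean.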

I know that I'm about 3 years late, but I guess it could be interesting to see how modern SF15 would do with the presented problem:

For the first given picture: if we assume the FEN to be {7k/N7/p3N1K1/8/8/8/8/8 w - - 0 1}, SF finds mate in 4 in <5s; if we assume the FEN to be {7k/N7/p3N1K1/8/8/8/8/8 b - - 0 1}, SF finds mate in 4 in <1s with an Intel 9600K.

MaetsNori
HolyCrusader5 wrote:

Can an engine determine an opening is sound from the perspective of a "perfect player" or do they lack the intuition and long term planning to do so? I am asking purely out of curiosity.

Today's top grandmasters often disregard the top engine moves, for various reasons.

Sometimes, the top engine moves simply lead to forced draws.

Often, the less accurate engine moves lead to better (more practical) winning chances, since they keep things more complex, less familiar, and/or less clear.

Caruana once said that he sometimes chooses slightly "worse" opening lines, simply because they may keep things unclear, and can give the opponent more chances to go wrong.

Conclusion: the top engine assessments, in the opening, are not always the best way to play.

athlblue

Also, in preparation, even if the engine refutes, let's say, one move, but only because there is a single correct answer and all other options are bad, you can still play that move as a surprise, because a human is not an engine. So you have to search beyond computer analysis for good opening preparation.

MARattigan
ogpu-jd wrote:

I know that I'm about 3 years late, but I guess it could be interesting to see how modern SF15 would do with the presented problem:

For the first given picture: if we assume the FEN to be {7k/N7/p3N1K1/8/8/8/8/8 w - - 0 1}, SF finds mate in 4 in <5s; if we assume the FEN to be {7k/N7/p3N1K1/8/8/8/8/8 b - - 0 1}, SF finds mate in 4 in <1s with an Intel 9600K.

The FEN for the first picture is 8/8/8/8/8/1K1N3p/7N/k7 w - - 0 1 (i.e. none of the above). You can see the FEN by clicking in on the two fingers icon below the diagram and then on "PGN".

In either case White can mate the black king in the corner he occupies in four moves so long as the pawn doesn't both promote and have a further move. In your diagrams this is possible because Black needs at least five moves to promote the pawn. In the original he needs only two moves after the pawn is unblocked so with White to play it isn't possible. (With Black to play he can incorporate a check en route to mate if the black king doesn't move so with that alteration it would again become possible.) 
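The pawn arithmetic above can be checked mechanically. Here is a rough sketch in Python that reads only the piece-placement field of the two FENs quoted in this thread. The helper names are hypothetical, and it ignores blockers (such as the knight on h2), so it gives a lower bound on the number of pawn moves, not an analysis of the position.

```python
def parse_placement(fen: str) -> dict:
    """Map squares such as 'h3' to piece letters, using the first FEN field."""
    placement = fen.split()[0]
    board = {}
    for rank_index, row in enumerate(placement.split("/")):
        rank = 8 - rank_index          # FEN lists rank 8 first
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # a digit is a run of empty squares
            else:
                board["abcdefgh"[file_index] + str(rank)] = ch
                file_index += 1
    return board


def black_pawn_moves_to_promote(board: dict) -> dict:
    """Fewest pawn moves for each black pawn to reach rank 1, ignoring blockers."""
    moves = {}
    for square, piece in board.items():
        if piece == "p":
            rank = int(square[1])
            # A pawn still on rank 7 may use the double step once.
            moves[square] = rank - 1 - (1 if rank == 7 else 0)
    return moves


ASSUMED_FEN = "7k/N7/p3N1K1/8/8/8/8/8 w - - 0 1"   # the FEN guessed in the quoted post
ORIGINAL_FEN = "8/8/8/8/8/1K1N3p/7N/k7 w - - 0 1"  # the FEN given in this reply

if __name__ == "__main__":
    for fen in (ASSUMED_FEN, ORIGINAL_FEN):
        print(fen, black_pawn_moves_to_promote(parse_placement(fen)))
```

For the assumed FEN the pawn on a6 needs five moves, and for the original FEN the pawn on h3 needs two once it is unblocked, which matches the counts given above.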

The upshot is that White must extract the black king from his corner and force mate on the h file, which makes it mate in 44.  This is well outside the capability of any version of Stockfish without tablebase access with practical resources. The maximum depth mate by White any of the SF versions I have can manage with this material varies by a few moves according to version and black pawn location but doesn't exceed 36 moves with the pawn on h3 (compared with an average depth for such mates in the Nalimov tablebase of almost exactly 58 moves).

By way of illustration, here is SF15 attempting the original mate on one core of a Pentium J3710 @ 1.60GHz with a hash table size of 2GB and 20 minutes on its clock.

It draws in 3 instead of mating in 44.

The resources are in fact something of a red herring. I had at one time the last version of Rybka with the 'e' suffix in its version number. That was tailored for specific endgames without using tablebases. It was at least fifteen years ago, running on an ancient IBM PC with a specification in MHz, and it would have had no problem with the position I posted.

You can't assume all is uniform progress. E.g. here it is again against SF8 with exactly the same set up. (The final position is less obviously drawn, but my king is in Troitsky's drawing zone with the pawn blockaded on h3 - see the second diagram with the 'X's here.)

It draws in 6 instead of mating in 44.

It takes me twice as long to draw against SF8 as it does against SF15.  

ogpu-jd
If I look up https://syzygy-tables.info/?fen=8/8/8/8/8/1K1N3p/7N/k7_w_-_-_0_1 it's DTZ 84 and not DTM 44. Regardless of that, it's an interesting topic to talk about, and I learned something about the "sharing" mechanic.
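For anyone who wants to repeat the lookup, here is a small sketch that builds the same kind of link from a FEN. The URL pattern (spaces replaced by underscores after `?fen=`) is inferred only from the link in this post, so treat it as an assumption, and the function name is made up. Note also that DTZ (distance to the next pawn move or capture, i.e. the next reset of the 50-move counter) and DTM (distance to mate) are different metrics, so the two numbers don't have to match.

```python
# Base taken from the link posted above; the query format is an assumption
# inferred from that single example.
SYZYGY_BASE = "https://syzygy-tables.info/?fen="


def syzygy_url(fen: str) -> str:
    """Build a lookup link by swapping the spaces in a FEN for underscores."""
    return SYZYGY_BASE + fen.strip().replace(" ", "_")


if __name__ == "__main__":
    print(syzygy_url("8/8/8/8/8/1K1N3p/7N/k7 w - - 0 1"))
```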