Fixing New Analysis

flashlight002

@desklamp12000 it is also quite possible the dev team may be working behind the scenes on the engine which could have generated an error. Just taking a stab here. What was the error? Did you get a code? 

dallin

Thanks for your efforts here, all. We have identified the cause of the problem with accuracy as part of the wrapper we are using for Stockfish 10. This issue is isolated to Stockfish 10. We will be moving back to Stockfish 9 as soon as possible. I will notify this thread once we have Stockfish 9 back in place.

@flashlight002 & @fschmitz422 — a big thanks from our Analysis team for your examples, thoughts, and persistence. You guys have been a great help here. 

giancz91

Dallin, it seems you're deliberately ignoring me, I ask again: could you give us an option to use the old analysis? Could you concentrate your efforts upon that and, just when the old analysis is back, work on the new analysis? Thanks.

giancz91

You are losing a lot of credibility, Chess.com. I really don't get how this problem isn't on top of your priorities.

I'm an old member, I love Chess.com very much and I'll never leave it or cancel my premium membership, and most old members are probably like me. But what about new ones? What do you believe they will think when they try your new analysis? I'll tell you what they will think: "This site is a joke!"

dallin

I wish I could give you that option, @giancz91. Unfortunately I cannot do so for technical reasons. I am sorry. We have to push forward and fix what is not working for you. Analysis is our top priority, and we are striving to deliver what you expect from Analysis.

As you can see from the direction this thread has taken, we are working on the accuracy and consistency issues as quickly as we can. Discontent has also been expressed with the UI, visual feedback, move list formatting, and speed. We are also working to improve in these areas.

Fratsenmaker
fschmitz422 wrote:

If you can't see why this analysis is utter rubbish, all hope is lost anyway.

Whaha lol, even I can mate black blindfolded in 6 moves or so.

dallin

Analysis is now back to using Stockfish 9 while we work out the issues with Stockfish 10. If you see any issues with accuracy now that we are using Stockfish 9, please do bring them to our attention.

flashlight002

@dallin thank you for keeping us up to date as you go about fixing the engine accuracy. Game analysis is a major feature of the site and I know you and your team want to deliver a reliable and accurate product we all can trust. And it's only a pleasure helping you and your dev team to get to the root of the problems and to fix them. I'll continue to be of assistance where I can, and when I spot a problem or issue. Can I please ask for a bit of clarification (as I am not a computer programmer):

  1. Will the rollback to Stockfish 9 be a temporary solution, affording you more time to rewrite the code that is causing you problems so that the most up-to-date version of Stockfish (in other words Stockfish 10) can be employed?
  2. Can you explain a little bit more about what you mean by "We have identified the cause of the problem with accuracy as part of the wrapper we are using for Stockfish 10"? On a very basic level I know a wrapper holds a subroutine. Are you saying that the wrapper in your coding that calls up the Stockfish 10 program has an issue of some sort? In other words a wrapper your team wrote to integrate Stockfish 10 into the new system architecture? Am I correct in how I am understanding this on a very basic level? Since this is all open source surely you will be able to fix the issue for Stockfish 10 given time and necessary discussions/assistance from the Stockfish development team?

dallin

The rollback to Stockfish 9 will be temporary. We have a working build using the latest Stockfish 10 release already, but it will require some validation next week before we can consider release.

iscukatchess

Lmao I play the King's Gambit most of the time so I can't really trust the analysis too much for anything but tactics. Every engine hates the King's Gambit, but very few HUMANS actually know how to play against it. IMO

fschmitz422
dallin wrote:

Thanks for your efforts here, all. We have identified the cause of the problem with accuracy as part of the wrapper we are using for Stockfish 10.

 

Good to hear you're making progress on this, and maybe my verdict about the idea of the server-side analysis being insane was a bit too harsh. Given the fact that native code will easily outperform js sandbox code by a factor of 1000 or something, it maybe isn't such a bad idea at all. Only, what is crucial here is that the server-side analysis has to be deep enough, so that the client-side evaluation of the lines converges to the server-side results, and will not be more accurate within any reasonable span of time. Else the server-side analysis will be considered untrustworthy, and the whole thing flawed.

Regards

flashlight002

@dallin thanks very much for the update! Great to hear you are making such fast progress! I didn't see your post explaining that you were working on the Stockfish 10 version.

@fschmitz422 I think you hit the nail on the head with your observation "what is crucial here is that the server-side analysis has to be deep enough, so that the client-side evaluation of the lines converges to the server side results....". I am REALLY hoping the new build Dallin's team has ready for next week will meet these criteria, so that we don't get conflicting accuracy levels between the different parts of the system anymore! By the way @fschmitz422 you said you are a programmer...was my understanding of a wrapper correct?

 

flashlight002

Unfortunately  @SpiderUnicorn  there isn't a Lucas chess for Android or iOS or mobile platforms.   

fschmitz422
flashlight002 wrote:

By the way @fschmitz422 you said you are a programmer...was my understanding of a wrapper correct? 

 

Basically, yes. The term "wrapper" is commonly used to describe some thin layer of software that is used to integrate some piece/module of software into some environment, typically on the application level.
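
To make that a bit more concrete, here is a minimal sketch of what such a thin layer could look like for a UCI engine such as Stockfish. This is not chess.com's wrapper (nothing about their code is known here). The `EngineWrapper` class, the `Transport` type and the `fake_engine` stub with its canned output are all hypothetical names made up for illustration. Only the command and output formats (`position fen`, `go depth`, `info ... score cp/mate`, `bestmove`) come from the UCI protocol.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a real engine process: takes UCI command lines and
# returns the engine's output lines. A real wrapper would talk to the engine
# over stdin/stdout instead.
Transport = Callable[[List[str]], List[str]]


def parse_score(info_line: str) -> Optional[Tuple[str, int]]:
    """Extract ("cp", n) or ("mate", n) from a UCI "info" line, if present."""
    tokens = info_line.split()
    if "score" not in tokens:
        return None
    i = tokens.index("score")
    return tokens[i + 1], int(tokens[i + 2])


class EngineWrapper:
    """Thin layer between application code and a UCI engine."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def analyse(self, fen: str, depth: int) -> Tuple[str, Optional[Tuple[str, int]]]:
        # Translate the application's request into UCI commands.
        output = self._transport([f"position fen {fen}", f"go depth {depth}"])
        best_move = ""
        score = None
        for line in output:
            if line.startswith("info "):
                # Later info lines come from deeper iterations: keep the last score.
                score = parse_score(line) or score
            elif line.startswith("bestmove "):
                best_move = line.split()[1]
        return best_move, score


def fake_engine(commands: List[str]) -> List[str]:
    # Canned, made-up output purely for illustration.
    return [
        "info depth 1 score cp 20 pv e2e4",
        "info depth 2 score cp 35 pv e2e4 e7e5",
        "bestmove e2e4 ponder e7e5",
    ]


wrapper = EngineWrapper(fake_engine)
best, score = wrapper.analyse(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", depth=2
)
print(best, score)
```

A mistake anywhere in a layer like this (sending the wrong position, reading the wrong output line) would produce bad results even though the engine itself is fine.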

flashlight002

@SpiderUnicorn we are really not expecting too much.

chess.com has acknowledged that the new engine has accuracy issues and some programming issues....so it is not only related to speed.

If you scroll up to earlier posts you will see the communications from the VP of Product Development for chess.com, Dallin (earlier his user name was ignoble), who explains some of the issues there are.

We also have hard evidence of really bad results created by the engine. Irrefutable evidence. Some examples have been posted on this forum thread as screenshots. Some examples of bad errors have been sent via support to the dev department. I have personally experienced cases where the engine has suggested move variations that contain multiple blunders and bad moves as part of the suggested moves in the variation. That, I am sure you will agree, an engine should not do. 

So there is a problem, and chess.com has acknowledged this and is working to fix it. Yes, sometimes you may get the engine suggesting variations that are fine, especially if the variation was from the extra lines one gets if you have checked the "show lines" function: these computations are being done on the client side (i.e. your device or laptop), so the depth will be greater and hence the accuracy too. But the initial scan and the retry move variations are server generated...and they have shown major problems.

We are not expecting "too much from it" at all. The previous engine system did not have these problems. Please look higher up in this thread at the example where the engine suggested a Mate in 20 variation as the correct answer when in fact the correct answer is mate in 5. I am sure you will agree that one cannot accept that as being acceptable performance! I ran the exact board position on 3 other programs running Stockfish 10 and in under 3 seconds they had computed the correct answer of mate in 5....not mate in 20. So one cannot say that because it does it faster it will be less accurate.

So....to sum up. There is an accuracy problem that many of us are not happy with as they are really bad mistakes in the engine predictions. Chess.com has acknowledged it and is actively fixing it so that the engine performs accurately CONSISTENTLY for all of us. 

Don't get me wrong @SpiderUnicorn, I think the new engine system and all its histograms and extra reports and functions are great! But the engine must be CONSISTENTLY accurate.

fschmitz422
dallin wrote:

Analysis is now back to using Stockfish 9 while we work out the issues with Stockfish 10. If you see any issues with accuracy now that we are using Stockfish 9, please do bring them to our attention.

I ran a few tests, out of curiosity. Good news is: I couldn't find blatant errors as before. For instance, the position given in the screenshot is now evaluated correctly. - But: The discrepancy between the server-side evaluation and the client-side values (taken at depth 20, with "Self analysis" unchecked so I could see the current depth) is still surprisingly high (15-30% in my samples, and none of them from endgame positions to avoid complications with tablebase issues). - Could it be that the evaluation functions (or other relevant parts) of the engines are not equivalent?
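
For anyone who wants to run a similar comparison, here is a rough sketch of one way to express such a discrepancy as a percentage. The post does not say how the 15-30% was computed, so measuring against the client value is an assumption, and the function name and sample numbers are made up rather than taken from the thread.

```python
def relative_discrepancy(server_eval: float, client_eval: float) -> float:
    """Relative difference between two evaluations, measured against the client value."""
    if client_eval == 0:
        raise ValueError("relative difference is undefined for a 0.00 reference")
    return abs(server_eval - client_eval) / abs(client_eval)


# Made-up (server, client) evaluations in pawns, for illustration only.
samples = [(1.15, 1.00), (0.70, 1.00), (-2.40, -3.00)]
for server_eval, client_eval in samples:
    print(f"{relative_discrepancy(server_eval, client_eval):.0%}")
```

A relative measure like this blows up for evaluations close to 0.00, so for near-equal positions the absolute difference in pawns is probably the more sensible number to compare.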

However, the more I think about it, the less I care about these discrepancies. At the end of the day one has to get rid of the concept that there are "true"/"correct"/"accurate" evaluations at all. I guess many users will be irritated by these discrepancies, but more won't even notice.

Regards

flashlight002

This afternoon the engine appeared to be working ok. Then......

@dallin @fschmitz422 @giancz91 what do you make of this analysis below? This analysis was done this evening. The server side evaluation says mate in 16. The local evaluations (as provided by the "show lines" function) show solutions for mate in 6 to 8 moves.

And it gets even more bizarre. I resigned as Black at move 32 (yes yes I was playing shockingly I know) but if you look under the words "game may have continued" there is a mate in 6 suggested by the server.

What do you make of this all? If I had not had the "show lines" function operating I would have believed the M16 evaluation suggested by the server.

Does this not show there is a real disconnect between the results being generated by the server section and the client side section (show lines function)?

 

fschmitz422
flashlight002 wrote:
(...) If I had not had the "show lines" function operating I would have believed the M16 evaluation suggested by the server.

Does this not show there is a real disconnect between the results being generated by the server section and the client side section (show lines function)?

 

 

 

It WOULD have been a mate in 16 if you had played Rd6 as the previous move. Since you played a4 instead, it is now a mate in 7 (according to the server), or a mate in 6 (according to the client side engine). - This difference between M6 and M7 is alarming, though. At first I thought this may be because the server evaluation display referred to the position PREVIOUS to your last move (hence a 1-move-offset), but a quick cross-check revealed that elsewhere this is NOT the case. So indeed M6 vs. M7 is alarming. - And yes, there is a "real disconnect" between the server and the client side results, simply because the engines are not functionally equivalent. In theory, the client side implementation could be functionally equivalent to the server side implementation, but as my own tests revealed (see my previous post), they are (currently) not. - However, even if client and server side engines are NOT completely equivalent (like for instance SF-9 vs. SF-10), this does NOT explain M6 vs. M7, since the computation of forced mates MUST be the same for all engines, or at least one of them is broken (or some wrapper again). - About the "Game may have continued" line I don't know, and I suggest we don't take it too seriously.

flashlight002

@fschmitz422 thank you for your very insightful analysis of this situation and for setting me right in how I was reading things! Silly me for getting myself confused on such a basic thing. Yes I see what you mean exactly. Things are clearly still not right. @dallin hopefully this coming week all the issues can be ironed out.

 

 

dallin

It looks like the difference between M6 and M7 is not due to a bug, but to the different depths of the server side analysis (20) and client side analysis (above 20 if you wait for it). Stockfish does not see the M6 line until depth 22. @flashlight002
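
As a rough illustration of how a depth cutoff alone can change the reported mate distance, the sketch below reads UCI-style `info` lines and returns the mate distance from the deepest iteration at or below a given limit. The helper names and the sample lines are hypothetical. The output is made up to match the description above (M7 at depth 20, M6 first appearing at depth 22) and is not real Stockfish output.

```python
from typing import List, Optional, Tuple


def parse_depth_and_mate(info_line: str) -> Optional[Tuple[int, int]]:
    """Return (depth, mate distance) from a UCI "info" line, or None if absent."""
    tokens = info_line.split()
    if "depth" not in tokens or "mate" not in tokens:
        return None
    depth = int(tokens[tokens.index("depth") + 1])
    mate = int(tokens[tokens.index("mate") + 1])
    return depth, mate


def mate_at_depth(lines: List[str], max_depth: int) -> Optional[int]:
    """Mate distance reported by the deepest iteration not exceeding max_depth."""
    best: Optional[Tuple[int, int]] = None
    for line in lines:
        parsed = parse_depth_and_mate(line)
        if parsed is None or parsed[0] > max_depth:
            continue
        if best is None or parsed[0] >= best[0]:
            best = parsed
    return best[1] if best else None


# Illustrative lines only, invented to mirror the situation described above.
engine_output = [
    "info depth 20 score mate 7",
    "info depth 21 score mate 7",
    "info depth 22 score mate 6",
]

print(mate_at_depth(engine_output, 20))  # stops at depth 20, like the server
print(mate_at_depth(engine_output, 22))  # waits for depth 22, like the client
```

With the same engine and the same position, the first call reports 7 and the second reports 6, so the only difference between the two is where the search was stopped.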