Fixing New Analysis

jas0501

This is an issue of too little time spent on analysis, combined with the horizon effect. The Missed Win classification is assigned after a limited amount of analysis. Below is a link to the 5-line analysis that runs after Kg2, showing Black's best lines. You can see the two knight moves appear early, then move down and off the 5 lines being shown.

2:27 video https://www.screencast.com/t/LMz4eBacixeD

Here is the final evaluation showing the best 5 moves for Black: four at -M7 and at least one at -M8, with 47..Nc4 not among them.

The Kg2 evaluation shows -M22 after Nc4, a great distance from any of the -M7 lines. It is this evaluation that concludes 47..Nd3 is a missed win. Again, this is due to insufficient, or better said limited, evaluation.

I expect this is never going to be fully "corrected", not due to lack of effort by chess.com, but more due to the nature of the horizon effect on the evaluation. You have to stop evaluating at some point and the conclusions drawn at that point may not be on point.
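
To make that concrete, here is a rough Python sketch of how a label taken from a single cutoff snapshot could come out. The function names and the `max_gap` threshold are assumptions for illustration only, not chess.com's actual rule; only the -M7 and -M22 figures come from the evaluations above.

```python
def mate_gap(best_mate_in: int, played_mate_in: int) -> int:
    """How many moves slower the played move mates than the best line."""
    return played_mate_in - best_mate_in


def looks_like_missed_win(best_mate_in: int, played_mate_in: int,
                          max_gap: int = 10) -> bool:
    """Flag the move if it mates much more slowly than the best line.

    max_gap is a hypothetical threshold; the real classifier is not public.
    """
    return mate_gap(best_mate_in, played_mate_in) > max_gap


# Snapshot at the cutoff: best line is mate in 7, the other line is mate in 22.
print(looks_like_missed_win(7, 22))  # gap of 15 moves
# If a longer search brought that line down to mate in 8, the gap is 1 move.
print(looks_like_missed_win(7, 8))
```

The label depends entirely on which snapshot is used, so stopping the search earlier or later can flip it.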

 


 

Side note: The nature of this type of Missed Win classification may explain why the Retry process does not present retries in clearly won positions. As stated elsewhere, I would prefer to see all retries (inaccuracies, mistakes, blunders, and missed wins), even in clearly won positions, and even for this type of "incorrect" missed win classification.

chrka

@PawnstormPossie Do you want to see how stable the evaluation is within the analysis interface, or are you just interested in how stable Stockfish's evaluation is at different depths in general? If it's the latter, I should have some data on that at home, i.e., I've looked at how the evaluation and best move change with depth and made comparisons to a larger depth. I'm currently away, but could dig it up when I get back if you're interested (or maybe rerun it some day, but it takes a while). (I posted some plots of how just the evaluation changes for a small sample of positions in #138.)

chrka
PawnstormPossie wrote:

@chrka, definitely the latter. If you get a chance, I'd appreciate it.

I'd also like to see the site use this type of info to determine when to stop and then apply the labels to moves.

 

Ok, I'll be back in a couple of days, but might rerun the analysis before that on my laptop (it's not that massive, and it would also be fun to compare with some other engine).

I haven't dug too deeply into it yet; I'm doing something similar to compare agents for a different game and just thought it could be fun to take a peek at chess as well. But, IIRC, for ~1300 positions, in about 80% of the cases, the same exact move is suggested at depth 20 and depth 30. (I don't remember how well the other 20% performed, but I think it was pretty decent, even compared to depth 35.)
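
A minimal sketch of how that kind of comparison could be tallied, assuming the best move per position has already been collected at each depth. The move lists below are made-up placeholders, not the actual ~1300-position data, and `agreement_rate` is a hypothetical helper name.

```python
def agreement_rate(moves_shallow: list[str], moves_deep: list[str]) -> float:
    """Fraction of positions where both depths suggest the same move."""
    if len(moves_shallow) != len(moves_deep):
        raise ValueError("both lists must cover the same positions")
    if not moves_shallow:
        return 0.0
    same = sum(1 for a, b in zip(moves_shallow, moves_deep) if a == b)
    return same / len(moves_shallow)


# Placeholder data: best move per position at depth 20 and at depth 30.
depth20 = ["Nc4", "Kg2", "Qd3", "Rb1", "e5"]
depth30 = ["Nc4", "Kg2", "Qd3", "Rb1", "Nd3"]

print(f"{agreement_rate(depth20, depth30):.0%} of positions agree")
```

The positions where the two depths disagree are the interesting ones to inspect, since those are where a label assigned at the shallower depth could be wrong.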

I'm sure Chess.com has done a ton of statistical analysis on this already; there must also be overlap with cheat detection, CAPS and whatnot.

flashlight002

@erik can I ask you 2 questions if I may? There is lots of technical talk now on all the reasons for the results being returned, which, while very interesting, doesn't really solve anything in a concrete way. It's all theoretical discussion and hypotheses.

Q1: Are you saying you have a software patch in the works, and that it will be applied soon to improve the engine analysis architecture and output accuracy?

Q2: There was talk about working on a feature to choose a deeper game analysis than d=20, for those members who want a really accurate game analysis and aren't too fazed if it takes a while to compute. (Naturally, when I use terms like "accurate" I mean accuracy within the limits of what Stockfish can realistically accomplish. I know no results can ever be "perfectly accurate".)

Are there plans to offer a deeper game analysis? One would choose, say, a "Standard" or "Deep" option button before a scan runs. I am hoping many of these anomalies, or "bad engine decisions", would then not appear in the deeper scan, because the horizon effect would be reduced in parts of the game where d=20 was not good enough and a deeper scan was required.

The deeper scan option could run at, say, d=27 or d=30. A d=30 scan would have analysed many millions of nodes and probabilities.

Having this option would satisfy two basic target market users:

A- those wanting a quick but fairly accurate blunder checker would choose "Standard Scan"

B- those wanting a really accurate game analysis and who don't mind waiting 5 to 7 or more minutes for it to run would choose the "Deep" scan option.

Let me know your thoughts.

erik

Both of those are in the works. 

flashlight002

@erik that is absolutely BRILLIANT news. Have a good weekend!

fschmitz422

Rensch and Hammer on another magic moment of the analysis board, trying to analyze the Giri/Aronian game with it:

https://www.twitch.tv/videos/469019660?t=05h03m50s

[Hammer:] "That's nonsense."

(...)

[Rensch, before trying another position, already showing signs of resignation] "I'm sure it just has some other dumb move." - And voila.

That's kind of fair though. If the users have to suffer, Chess.com might as well also have their front man make a fool out of himself, trying to promote a Chess.com software feature.

9thBlunder

fschmitz422 wrote:

Rensch and Hammer on another magic moment of the analysis board, trying to analyze the Giri/Aronian game with it:

https://www.twitch.tv/videos/469019660?t=05h03m50s

[Hammer:] "That's nonsense."

(...)

[Rensch, before trying another position, already showing signs of resignation] "I'm sure it just has some other dumb move." - And voila.

That's kind of fair though. If the users have to suffer, Chess.com might as well also have their front man make a fool out of himself, trying to promote a Chess.com software feature.

Most sites would roll back any bugs they release until they solve them. How long has this been a problem and nothing has changed?

ericthatwho

It's interesting: if your opponent uses a computer you call it cheating, yet you complain when you cannot.

flashlight002

@beafraid3 hi there. What in the world are you talking about? This forum is about showcasing errors and bugs with the new game analysis system (same architecture is also used in the Analysis Board) and working with chess.com staff to correct them. We aren't discussing the use of engines as cheating aids here.

I have no idea who you are referring to, or what incident even. In any case cheating is not allowed to be discussed in any way on any forums.

flashlight002

@PawnstormPossie my whole issue is that I suspect chess.com may have sacrificed accuracy for speed. I could be wrong; this is just based on what I have seen, and there may also be additional coding and engine complexities causing these bugs that we are not privy to. Hence my suggestion to chess.com management that they create tiered analysis options (like we had in the old system). So:

  • a fast, but not very accurate blunder checker that takes less than a minute for those who just want a very rough idea of things and ain't prepared to wait 
  • a moderately accurate scan that goes to a fairly good depth e.g. d=25 and takes roughly say 5 min or so (just guessing the time here as game length and move complexities have an impact on time taken to scan). 
  • a very accurate solid scan that goes to a depth like d=30 and would take anywhere from 7 to 10 min or even longer. Again just a guesstimate based on other software I have used.  But the point is here the user is prepared to wait for the scan and isn't worried how long it takes. Here the user wants more accuracy and is prepared to wait for it.
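
The three tiers above could be sketched roughly like this in Python. Everything here is hypothetical: the `ScanTier` and `pick_tier` names are made up, the minute figures are the same guesstimates as in the list, and the quick tier has no depth because none was suggested for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanTier:
    name: str
    depth: Optional[int]  # None: no depth was suggested for this tier
    est_minutes: float    # rough guess; real time varies with game length


# Guesstimated tiers, ordered from fastest to slowest.
TIERS = [
    ScanTier("quick", None, 1.0),
    ScanTier("moderate", 25, 5.0),
    ScanTier("deep", 30, 10.0),
]


def pick_tier(max_wait_minutes: float) -> ScanTier:
    """Return the deepest tier whose estimate fits the wait, else the quick one."""
    fitting = [t for t in TIERS if t.est_minutes <= max_wait_minutes]
    return max(fitting, key=lambda t: t.est_minutes) if fitting else TIERS[0]


print(pick_tier(6).name)   # a user willing to wait about 6 minutes
print(pick_tier(15).name)  # a user who does not mind a long wait
```

The point of the design is that the user, not the site, decides the trade-off between waiting time and accuracy.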

Erik said such an idea is "in the works" so I am waiting to see what the dev team comes up with. I really really really hope they get this right!!

flashlight002

@PawnstormPossie I think your 2c is worth more than that. You always bring good, interesting points to the table!

I agree. Give me a scan result I can TRUST. I have always chosen the highest quality option available. That's because I personally would rather learn from data I know does not contain nonsense moves/suggestions...in other words, consistently trustworthy information.

jas0501

There is a balance that is needed for the initial report. This is a tricky thing. The nature of the game tree can vary significantly, with some game positions requiring greater depth than is "practical". One can never fully guarantee that some move will never appear as a missed win in an already winning position.

Would most people requesting a game report accept a 10 minute wait, a 5 minute wait, or a variable wait of between 3 and 10 minutes? I think not.

One can always review the game and display 5 lines, unlimited, and for critical positions wait the 5-15 minutes it may take to find the "best" move. In the grand scheme of things one can work around the occasional less than perfect "quick" evaluation.

flashlight002
jas0501 wrote:

Would most people requesting a game report accept a 10 minute wait, a 5 minute wait, or a variable wait of between 3 and 10 minutes? I think not.

@jas0501 but that's where you are very wrong. Here is a screenshot of the "old analysis" method times and scan types available for a sample game of mine...a system I didn't really see anyone complaining about at all!...with scan times from 1 min to 10 min or more depending on game length and complexity (this method is still available via the Android app):

I was prepared to wait the 10 minutes. And I am sure there were many others like me who chose the "maximum" scan. And I am sure there were many who chose the 1 to 2 min "quick" scan. I never chose the 1 min version because I was after accuracy and not a simple blunder checker. I never saw thousands of people writing in complaining about how long a game analysis took...because they had choice...choice in accuracy level and choice in the associated time that accuracy level took. Just interested in a very quick blunder check? No problem...choose the "quick" 1 min scan. Interested in a really accurate scan (within the constraints of what Stockfish can accomplish)? No problem, just choose the "maximum" scan taking 10 min.

Bring this kind of "accuracy choice" back while still incorporating the new analysis system architecture and I am prepared to bet that many of the "horizon effect" issues or wrong choices by the engine will not be much of an issue anymore. 

FiddlerCrabSeason
flashlight002 wrote:

...I was prepared to wait the 10 minutes. And I am sure there were many others like me who chose the "maximum" scan..."

 

Ditto.

9thBlunder

mgt3 wrote:

flashlight002 wrote:

...I was prepared to wait the 10 minutes. And I am sure there were many others like me who chose the "maximum" scan..."

 

Ditto.

Word up! I trust the mobile analysis way more. Waiting ten minutes is not a problem.

jas0501

The current settings offer an "unlimited" option, though I'm not sure of the ramifications of this setting.

FiddlerCrabSeason
jas0501 wrote:

The current settings offer an "unlimited" option, though I'm not sure of the ramifications of this setting.

 

There is no choice regarding the speed/accuracy of the initial analysis (i.e., the game report).

- M

jonnie303

There's something wrong with the game analysis today. Take a look at this game, analysed at the basic level. After Black's 14th move it says Black is winning (-1.36), whereas in fact White has mate in 2.

https://www.chess.com/analysis/game/live/3964702702?tab=report

jas0501
jonnie303 wrote:

There's something wrong with the game analysis today. Take a look at this game, analysed at the basic level. After Black's 14th move it says Black is winning (-1.36), whereas in fact White has mate in 2.

https://www.chess.com/analysis/game/live/3964702702?tab=report

Not sure how you got that. Here is my eval, setting = 5 sec:

Here is the 5 line eval before the move, unlimited