Fixing New Analysis

Avatar of giancz91
Toire wrote:

Why in the world would anyone bother to analyze a position like this?

A raw beginner would know the game is won for Black... in 10 or 11 moves? Who cares?

That's clearly not the point, @Toire. The point is that the engine doesn't work.

Avatar of fschmitz422

Something else: the server-side engine evaluates a move of mine (in the example below: 13. Qc2) as an inaccuracy and suggests an optimal line. If I inspect this line, already the second opponent move (14...Nf4) is marked as a mistake. This is weird, and I observe this kind of behaviour quite frequently.

Don't get me wrong, I'm not saying that Qc2 is better than Qe2, but the suggested line appears to be broken.

And yet something else: later in the game I played two "missed wins", which of course again weren't really "missed wins" but bad moves (especially 32. Rh4, with which I managed to trade my rook for the queen but, blinded by that, missed a mate in two). What is weird here is that these errors do not appear in the list in the "Retry" tab. 32. Rh4 definitely should appear there, as a blunder. 36. Qf4 should either be disregarded (as a blunder/mistake) completely (because it is one of those "doesn't really matter anymore" positions), OR it also should appear in the Retry tab.

Avatar of Toire
fschmitz422 wrote:

Something else: the server-side engine evaluates a move of mine (in the example below: 13. Qc2) as an inaccuracy and suggests an optimal line. If I inspect this line, already the second opponent move (14...Nf4) is marked as a mistake. This is weird, and I observe this kind of behaviour quite frequently.

Don't get me wrong, I'm not saying that Qc2 is better than Qe2, but the suggested line appears to be broken.

I looked at the game and can see no advantage in 13. Qe2 over Qc2; in my opinion there is no way your move can be described as an inaccuracy. It was a solid move.

That sort of analysis is confusing and unhelpful, even accepting the superficiality of the product.

Avatar of flashlight002

@fschmitz422 your examples are very illuminating, as everything you are observing I have also observed... multiple times in my game analyses. And I am still seeing these issues:

  1. The engine feedback system points out a move as being wrong when it isn't.
  2. The engine feedback system suggests a new move and a follow-on variation... and when one inspects the moves in this variation, the feedback section re-evaluates its own suggested move(s) as a problem (mistake, blunder, etc.). This kind of feedback continues to be exhibited.
  3. Not all relevant blunders are shown in the Retry section.
  4. The feedback system is very sluggish in returning a result... even though it is being run client side.

So looking at the above evidence, @fschmitz422, we can safely deduce that the new analysis system is still very broken. Results are erratic.

The real question is... what is being done to fix all this? I am positive these issues are being experienced by many... but they may just not be looking deeper and questioning what the feedback system shows them, instead trusting the outputs blindly. And that's dangerous, as many of us beginner and intermediate players look to the engines as a learning tool. The teacher should be consistently giving the right lessons and "advice". But as your latest evidence shows... it's still broken and returning nonsense!

@dallin and @nmrugg:

  1. What is the status with respect to fixing the above? What are you doing about it?
  2. What is causing these issues?
  3. And most importantly, how long will it take to fix them?
Avatar of 9thBlunder

Who cares about the status of their work when all they have to do is revert us to the old analysis mode while they take their sweet time fixing a faulty product that we're paying for?

One of the things I miss is the reliability of the CAPS rating. Because the engine is so weak, it tells me that I'm playing at GM strength even though I'm really playing like a patzer. Thank goodness for Fritz!

Avatar of flashlight002

@9thBlunder I do agree with you; the sensible thing would have been to just remove the product when they initially saw issues (apologize for the inconvenience, but explain the reasons why) and revert to the old system while they fixed the new one. They did this with the puzzles section: when they launched the combined Puzzle Rush, puzzle learning, and rated puzzle section, they took it offline a short while later for a few weeks while they fixed it. So they could have done the same here. After all, as you say, we are paying to have features that work. But chess.com doesn't seem intent on going this way for some reason. Hence, since I don't see any improvements in the analysis system, I have asked what I feel is the reasonable question of when we can expect it all to be fixed.

Avatar of dallin

Thanks, guys. I am sorry this has gone on so long that it feels like we are not progressing. We are, but our path forward does not include any engine other than Stockfish 10. Many of the issues pointed out are not so much a factor of the wrapper that we have for the engine; they are related to depths, processing, and the feedback we give on the information the engine provides, based on our configuration. That sounds pretty simple to tweak and get right. It's not; otherwise you would all be happy right now.

So why don't we just roll back to the old analysis? The old analysis had the exact same problems, but they were MUCH, MUCH harder to see. With the old computer analysis, we gave you a static look at the analysis generated by the engine. Computer analysis was on a separate tab from the engine lines, and existed only in Self Analysis. You were never able to compare the two easily and see the inconsistencies.

We also never tried to give you feedback on your own variations. That is also new to the new Analysis system. I wish that were perfect. It's not, and we're working to improve it.

The reason you are seeing inconsistencies is that we have added more ways to analyze positions than the static feedback you had before. I don't want to go back to that, personally. What I want to do is get this as solid as we can, so that inconsistencies that can be avoided are avoided, and anything that is wrong and can be fixed is fixed.

We are trying our best to tweak and improve this. We have several developers dedicated to Analysis who are working to make it something you not only trust, but enjoy using.

@9thBlunder, I'd love to hear more about your issues with the Accuracy score vs. CAPS. We have the same engineers, analysts, and methodology producing this score as we had before, but Stockfish 10 did force us to make some tweaks. Are you finding that Accuracy is inflated over what you experienced with CAPS?

Avatar of Martin_Stahl
flashlight002 wrote:

...

1. The engine feedback system suggests a new move and a follow-on variation... and when one inspects the moves in this variation, the feedback section re-evaluates its own suggested move(s) as a problem (mistake, blunder, etc.). This kind of feedback continues to be exhibited.

...

 

I haven't been following this discussion, but I have observed that exact issue in a standalone local analysis session with Stockfish and SCID vs. PC. It is an evaluation issue with Stockfish, not something that is fixable by the site; or rather, they would have to reanalyze a suggested line/position a second time and then decide how to reconcile that analysis with the original one.
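The effect described here, where re-searching a suggested line flips its evaluation, can be illustrated with a toy depth-limited minimax search. This is a sketch of the general horizon problem, not of Stockfish's actual search (which adds iterative deepening, pruning, and quiescence search); the tree and scores below are invented purely for illustration.

```python
# Toy illustration of a "horizon" problem in depth-limited search (NOT
# Stockfish internals): a shallow search can prefer a move whose
# refutation lies one ply beyond its depth limit, so re-searching that
# same line deeper makes the score collapse -- which is how an engine's
# own suggested move can later be flagged as a mistake.

def minimax(node, depth, maximizing):
    """Depth-limited minimax over a toy tree.

    A node is (static_eval, children); an empty children list means the
    position is terminal. At depth 0 we fall back to the static eval.
    """
    static_eval, children = node
    if depth == 0 or not children:
        return static_eval
    scores = [minimax(child, depth - 1, not maximizing) for child in children]
    return max(scores) if maximizing else min(scores)

# Hypothetical position: move A looks strong on the surface (+5) but has
# a refutation one ply deeper (-9); move B is modest (+1) but sound.
move_a = (+5, [(-9, [])])
move_b = (+1, [(+1, [])])

# The opponent moves next after our candidate move, so maximizing=False.
shallow = [minimax(m, 0, False) for m in (move_a, move_b)]
deep = [minimax(m, 1, False) for m in (move_a, move_b)]

print(shallow)  # [5, 1]  -> the shallow look recommends move A
print(deep)     # [-9, 1] -> one ply deeper, A collapses and B is preferred
```

Real engines mitigate this with quiescence search and deeper iterative searches, but the instability never disappears entirely, which is consistent with a re-analysis of a suggested line downgrading the engine's own recommendation.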

Avatar of flashlight002

Hi @Martin_Stahl, the plot thickens! Or maybe "fishy" is more apt, lol. OK, on a serious note... would that account for the kinds of weirdness seen in example posts #208 and #209? So is this all a case of Stockfish 10 having some bizarre evaluation issues? I can understand Stockfish evaluating different "good" or "best" alternative trees, but choosing blunder moves as part of its own suggested variations? How and why would it do that? That just doesn't make sense to me. Isn't Stockfish supposed to be pruning out bad solutions?

Avatar of flashlight002

@Martin_Stahl I forgot to add:

Does this imply there is actually a fault with Stockfish 10? Do you know if the Stockfish dev community has confirmed this problem? It seems natural to me that @dallin and his team can try as hard as they like, but they won't be able to fix these issues unless the folks at Stockfish fix the root issues with the engine first. Since Stockfish is such an integrated element of this system, maybe the Stockfish 10 developers should take a look at the issues to help @dallin's team improve their product's outputs? Maybe this has been happening already behind the scenes... I naturally wouldn't know.

Avatar of Martin_Stahl

I haven't done a lot of analysis with newer Stockfish versions, and I'm not an engine expert. I think the engine has some sort of horizon issue, or maybe something to do with the hash memory. It is also possible that the position is just very unclear and the brute-force search can't evaluate the positions accurately.

 

Unfortunately I don't have a good example that I've kept where it has done this; I just move on from that position and maybe make a note in the PGN. What I recall is that I'll change the move, and after the engine gets to depth, it will drop the evaluation. I go back to the previous move and it will like a different line; I jump into that and it doesn't like it anymore.

 

It may be related to the 0.0 moves/chapter in the Game Changer book. Or it could be something else entirely.

Avatar of doyouacceptdraw

I've been using Stockfish 10 within Fritz 16 for quite some time, and I have never experienced any of the issues you are discussing here.

Avatar of BK201YI

My experience with the new basic analysis has been that it is very inaccurate, but faster than the old one. On the other hand, I only get the full analysis every couple of days, and it usually pops up when I don't want it to, when analyzing some random game that isn't even mine. It would be nice if staff could do something about this and let us choose which games to use the full analysis on, or at least warn us that we are about to use the full analysis, so we can stop and save it for the games that actually matter.

Avatar of flashlight002

@dallin or @nmrugg: a question for either of you. (If you are willing to explain.)

A friend was discussing with me that the new d=20 full-game analysis only takes about 3 s or so to complete, vs. the old full scan that used to take about 5 to 8 min on the site. How do you manage to perform a scan to such a depth so quickly? All the programs we have (which also use Stockfish 10) usually take a good 5 to 8 min or more to scan to d=20 to 23. So I was wondering how chess.com now manages to do it in a few seconds. What technology enables you to accomplish this? I would be very interested to understand how you have managed to get it to finish so fast.

Avatar of Rasta_Jay

#228 I think they do it on their servers; it works faster.

Avatar of notmtwain

https://www.chess.com/forum/view/site-feedback/is-accuracy-a-joke

I agree that results like that make CAPS / accuracy seem like a joke.

Avatar of flashlight002

@notmtwain @drmrboss at first glance it does seem quite a surprising result. But as with all things statistical, the devil is in the details, as well as in the weightings of the variables that constitute a CAPS score. I don't have a clue how CAPS is statistically calculated, and without understanding how it is actually calculated, it's hard to mathematically verify or refute the score. However, I am wondering if the length of the game (which was quite long, at 86 moves) and the fact that White scored 66.3% best moves have something to do with it? He also had a high number of good moves. If I include excellent and good moves with best moves, to arrive at a best + excellent + good share of total moves, the figure is 85.4% of the total 86 moves. Pretty good going. So I am wondering if this has swayed the calculations, notwithstanding the high number of blunders as an absolute value, which nonetheless account for only 7.87% of the total moves made by White. If the weighting in the CAPS formula is high for the percentage of best and excellent moves as a share of the total moves in the game, then maybe this is why such a value is being returned, especially given the length of the game.

Just taking a stab here at trying to understand how such a score could have been returned. 
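For what it's worth, the kind of shares being juggled above come from simple counting. The actual CAPS formula is unpublished, so the category counts below are hypothetical; this only demonstrates the arithmetic behind a "best + excellent + good as a share of total moves" figure, not the real score.

```python
# Hypothetical move-quality counts for an 86-move game. The real CAPS
# weighting is unpublished, so this only demonstrates the percentage
# arithmetic discussed above, not the actual score calculation.
move_counts = {
    "best": 57, "excellent": 10, "good": 7,
    "inaccuracy": 4, "mistake": 2, "blunder": 6,
}
total = sum(move_counts.values())  # 86 moves in this made-up game

def share(*categories):
    """Percentage of total moves falling in the given categories."""
    return round(100 * sum(move_counts[c] for c in categories) / total, 1)

print(share("best"))                       # 66.3 -> "best move" rate
print(share("best", "excellent", "good"))  # 86.0 -> strong-move share
print(share("blunder"))                    # 7.0  -> blunder share
```

With counts like these, a game can show a handful of blunders in absolute terms while the strong-move share stays very high, which is the tension the post above is pointing at.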

Avatar of flashlight002
flashlight002 wrote:

@dallin or @nmrugg: a question for either of you. (If you are willing to explain.)

A friend was discussing with me that the new d=20 full-game analysis only takes about 3 s or so to complete, vs. the old full scan that used to take about 5 to 8 min on the site. How do you manage to perform a scan to such a depth so quickly? All the programs we have (which also use Stockfish 10) usually take a good 5 to 8 min or more to scan to d=20 to 23. So I was wondering how chess.com now manages to do it in a few seconds. What technology enables you to accomplish this? I would be very interested to understand how you have managed to get it to finish so fast.

@dallin will you be giving away a trade secret by answering the above? 

Avatar of Martin_Stahl
flashlight002 wrote:
flashlight002 wrote:

@dallin or @nmrugg: a question for either of you. (If you are willing to explain.)

A friend was discussing with me that the new d=20 full-game analysis only takes about 3 s or so to complete, vs. the old full scan that used to take about 5 to 8 min on the site. How do you manage to perform a scan to such a depth so quickly? All the programs we have (which also use Stockfish 10) usually take a good 5 to 8 min or more to scan to d=20 to 23. So I was wondering how chess.com now manages to do it in a few seconds. What technology enables you to accomplish this? I would be very interested to understand how you have managed to get it to finish so fast.

@dallin will you be giving away a trade secret by answering the above? 

 

It's most likely that better hardware is handling the analysis now, with more CPU and memory assigned to the process. It's possible those resources are dedicated to analysis, where before they were not.

 

I don't have access to a server-class system that I can use to test, but if you throw enough CPUs/cores at analysis, you should be able to get to depth quickly.
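For context: Stockfish exposes its thread count and hash size as plain options in the UCI protocol, so a server with many cores can simply be told to use them. A minimal session might look like the following (the 64/8192 values are illustrative, not chess.com's actual configuration):

```
uci
setoption name Threads value 64
setoption name Hash value 8192
isready
position startpos moves e2e4 e7e5
go depth 20
```

Threads and Hash are standard Stockfish UCI options; with dozens of dedicated cores and a large hash table, reaching depth 20 in seconds is plausible, which fits the better-hardware explanation above.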

Avatar of flashlight002

@Martin_Stahl thank you for your reply; I do appreciate it. I had a general idea that that was basically the reasoning... but was hoping for some hard facts (e.g., "we now have x number of xyz-type cores on xyz servers", etc.).

While I have you on the line, so to speak... I see lots of discussion on the accuracy of the CAPS score (and the irony is not lost on me that we are debating the accuracy of an accuracy score). Is there a white paper or anything published that explains how CAPS is specifically calculated, statistically? One can't really have a meaningful discussion about the accuracy of the CAPS score if it's a "black box" feature. Then one is reduced to "armchair logic" debate... and the statistics can be highly complex.