Fixing New Analysis

flashlight002

@dallin hi there!

Have you been able to fix up all the accuracy issues with the new analysis system? I saw it was down for a short period today and the old system was up, but now the new system is back again. I haven't had time to check for accuracy yet (going to check a game now) but wanted to find out from you where your team is on this.

Bad_Dobby_Fischer
m_connors wrote:

The new analysis changes its mind!!! Do the daily puzzle and after it is solved, open the analysis. Follow the "best" suggested line on self-analysis for several moves, then restart and analyze the moves. Stockfish reevaluates some of those "best" moves as inaccuracies or good, then shows the "best" move. It has done this for the last two daily puzzles. Now that is odd! How can its previously suggested best move now become a lesser move?! 

Just so you know, the old one was that way too.

That's why engines don't always play the same opening.

flashlight002

@Bad_Dobby_Fischer what I do is wait a while until the engine has settled down and "made up its mind". Initially one sees the engine changing its mind quite a bit as it analyses deeper and deeper. The issue with the new engine has been the inaccuracies that remain after the engine has done its thing and proposed its "best moves" variation. On re-analysis it turned out that the new engine was suggesting blunders and bad moves!

Chess.com said they were fixing all this, but I have not had any feedback. I don't know to what depth the initial scan is done... this is done server side, so we have no control over it. Dallin from chess.com said it would be at depth=22. I don't know what the new settings will be after they have fixed things at their end.

Because the initial scan is done server side, there are no settings like threads (how many physical CPU cores one can assign to the engine) or hash (memory allocation). Another website that offers analysis actually has a setting that allows one to run the analysis locally (using the hardware of our own computers), and it then allows one to choose the number of threads, which can vastly improve the strength and accuracy of the engine if one assigns, say, 4 cores instead of 1 (the usual default setting). But chess.com's new system does not have this flexibility.

The only setting we do have is "engine time limit", but that only applies to additional analysis done by the user after the initial scan. It also won't work on the "retry" section. But I see that chess.com has increased the analysis depth on the retry move variations from depth=18 to depth=20, so one hopes that accuracy will improve in this section, as we were seeing some quite bad engine move suggestions (blunders and bad moves suggested in the move variation for a correct retry move).
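For reference, the thread, hash, and depth settings discussed above correspond to standard UCI engine options, so anyone running Stockfish locally can set them with a handful of plain-text commands. This is a generic UCI sketch (Threads and Hash are standard Stockfish option names; chess.com's actual server configuration is not public, and the FEN placeholder stands in for a real position):

```
uci
setoption name Threads value 4
setoption name Hash value 256
position fen <FEN of the position to analyse>
go depth 22
```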

Dallin also said there were other issues affecting the accuracy outputs that needed to be addressed this week, but he didn't give an indication of what those issues were. He said he needed this week to sort them out. I have asked him in a post today whether they have finished fixing things.

fschmitz422

The new analysis board has tons of bugs of all sorts and, most importantly, the analysis itself is totally flawed. See for instance the attached screenshot. This is quite embarrassing for a leading chess website.

fschmitz422

Simon Williams (the Ginger_GM) two days ago, when trying to use the analysis board on Twitch: "What kind of computer is this, Fritz 1.1 ?!"

flashlight002

@fschmitz422 I agree that chess.com developers have to fix this engine. Have you shared that screenshot with them? The more data and evidence they get the better!!

dallin

I don't follow your assessment that the analysis is wrong there, @fschmitz422. The move you chose was mate in 24. Stockfish saw a faster mate in 20, and this was classified as an inaccuracy as a result. Thank you for posting that, though. One of the best things you can do to help us hone this product is post positions from games where you feel the system got it wrong.

From a progress standpoint, all analysis from the initial Full Game Analysis to Retry Mistakes to Feedback on alternate lines has been made consistent at depth=20. We are using Stockfish 10 in all situations for Web. Stockfish 10 at a depth of 20 is equivalent to about 2900 (beyond human capability). Thankfully, Stockfish 10 is very accurate at depth 20... which is a big deal. It means we all have to wait less for accurate analysis.

We recognize that depth 20 may not be deep enough for some of our users, and are working on adding additional depths for those who want to go deeper. However, we found the experience (wait time) in testing deeper depths to be unacceptable the way our single-move analysis is currently configured. The current system has to wait for the analysis to complete (go all the way to depth 24) before it can give feedback. It's not fun to wait 30 seconds for an engine to go all the way to depth 24 to get an answer that would have been accurate at depth 20. We are in the early stages of reconfiguring this so that we can make depth selection a possibility, but not a pain.
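The depth-by-depth trade-off described above is visible in the raw engine protocol: a UCI engine streams an `info` line each time it completes another depth, so a client can show a provisional evaluation long before the final depth finishes. A minimal parsing sketch (the sample lines are made up to match the usual UCI field layout, not real chess.com output):

```python
import re

# Pulls depth, score type (centipawns or mate), score value, and the
# principal variation out of a UCI "info" line.
INFO_RE = re.compile(r"info depth (\d+) .*?score (cp|mate) (-?\d+) .*?pv (.+)")

def parse_info(line):
    """Return (depth, score_kind, score, pv_moves), or None for other lines."""
    m = INFO_RE.search(line)
    if not m:
        return None
    depth, kind, value, pv = m.groups()
    return int(depth), kind, int(value), pv.split()

# Illustrative lines in the shape a UCI engine streams while it thinks
# (the numbers are made up, not real engine output):
print(parse_info("info depth 20 seldepth 30 score cp 35 nodes 1200000 pv e2e4 e7e5 g1f3"))
print(parse_info("info depth 12 score mate 5 pv c6c7"))
```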

We've also added additional measures to improve consistency. All analysis for Retry Mistakes is now done on the server, for example, as opposed to the client (your CPU). But exploring lines in Analysis is still done on the client... and that does open up the possibility for Stockfish to contradict itself at times. We are looking into ways to help with this as well (including configuring single- or multi-thread processing, as @flashlight002 suggested).

dallin

I appreciate your skepticism, @PawnstormPossie.

Here is a study done on the impact of search depth on playing strength. Fun to review if you are into math, but the proof you are looking for is in the conclusions on page 77. http://web.ist.utl.pt/diogo.ferreira/papers/ferreira13impact.pdf

The study measures Houdini 1.5a (2011) at depth 20, but we all know Stockfish 10 is a significant upgrade from Houdini 1.5a.

Comparing human Elo to computer Elo is tough, but even Chess Tiger 14 (rated 2504 at CEGT http://www.cegt.net/40_4_Ratinglist/40_4_AllVersion/rangliste.html) back in 2001, running on a single-core 850 MHz Pentium III processor, scored a performance rating of 2800 when playing against humans in the Mercosur Cup and Najdorf Memorial Open.

Studies are great, but the proof is in the pudding. To further your confidence here, we will run a match between Stockfish 10 at depth 20 and an open-source engine rated close to Chess Tiger 14. I will let you know the results.

dallin
PawnstormPossie wrote:

@dallin 

Thanks for this reply! I'll definitely look into the 2011 study and conclusions.

Looking forward to the match. It would be nice for a GM to analyze the game/critical positions afterward.

 

I'll ask GM Simon Williams to do an analysis, @PawnstormPossie.

flashlight002

@dallin thanks for the update. Much appreciated. I look forward also to seeing how you will incorporate greater search depth, multi-thread ability (if you can include it), etc. as a function/option in the current architecture of the new system and GUI.

As a new product one is bound to hit snags and issues/speed bumps, but I am very happy to see the willingness of your team to sort these out. Coding is always a work in progress.

As you said... it's vital that when a user sees a problem with this new engine they screenshot or screen-record it (I like screen videos, as then the dev team really can see an issue as it unfolds), post it here for us to see, and also send it on to customer support so that the dev team has hard facts to work on. I think we are all on the same page in terms of wanting an engine we can CONSISTENTLY rely on as being really accurate. As the premier chess website in the world I would expect nothing less!!

As regards the ability of a chess engine vs human ability, I am afraid the engines rule at present. Will we ever see a human with an Elo of 3500? Who knows. But won't that be something to behold!! The current strength of Stockfish 10 is 3564. Fire is also close, so it's going to be an interesting contest you will be holding! For everyone's interest, here is a listing of the 18 best chess engines to date.

https://www.rankred.com/chess-engines/

 

dallin

And for anyone who wants to see these engines battle it out, there is the Computer Chess Championship running right now, right here: https://www.chess.com/computer-chess-championship

Speaking of battles... our test match is complete, @PawnstormPossie. Stockfish 10 running at depth 20 just finished a 500-game match at 3+1 time control vs a more modern engine that is close in rating to Chess Tiger 14: Gaviota 0.85, rated 2501 at CEGT. As stated earlier, an engine of this rating proved a 2800 performance in play against humans.

Over those 500 games, Stockfish 10 at depth 20 won 481 to 19 vs Gaviota at max depth. Here are the PGNs for all 500 games should anyone wish to review: https://cccfiles.chess.com/archive/sf10-vs-gaviota0.85.pgn
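For anyone who wants to sanity-check what a 481 to 19 score implies, the standard logistic Elo model lets you back out an approximate performance rating from a match result. A quick sketch (the 2501 opponent rating is the CEGT figure quoted above; this is a rough estimate, since Elo compresses at extreme scores):

```python
import math

def performance_rating(opp_rating, score, games):
    """Approximate Elo performance rating implied by a match result,
    using the standard logistic Elo model."""
    s = score / games
    return opp_rating + 400 * math.log10(s / (1 - s))

# Stockfish 10 (depth 20) scoring 481 out of 500 against an opponent
# rated 2501 (the CEGT figure quoted above):
print(round(performance_rating(2501, 481, 500)))  # roughly 3062
```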

Stockfish 10 is strong, even at depth 20.

Thank you all for your patience as we work to deliver the best analysis product you can get online. We're committed to getting this right not only because we want to make you happy, but also because this is A TON OF FUN for us. We love this! We love building it. We love improving it. We love learning from it ourselves.

chaos-theory

The new analysis with arrows and colours and symbols everywhere is just awful. Please at least make an option to turn it off. It is not helpful to anyone but a novice and is just a result of over-engineering. I cannot use it anymore.

notmtwain
dallin wrote:

I don't follow your assessment that the analysis is wrong there, @fschmitz422. The move you chose was mate in 24. Stockfish saw a faster mate in 20, and this was classified as an inaccuracy as a result. Thank you for posting that, though. One of the best things you can do to help us hone this product is post positions from games where you feel the system got it wrong.


The reason it is embarrassing is that the position is an obvious mate in 5, as the lines say. The black king is locked in on the back rank. The c-pawn marches down to queen. I guess the h-pawn can throw in a spite check, but mate is still unavoidable.

It is not a choice between mate in 20 or 24.

Mate in 5.

A beginner can see it.

That's what is embarrassing. How can the computer miss something so obvious?

fschmitz422
dallin wrote:

I don't follow your assessment that the analysis is wrong there, @fschmitz422.

 

If you can't see why this analysis is utter rubbish, all hope is lost anyway.

Thank you for posting that, though.

 

giancz91

Dallin, I ask again, could you give us an option to use the old analysis? It can't be that hard!

I liked it much more.

fschmitz422

Dallin,

let me give you some advice, from a professional software developer's point of view:

This whole idea of synchronizing server-side analysis with client-side evaluation is insane. You may be able to improve on this, but it will never really work flawlessly. Just drop it.

Let analysis be computed client-side, as before, with depth options. And make these in-board arrows and icons really configurable. (There are config options already, but they are broken too.) And do something about the jerky piece-movement animations (in Firefox at least).

In the meantime, switch back to the old version. It maybe wasn't fancy, but at least it worked.

And from the bucks you saved on server-side CPU, go get yourself a drink.

Recently I'm noticing many bugs all over the site (e.g., in TT the pieces go mute after about 20 puzzles, and I then have to restart TT). I appreciate that you're trying to improve the website, but, as the German proverb goes, don't put more on your plate than you can swallow. Quality first.

Regards

 

flashlight002

@dallin @fschmitz422 @giancz91 and @notmtwain I decided to run a test for the scenario presented by @fschmitz422 in his screenshot, where the new chess.com engine chose to suggest a M20 solution over three M5 solutions as the best variation to proceed with after white had made its move. The screenshot provided by @fschmitz422 was worrying me. I felt we had not addressed it properly.

As I understand it, the 3 variations in the screenshot would have been computed on @fschmitz422's local hardware, while the M20 solution would have come from chess.com's servers (Dallin, please confirm if I am correct in my logic). And this is where I believe the problem is highlighted! The results computed by @fschmitz422's local hardware are correct... not the result returned by your server.

I felt I needed to prove this by running Stockfish on a local device with another GUI to see what results were returned. The device is my phone, a Samsung Note 5, which actually has a very powerful 8-core processor. However, I set the experiment at 1 thread (therefore only 1 core).

Here is a screenshot of my results. The results took about 3 seconds.

With black to move:

 

As you can see, even running this exercise on my Note 5 phone, it returns 3 solutions as mate in 5! Move 6 ran offscreen, so I wrote it in there: the final pawn promotion to queen resulting in #. There is no solution of a mate in 20! These 3 M5 solutions I generated match the 3 M5 solutions in @fschmitz422's screenshot that were returned by chess.com's "show lines" function.

Dallin, this is bizarre. How can your engine suggest, as a best move sequence, one that is 20 moves long when in actual fact there are 3 variations that are 5 moves long and reach mate? These are the correct solutions.

This is therefore more proof that the engine returning results from your server is not returning results that make sense.

dallin

You are right, @fschmitz422 and @flashlight002 — the analysis there is rubbish. Even our lines come to the same conclusion near instantaneously. Thank you for pointing out such a clear example where this has gone totally wrong. We are looking into it now.

Togishere

I love the new analysis. It takes much less time than the old one, but if it is inaccurate, that isn't good.

9thBlunder

The reason why it's so much faster is that it's a lot less accurate. Can we get the old version of analysis and Tactics Trainer back? Both are downgrades. If it weren't for the real3d board I would've cancelled my membership as soon as the quality went to crap.