What is CAPS good for?

Chessplayer2093

I have analysed some games from the Daily Chess Championship and have gotten very weird results: 

https://www.chess.com/daily/game/214305824

In this game, for example, I got a CAPS of over 99%, yet 26. ... Rb1 was marked as a blunder, and my average difference from the engine's top choice was 0.23, which is slightly higher than what I have in OTB games. Despite that, my quality of play is 99.31%, which indicates almost perfect play. As another example, in the PCL on Wednesday Wesley So played White against Grant Xu and managed to hold a bad position. But CAPS gave So 97% and Xu 94%, indicating that So played better, although Stockfish evaluates the final position at -1. I have heard some people say that it's not accurate for a single game but rather for a large number of games, for example all of Magnus Carlsen's games, as chess.com did in this article:

https://www.chess.com/article/view/better-than-ratings-chess-com-s-new-caps-system.

So my question is: If CAPS is so inaccurate in measuring the quality of moves in one game how can it be accurate for many games? And if it isn't: What is CAPS good for?

notmtwain

CAPS was a decent attempt at innovation, but many other people have noted its problems since it came out 2 years ago.

They haven't been touting it much.

 

Chessplayer2093

Mediocre at best... I'm just scared that chess.com uses it to catch cheaters. They will get so many false positives and false negatives. It shouldn't be used for analysing games at all, but they used it for the WCC and declared that Caruana won by quite some margin, which is just bs.

Deranged

CAPS is pretty useless. I've had a 99.8 CAPS one game and a 50 CAPS the very next game. There's never any kind of consistency.

Rook_Handler

@chessplayer2093,

You are committing a part-to-whole fallacy. The claim that CAPS can't be accurate over a span of games if it isn't accurate for one game does not necessarily hold.

Chessplayer2093

Apparently I didn't elaborate enough on what I meant. It also doesn't work for 12 games. What is the number of games where it starts being even remotely accurate? 

And since it is inaccurate for a set of at least 12 games and definitely for one game, how does it level out its mistakes? 

IMKeto

CAPS scores allow 1000-rated players to think they played like a GM.

It's an ego feed, that's all.

Martin_Stahl

As with pretty much any statistic, you need a large sample to get reasonable information. A high CAPS average over a large number of games would potentially be useful. However, in some games CAPS can be high simply due to a lack of options, in which case it isn't very useful (forced moves probably should not count).

 

As to consistency, a lot of players are not consistent, so fluctuations in scores are expected, with fewer fluctuations as a player becomes stronger.
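The sample-size point can be sketched with a small simulation. This is only an illustration under made-up assumptions: it does not use the real CAPS formula, and `true_skill`, `game_noise` and the function name are hypothetical. Each simulated game score is the player's true level plus random game-to-game fluctuation, and the code measures how much the average over n games still varies from player to player.

```python
import random
import statistics


def spread_of_average(n_games, true_skill=90.0, game_noise=8.0, trials=500, seed=0):
    """Standard deviation of the n-game average across many simulated players.

    true_skill and game_noise are hypothetical numbers, not CAPS parameters.
    Scores are not clipped to 100, to keep the sketch simple.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        # One simulated player: n_games scores scattered around the same true level.
        scores = [rng.gauss(true_skill, game_noise) for _ in range(n_games)]
        averages.append(statistics.fmean(scores))
    return statistics.pstdev(averages)


for n in (1, 12, 100, 400):
    print(f"{n:>4} games: spread of the average = {spread_of_average(n):.2f}")
```

Under these assumptions the spread shrinks roughly with the square root of the number of games, so a single-game score says little while an average over hundreds of games is much more stable. This only covers random fluctuation, not a scoring method that is wrong in a systematic way.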

Chessplayer2093

Yes, many statistics need a large sample size to produce reasonable information. The reason is that outliers have a big impact on a small sample. But there is a big difference. With such statistics (let's say the average number of points scored in an American football game), the outliers occur naturally, and the method used to obtain the information is usually accurate and reliable. With CAPS, the scores change from game to game not only because a player's accuracy varies, but also because the method itself doesn't produce reasonable estimates of a player's accuracy in a single game.

For example, let's go back to the football example. A game ends 38-29. The average is obviously 33.5 points per team. You can measure that reliably, and if you take a hundred more games, you'll get a reasonable estimate of the average number of points scored in a football game. But CAPS would measure something like "50-26" and say "Oh, the average is 38". That's what I mean by "If it is wrong for one game, how can it be accurate for a large sample?" How are the flaws in measuring quality of play accounted for in a large sample? It seems that CAPS is biased in certain positions, and that's why it doesn't deliver accurate results. It is not a part-to-whole fallacy, @Tebow2Baker. If the method of measuring quality of play is wrong, why would it suddenly be correct for a large sample?
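The distinction being argued here, random error versus a biased measuring method, can be sketched in a few lines. Again, this is a toy model with invented numbers, not the CAPS algorithm: `true_quality`, `noise` and `bias` are hypothetical. Every measured game score is the true quality plus random noise plus a constant bias added by the measuring method.

```python
import random
import statistics


def average_measured_score(n_games, true_quality=90.0, noise=8.0, bias=0.0, seed=1):
    """Average of n_games measurements of the same true quality.

    Each measurement is true_quality + bias + random noise. All values are
    hypothetical and only illustrate the argument; they are not CAPS internals.
    """
    rng = random.Random(seed)
    measured = [true_quality + bias + rng.gauss(0.0, noise) for _ in range(n_games)]
    return statistics.fmean(measured)


for n in (1, 12, 1000, 100000):
    unbiased = average_measured_score(n)
    biased = average_measured_score(n, bias=5.0)
    print(f"{n:>6} games: unbiased method {unbiased:6.2f}, biased method {biased:6.2f}")
```

In this model the unbiased method settles near the true value of 90 as the number of games grows, while the biased method settles just as firmly near 95. A large sample removes the random part of the error but leaves a systematic bias untouched, so the answer to the question depends on which kind of error CAPS actually has.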