Proposal to Develop a Complexity Metric Feature

Amethyst-Cat

I am reposting this from Site Feedback as I have not heard back from any chess.com developers. I would like to propose a chess complexity feature to developers. Using the billions of games in the chess.com database, I believe an objective measure of chess complexity or sharpness can be developed. Such a feature has the potential to revolutionize chess and would be invaluable to any chess website. Some specific applications include generating non-tactical puzzles (imagine tactics trainer for positional chess puzzles), creating chess computers that play with human personalities, and identifying concepts that are key to improvement at any rating level.

Below are 16 positions selected from 1000 random positions from the 2019 World Cup, sorted by our algorithm, with the least complex positions in the far-left column and the most complex in the far-right column. While we have achieved good results, our algorithm can be improved significantly with more data. Our model was trained on only 25,000 games, but chess.com has billions, a significant portion of which users have already analyzed. If any chess.com developer, or any other developer with access to the chess.com database (perhaps through the API), is interested in such a project, please let me know. Your help and technical expertise will be invaluable in creating these revolutionary features.

Please find the above results and more details about the algorithm in my full proposal here.

APISTOTELHS

This is a very interesting project!

MGleason

This is a very interesting project from a cheat detection perspective too, and if you post about it in the Cheating Forum (https://www.chess.com/club/cheating-forum) you might get some interesting feedback.

I haven't read it in detail yet, but if I understand correctly, you're training this based on how people at different rating levels perform.  Are you adjusting for different time controls?

Also, I would be careful about using online games.  Online games - particularly in daily chess - are subject to widely varying levels of effort, and cheating could also skew your stats.

Amethyst-Cat

#2 APISTOTELHS - Thank you so much!

 

#3 MGleason - Thank you for the feedback! From what I have investigated, the neural network can be trained on games from all rating levels, as many positions that are difficult for lower-rated players are also difficult for higher-rated players. However, a lot of the really interesting applications come from factoring rating into the model. If an experienced chess player found the similarities in positions that are hard for lower-rated players but easier for higher-rated players, they could identify the skills that lead to chess improvement. Similarly, the amount of time a player has on the clock, or the amount of time they spend on a move, could be factored into the model to produce insights into which positions can be played intuitively versus which require time and calculation. I have also written about this in more detail in the full proposal.

Cheating is definitely a concern, but I think that if chess.com has a list of cheaters, you could remove their games from the training data. I hadn't thought of the cheat detection perspective, so thank you for pointing that out! I will check out the cheating forum and add the application to the full proposal.

MGleason
Amethyst-Cat wrote:

Cheating is definitely a concern, but I think that if chess.com has a list of cheaters, you could remove their games from the training data. I hadn't thought of the cheat detection perspective, so thank you for pointing that out! I will check out the cheating forum and add the application to the full proposal.

The issue is that there will inevitably be a small number of cheaters that have not been caught.  Most of them will not be cheating very much (which is why they have not been caught), which may help minimise the effect, but if it's important to have exclusively clean games, using online games is risky.  This is particularly the case in slow time controls and at high rating levels; cheating in fast time controls is harder to hide.

I would also use somewhat older games - 3-6 months back for live, 6-12 for daily - just to give a greater chance of having caught the vast majority of cheaters.

Better still, though, would be a large sample of OTB games.  There are some very large databases out there, some of them available for free.
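The filtering described above could be sketched as follows. This is an illustration only: the field names (`game["end_time"]`, `game["players"]`) and the banned-account set are assumptions, not chess.com's actual data format.

```python
from datetime import datetime

def filter_games(games, banned_usernames, now, min_age_days=90, max_age_days=180):
    """Keep only games old enough that most cheaters involved would
    already have been caught and banned, and drop any game involving
    an account that has since been closed for fair-play violations.
    Defaults follow the 3-6 month window suggested for live chess."""
    kept = []
    for game in games:
        age_days = (now - game["end_time"]).days
        if not (min_age_days <= age_days <= max_age_days):
            continue  # too recent (cheaters may not be caught yet) or too old
        if any(player in banned_usernames for player in game["players"]):
            continue  # involves a banned account
        kept.append(game)
    return kept
```

For daily games, one would simply pass a 180-365 day window instead.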

Amethyst-Cat

#5 MGleason - I definitely agree cheating is a problem, and if online games are made available and used to develop such a metric, there would have to be strict filters to screen out games that may involve cheating. However, because only analyzed games can be used as input (to calculate centipawn errors for moves), online games have significant advantages over OTB games. Most importantly, there are orders of magnitude more analyzed online games than analyzed OTB games, and analyzing games from scratch is extremely computationally expensive (stable evaluations probably require at least 1 sec/move). In addition, the vast majority of analyzed OTB games accessible online are of high-rated players, so it would be very difficult to extend the complexity analysis to lower-rated players. Online games also track much more data, like time on the clock and time spent per move, which I believe is a more useful indicator of time pressure than the time control alone.
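For reference, the centipawn error mentioned above is conventionally computed from the engine's evaluation before and after a move, from the mover's point of view. A minimal sketch, with evaluations in centipawns from White's perspective:

```python
def centipawn_loss(eval_before, eval_after, white_moved):
    """How much a move worsened the mover's position, in centipawns.
    A best move (or one that improves the evaluation) scores 0."""
    if white_moved:
        # White wants the evaluation high; a drop is White's loss.
        return max(0, eval_before - eval_after)
    # Black wants the evaluation low; a rise is Black's loss.
    return max(0, eval_after - eval_before)
```

The per-move losses from already-analyzed games are exactly the error labels such a model would train on, which is why pre-analyzed online games are so much cheaper to use than raw OTB game scores.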

MGleason

"there would have to be strict filters to screen out games that may involve cheating"

That's the difficulty.  You can identify people who cheat in a lot of games, and depending on the complexity of the game and how blatant the cheating is you can sometimes even detect it in a single game.  However, there will always be borderline cases.  A lot of these borderline cases will involve cheating, but some of them won't.  If you exclude anything borderline, you'll skew your stats one way.  If you include anything not conclusively proven as cheating you'll skew your stats the other way.

I guess the question is, how much will your system be skewed by a small percentage of cheating?

Amethyst-Cat

#7 MGleason - Thank you for the insight! The algorithm will certainly be biased based on how good the chess.com cheating algorithm is. Here are my thoughts on how to remedy that problem:

1. Train the neural network on all the games played by users not detected to be cheating. With the amount of data available to chess.com, this should give you a function f(position, elo, time, etc.) -> probability of making an error. Of course, this function would only apply to people who have not been detected cheating, which is not the same as people who are not cheating. Due to the relatively low percentage of cheaters, positions that are objectively more complex should still have higher error percentages than positions that are less complex (there is unlikely to be a type of position that everyone cheats in). However, these percentages will be lower than the true values.

 

2. Use the games of detected cheaters to develop a statistical test for cheating that uses the function developed in the previous step. I am thinking of something like a chi-squared test of homogeneity, which screens through a user's games and sums up the inconsistencies for each move (for instance, making a perfect move in a position rated as very complex would contribute a large value). A threshold would then be set as low as possible while still catching all the users already identified as cheating. Of course, this would also flag some users who have not been detected as cheating as possible cheaters (likely people who cheat in complex positions but play most of their moves honestly).

 

3. Test to ensure that such an anti-cheating test works and very rarely flags people who are actually not cheating.

 

4. With a better method of cheat detection, our data is now less biased. The less biased data can in turn be used to train a better model, and the above steps can be repeated.
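Steps 1 and 2 could be sketched roughly as follows. This is an illustration only: the real step 1 would be a neural network trained on millions of analyzed games (the logistic weights below are placeholders, not trained values), and a calibrated statistical test such as the chi-squared test mentioned above would replace the raw surprisal sum in step 2.

```python
import math

def error_probability(features, weights, bias):
    """Step 1 (sketch): a logistic stand-in for the trained function
    f(position, elo, time, ...) -> probability of making an error.
    `features` might encode position-complexity proxies, player Elo,
    and remaining clock time."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def suspicion_score(moves):
    """Step 2 (sketch): sum surprisal over a player's accurate moves.
    `moves` is a list of (found_best_move, predicted_find_probability)
    pairs derived from step 1. Finding the best move in a position the
    model rates as very hard contributes a large term; routine accuracy
    contributes almost nothing."""
    score = 0.0
    for found_best, p_find in moves:
        if found_best:
            score += -math.log(p_find)  # rarer find = more suspicious
    return score
```

A threshold on `suspicion_score` would then be tuned, as described in step 2, to be as low as possible while still catching every already-confirmed cheater in the calibration set.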

MGleason

Your second step is, unfortunately, completely useless.  It's very, very common for cheaters to play some games completely clean and other games partially clean.  Cheat detection is not trying to answer the question "which moves did you cheat on?", as such a level of precision is simply not possible.  The question it's going for is "have you ever cheated at levels that we can detect with sufficient confidence that we're prepared to defend it in court if you decide to file a lawsuit?"

Still, simply filtering out all games against people subsequently banned for cheating will eliminate the vast majority of cheating.  The question is how much a very small remaining percentage will skew the results.

If a position would be solved by a clean player at a particular rating level 55% of a time, undetected cheating would bump that up by a tiny percentage, and it wouldn't matter.  But if it would be solved by a clean player 0.05% of the time, now even a tiny percentage of undetected cheaters starts to matter, as that 0.05% could, percentagewise, increase fairly significantly.  So the question is, how many of those 0.05% positions are there, and how much will it matter if they're measured incorrectly?
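The point above can be made concrete with a small mixture calculation. The 0.5% undetected-cheating figure below is an invented number for illustration, and the cheaters are pessimistically assumed to always find the solution:

```python
def observed_solve_rate(p_clean, cheat_fraction, p_cheat=1.0):
    """Observed solve rate when a fraction of the player pool consists
    of undetected cheaters who always solve the position."""
    return (1 - cheat_fraction) * p_clean + cheat_fraction * p_cheat

common = observed_solve_rate(0.55, 0.005)   # 0.55225 -- barely moved
hard = observed_solve_rate(0.0005, 0.005)   # ~0.0055 -- about 11x the true rate
```

The same absolute contamination that is invisible on a 55% position inflates a 0.05% position's measured solve rate by an order of magnitude, which is exactly why the very hardest positions are the ones most at risk of being mismeasured.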

Amethyst-Cat

#9 MGleason - Thanks again for the insights. I wasn't thinking about the legal aspect of cheat detection, and that probably severely limits the utility of neural networks in that area. I still think that, in theory, you could catch more cheaters who cheat only on certain moves or in certain games using the network (for instance, if there exist complex positions with a 0.05% chance of finding the right move, and a player cheats in a few of those positions, then it may not matter that they played 10 or so clean games in between), but I don't think machine learning would hold up in court, because it is a black-box algorithm (it cannot tell you why a position is complicated).

 

I still believe that as long as the cheating filter is sufficiently powerful (say, 95% of moves are free of cheating), the skew will be very minimal. From my preliminary tests, the highest error chance the network predicted across 1000 World Cup positions was 86%. Suppose the maximum prediction is 90%, and 5% of the training data comes from engines rather than humans. That would still only skew a 90% error prediction down to 85.5%. Other than outputting a slightly deflated probability, applications of the metric would be unaffected, as they mostly depend on ranking positions by complexity, and the ranking is preserved.
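The 90% -> 85.5% figure is a simple mixture of human and (assumed error-free) engine moves, and because the mixture is monotone in the human error rate, the complexity ranking is indeed preserved:

```python
def contaminated_error_rate(human_rate, engine_fraction, engine_rate=0.0):
    """Observed error rate when engine_fraction of the training moves
    come from engines, assumed here to make no errors at all."""
    return (1 - engine_fraction) * human_rate + engine_fraction * engine_rate

# 90% human error rate with 5% engine moves: 0.95 * 0.90 = 0.855
skewed = contaminated_error_rate(0.90, 0.05)
```

Note this argument covers uniform contamination; if cheating concentrated in specific position types (as in the rare-position example earlier in the thread), the ranking among those positions could still shift.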

MGleason

If you think a small percentage of the sample being affected by cheating would not significantly skew your metrics, then you're probably safe to use online games.  I'd steer clear of team matches and tournaments (other than tournaments with real prize money that therefore get extra scrutiny from the detection system, such as Titled Tuesday or the Pro Chess League), and I wouldn't use high-rated games at slow time controls.  The vast majority of those games will still be clean, but the cheating percentages will be somewhat higher, which will skew your results a little more.

And I'd also use games from 3-6 months ago for live chess, and 6-12 months old for daily.  That will give more time for cheaters to be banned so you can exclude them, without going so far back that you get to the point where the detection systems were not as good as they are now.

Amethyst-Cat

#11 MGleason - Thank you so much for all the helpful suggestions! As of now, I am trying to contact a developer from chess.com about implementing this project. I have created a smaller prototype in the paper, but I currently lack the technical expertise and computing power (likely hundreds of GB needed to store games and lots of CPU power to train the model) to implement the full project. Of course, if nobody decides to pick this up I will attempt this in the future, but I am hoping a developer will be interested and take up this project. If you know a way to contact chess.com staff, please let me know. Thanks again for all your time and help, I really appreciate the feedback.

MGleason

You could try sending a message to @jdcannon.  If he's not the right person to contact about it, he would know who is.

The concept is also one that would be intriguing to the cheat detection enthusiasts in the Cheating Forum, which is a big part of why it got my attention.  If you can estimate how often a player at a particular rating level is likely to solve a particular position, and someone solves too many difficult positions, you can start to put a number on exactly what the odds are of that happening legitimately.

Ken Regan has an algorithm for that which is not based on neural networks and machine learning.  I don't remember exactly what he does or whether he has the exact formulas online anywhere.  I would assume chess.com's internal systems are doing something somewhat similar.  But a machine learning/neural network approach is very interesting.

Amethyst-Cat

Thank you so much for all your time and help, I really appreciate your insights! I will look into the current cheating algorithms and contact the cheating forum and jdcannon. 

MGleason

Chess.com's cheat detection algorithms are secret, for obvious reasons.  I don't remember exactly how much detail on specific formulas Ken Regan has on his site, but he's got quite a bit of interesting material: https://cse.buffalo.edu/faculty/regan/chess/