While trying to defend the Chess Mine's record to @petitbonom after a challenge was declined, I analysed all of the Chess Mine's games using @MGleason's tool for identifying suspect players. I wish I had done this earlier - the results are starkly indicative of substantial engine use across the entire 85 games. I'm not going to stay in the group, and I have explained why in their forum.
Here are my posts there:
Elroch wrote:
Recently, I heard about the Chess Mine having a challenge declined on the basis that the other team had strong suspicions that this team's moves were influenced by engine output. I defended the team against this claim, pointing to the statistically inadequate evidence provided, and this led me to analyse this group's games in the same way as I have many times for daily chess players (several of whom have then been booted for fair play violations after chess.com did much more thorough analysis).
So, what does an analysis of this group's entire play say? Here is the summary output of @MGleason's program for identifying suspect players worth reporting, run with a well-known chess engine and a modest amount of calculation time.
It is very important to note that only unclear positions after the opening are included in these analyses. Also, the T1-T2-T3 stats cover only the smaller number of positions where there are enough moves with computer evaluations close to each other for there to be a genuinely close choice - i.e. multiple moves within less than half a pawn of each other (often a lot less) according to the engine.
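To make that concrete, here is a minimal sketch (my own illustration, not @MGleason's actual code) of how T1/T2/T3 match rates could be counted from multipv engine output, assuming a 50-centipawn window defines a "close choice":

def t_stats(positions, window_cp=50):
    # positions: list of dicts with 'played' (the move played) and 'multipv'
    # (engine candidate moves, best first, each with a centipawn eval 'cp')
    counts = {1: [0, 0], 2: [0, 0], 3: [0, 0]}  # rank -> [matches, eligible positions]
    for pos in positions:
        pv = pos['multipv']
        best_cp = pv[0]['cp']
        # candidate moves within the window of the engine's top choice
        close = [m for m in pv if best_cp - m['cp'] <= window_cp]
        for n in (1, 2, 3):
            if len(close) > n:  # only count positions with a real choice
                counts[n][1] += 1
                if pos['played'] in [m['move'] for m in close[:n]]:
                    counts[n][0] += 1
    return counts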
The Chess Mine, 85 games
UNDECIDED POSITIONS
Positions: 884
T1: 279/488; 57.17% (std error 2.24)
T2: 248/288; 86.11% (std error 2.04)
T3: 210/232; 90.52% (std error 1.92)
=0 CP loss: 685/884; 77.49% (std error 1.40)
>0 CP loss: 199/884; 22.51% (std error 1.40)
>10 CP loss: 88/884; 9.95% (std error 1.01)
>25 CP loss: 25/884; 2.83% (std error 0.56)
>50 CP loss: 3/884; 0.34% (std error 0.20)
>100 CP loss: 0/884; 0.00% (std error 0.00)
>200 CP loss: 0/884; 0.00% (std error 0.00)
>500 CP loss: 0/884; 0.00% (std error 0.00)
CP loss mean 2.92, std deviation 8.29

LOSING POSITIONS
Positions: 0
For comparison, here is the analysis of 94 recent standard time control games by a guy called Magnus Carlsen.
Carlsen, Magnus, 94 games
UNDECIDED POSITIONS
Positions: 1513
T1: 608/1212; 50.17% (std error 1.44)
T2: 722/1046; 69.02% (std error 1.43)
T3: 769/949; 81.03% (std error 1.27)
=0 CP loss: 1053/1513; 69.60% (std error 1.18)
>0 CP loss: 460/1513; 30.40% (std error 1.18)
>10 CP loss: 269/1513; 17.78% (std error 0.98)
>25 CP loss: 126/1513; 8.33% (std error 0.71)
>50 CP loss: 37/1513; 2.45% (std error 0.40)
>100 CP loss: 5/1513; 0.33% (std error 0.15)
>200 CP loss: 1/1513; 0.07% (std error 0.07)
>500 CP loss: 1/1513; 0.07% (std error 0.07)
CP loss mean 6.65, std deviation 21.56

LOSING POSITIONS
Positions: 20
T1: 5/10; 50.00% (std error 15.81)
T2: 5/9; 55.56% (std error 16.56)
T3: 8/8; 100.00% (std error 0.00)
=0 CP loss: 11/20; 55.00% (std error 11.12)
>0 CP loss: 9/20; 45.00% (std error 11.12)
>10 CP loss: 7/20; 35.00% (std error 10.67)
>25 CP loss: 5/20; 25.00% (std error 9.68)
>50 CP loss: 4/20; 20.00% (std error 8.94)
>100 CP loss: 3/20; 15.00% (std error 7.98)
>200 CP loss: 2/20; 10.00% (std error 6.71)
>500 CP loss: 0/20; 0.00% (std error 0.00)
CP loss mean 47.80, std deviation 99.01
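A note on the "std error" figures above: they are consistent with the ordinary binomial standard error of a proportion, sqrt(p(1-p)/n), expressed as a percentage. A quick check (my assumption about the formula, not taken from the tool's source):

from math import sqrt

def se_pct(hits, n):
    # binomial standard error of a proportion, as a percentage
    p = hits / n
    return 100 * sqrt(p * (1 - p) / n)

print(round(se_pct(248, 288), 2))   # Chess Mine T2 -> 2.04, as reported
print(round(se_pct(722, 1046), 2))  # Carlsen T2 -> 1.43, as reported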
Elroch wrote:
What do these stats (and their comparison) say?
Well, the bottom line is that in unclear positions this group's play has been a _lot_ closer to that of a pure engine than the world champion's. The same comparison holds against benchmarks based on world correspondence chess champions from before the engine era, whose play is no more engine-like than Carlsen's at OTB chess.
For example, in unclear positions the computer evaluation of this group's move was on average less than 0.03 pawns worse than the engine's choice (a mean CP loss of 2.92, against Carlsen's 6.65). This is extraordinarily small; no human dataset comes close to it. The idea that any group of humans can judge positional differences more than twice as accurately, by the engine's own measure, as the world champion is frankly ridiculous.
The team's move matched one of the top two engine choices over 86% of the time in positions where there were at least three moves within 0.5 pawns of each other, while Carlsen managed this under 70% of the time. The team's chosen move had exactly the same evaluation as the engine's choice 77% of the time, against 69% for Carlsen.
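To put rough numbers on how far apart those figures are, here is a simple two-proportion comparison using the standard errors reported above (my own back-of-envelope calculation, not part of the tool's output):

from math import sqrt

def separation(p1, se1, p2, se2):
    # how many combined standard errors separate the two percentages
    return (p1 - p2) / sqrt(se1**2 + se2**2)

print(round(separation(86.11, 2.04, 69.02, 1.43), 1))  # T2 rates: ~6.9 std errors apart
print(round(separation(77.49, 1.40, 69.60, 1.18), 1))  # =0 CP loss rates: ~4.3 std errors apart

Gaps of that size are far beyond anything sampling noise could plausibly explain.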
Bottom line - I can't escape the conclusion, with very high confidence, that the past results of this group have been heavily influenced by engine assistance, and that those results can therefore be attributed in large part to it.
It is possible that this is purely a past problem, but that just isn't enough for me. I'm not interested in being part of a group with this history, and I am sorry not to have done this analysis earlier - ideally before I considered joining in the first place.
I would point out that, in principle, anyone on chess.com can check each of the moves contributing to these statistics to see who influenced the choice of that move, and thereby get a pretty clear idea of which players were the source of the illicit assistance.