hicetnunc data


This is the correspondence I had with hicetnunc following his post on a chess.com thread about the action they are taking regarding cheating. I'll try to find the thread and post the link here.
https://www.chess.com/article/view/chess-com-fair-play-and-cheat-detection?page=2#comments
I notice that hicetnunc's derogatory comment has been deleted, as requested by Stephen.
Towards the end of the correspondence below, hicetnunc urges me to show the data to the USCF expert in our group, which I imagine is the other Stephen.
Quote:

I need to emphasise one thing - if we decide on balance that d-d probably is using engine assistance, then I'll remove him & no one else will be mentioned. I took the decision to admit him by myself & I'll make it clear that his expulsion is my responsibility as well.
That's in case either of you is worried about any awkwardness that might follow. Then you can delete any comments you wish.
Of course Laurent is wrong about me being in denial as he claims but then the guy is so wrong about all kinds of things! My problem is being able to make a rational judgement about a player whose analysed games fall within the bounds of human play, albeit a high level of play. I know from my personal experience that it's possible to achieve a good standard of play in long time control but be a much poorer OTB player.
That gives me quite a different perspective from OTB focused players like Laurent.

https://www.chess.com/member/dearg-doom
There are a couple of things that trouble me though. He has 13 ongoing games at the moment which seems reasonable for a player whose strength is more tactical & analytical than the kind of instinctive, positional skill required for fast play. If he was playing 50 or more, I'd be worried & I can relate to that style of playing because it's the same as my own.
But then his average move time is slightly over 10 hours which seems a little short for a player that puts a lot of analysis into his games. You have to wonder as well quite why he plays shorter time controls when he doesn't seem very good at all? I avoid them because I know I'm weak under time pressure.
Is there something here that doesn't add up?

I think both of those reservations can be explained by one thing: that he probably spends his whole time playing chess! You didn't mention that he is also very quick out of the blocks when we have a move to make, which is part of the same phenomenon.
1) The 10 hours per move. If you look at his comments, he very often refers the rest of us to chess books and to YouTube videos, so he must also spend a lot of time researching etc. I bet he also has a comprehensive openings book of his own, which would enable him to make a lot of moves instantly at the start of a game. Also, the way he seems to work (judging by his contributions to FVF) he will be planning in great detail several moves ahead, so if his opponent follows the script he would make some moves very quickly. As is so often the case, the headline figure of 10 hours per move would need to be broken out before you could draw too many conclusions from it. i.e. How many moves within 1 hour, how many in 2 to 5 hours, etc... Also average time per move in the first 10 moves of a game; average time for moves 11-20 etc. Then, depending on what you find, you combine those two analyses. I spent a lot of my working life doing this kind of analysis (though in a completely different context, obviously) and I can tell you that the headline average hardly ever tells the whole story; any conclusions you may make at that level are very often undermined by what you find out when you mine the data in more detail.
2) Why does he play shorter time controls? Quite possibly because he has run out of correspondence games to work on and he is desperate to play some form of chess. Or maybe he wants to try to improve his speed chess, perhaps to help his OTB chess, if he even plays much OTB (I have no idea whether he does or not); I wouldn't be at all surprised if he doesn't.
I could go on, but I don't have much time just now and anyway I can sum up my position very quickly. If we are to expel dd it has to be based on something much more than vague suspicions. If chess.com close his account because of the evidence they have gathered about his personal games, then obviously he goes. If chess.com will not ban him (presumably because the evidence is not strong enough) then we need to have some objective standards of our own and, in my opinion, those standards need to be based on an analysis of our members' contributions to our own games. Alternatively, we could contact chess.com and ask them for their take on some of our players. If chess.com's view is that one of our players is very close to being illegal then perhaps we can act on that.
Must go now.
Joe
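The breakdown suggested in point 1) above (how many moves fall in each time band, and the average time per move for moves 1-10, 11-20 and so on) could be sketched roughly as below. This is only an illustration: the bucket edges, the function names and the sample data are all made up, and it assumes per-move times could be obtained as (move number, hours) pairs, which may not be the case.

```python
from collections import Counter
from statistics import mean

# Hypothetical time bands in hours: (low, high, label). The edges are an assumption.
BUCKETS = [
    (0, 1, "under 1h"),
    (1, 5, "1-5h"),
    (5, 24, "5-24h"),
    (24, float("inf"), "over 24h"),
]


def bucket_label(hours):
    """Return the label of the time band that a single move time falls into."""
    for low, high, label in BUCKETS:
        if low <= hours < high:
            return label
    raise ValueError(f"move time must be non-negative, got {hours}")


def time_distribution(move_times):
    """Count moves per time band. move_times is a list of (move_number, hours)."""
    return Counter(bucket_label(hours) for _, hours in move_times)


def mean_time_by_phase(move_times, phase_length=10):
    """Average hours per move for moves 1-10, 11-20, and so on."""
    phases = {}
    for move_number, hours in move_times:
        start = ((move_number - 1) // phase_length) * phase_length + 1
        phases.setdefault((start, start + phase_length - 1), []).append(hours)
    return {phase: mean(times) for phase, times in sorted(phases.items())}


# Made-up sample, NOT real data: fast opening moves, slow middlegame moves.
sample = [(1, 0.1), (2, 0.2), (10, 0.5), (11, 12.0), (12, 30.0), (20, 6.0)]

print(mean(hours for _, hours in sample))  # the headline average
print(time_distribution(sample))           # moves per time band
print(mean_time_by_phase(sample))          # average per block of ten moves
```

In the made-up sample the headline average is about 8 hours, yet half the moves were played in under an hour and the rest took much longer, which is the kind of detail the single figure hides.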

First of all I'll say that I also hold to the standard of innocent until proven guilty, and think that it's better to miss some cheaters and ban no innocent members than to ban all cheaters and some innocent members.
Secondly, I don't have much knowledge of the T1/T2/T3 analysis. I've only skimmed some of the posts in the cheating forum. I only have so much time for chess, and I personally put the priority on studying/trying to improve rather than catching cheaters. Thus, while someone who has studied these stats a lot more might draw an immediate conclusion, it's hard for me to evaluate it.
A while ago I looked at d-d's games and overall concluded that I would not be surprised at all if he was banned, but ultimately could think of some scenarios where it's plausible he is not cheating. I looked at a few of his games again, and have to say that's still my overall thought.
As for his contributions to VC games, he generally has a lot of good reasoning behind his moves. However, occasionally I'm unsure. For example, in our current game his reasoning against Nh6+ seems correct, as is his knights vs bishops analysis, yet I don't know that the Ra2 move would have occurred to me. I personally wouldn't retreat the rook unless I saw a specific reason to do so. The game is still current so I cannot see how the engine rates it...
One brainstorm I have is to simply ask d-d if he could explain the difference between his ratings, and tell him that someone outside the group said it raises suspicions. The upside of this is that we could see if his reply seems logical. The downside is that if he's innocent, it might make him have a bad feeling about FVF, and if he's guilty, he might be alerted that he's being watched and conceal his cheating.
I've not drawn any certain conclusions yet; will be thinking about it.

Thanks both of you, it's clear that we're all thinking much the same way on this subject. The one thing I agree with hicetnunc about is that any player should be expelled from our groups & thrown off the site, where it can be shown beyond reasonable doubt that they're cheating. Where I strongly disagree with him is over where the line of reasonable doubt should lie.
I think if he were in charge of the criminal justice system, there'd be a lot more hangings!?
But to put d-d's engine matching figures into some kind of context, I found this information in another group to which I belong:-
(2016 WCC) "Impressive play from both, but still far lower than many players I have reported with > 60% T1 over 600+ moves ;-)"

Also this from previous years:-
"... my results from the 2014 WCC:
11 games
UNDECIDED POSITIONS
Positions: 738
T1: 266/570; 46.67% (std error 2.09)
T2: 313/446; 70.18% (std error 2.17)
T3: 308/379; 81.27% (std error 2.00)
>0 CP loss: 260/738; 35.23% (std error 1.76)
>10 CP loss: 125/738; 16.94% (std error 1.38)
>25 CP loss: 48/738; 6.50% (std error 0.91)
>50 CP loss: 13/738; 1.76% (std error 0.48)
>100 CP loss: 5/738; 0.68% (std error 0.30)
>200 CP loss: 1/738; 0.14% (std error 0.14)
>500 CP loss: 1/738; 0.14% (std error 0.14)
CP loss mean 6.83, std deviation 34.43
LOSING POSITIONS
Positions: 14
T1: 2/4; 50.00% (std error 25.00)
T2: 0/1; 0.00% (std error 0.00)
T3: 0/0
>0 CP loss: 6/14; 42.86% (std error 13.23)
>10 CP loss: 6/14; 42.86% (std error 13.23)
>25 CP loss: 6/14; 42.86% (std error 13.23)
>50 CP loss: 6/14; 42.86% (std error 13.23)
>100 CP loss: 5/14; 35.71% (std error 12.81)
>200 CP loss: 4/14; 28.57% (std error 12.07)
>500 CP loss: 1/14; 7.14% (std error 6.88)
CP loss mean 260.07, std deviation 693.95
And from 2013:
10 games
UNDECIDED POSITIONS
Positions: 648
T1: 219/458; 47.82% (std error 2.33)
T2: 240/329; 72.95% (std error 2.45)
T3: 231/276; 83.70% (std error 2.22)
>0 CP loss: 205/648; 31.64% (std error 1.83)
>10 CP loss: 109/648; 16.82% (std error 1.47)
>25 CP loss: 34/648; 5.25% (std error 0.88)
>50 CP loss: 9/648; 1.39% (std error 0.46)
>100 CP loss: 2/648; 0.31% (std error 0.22)
>200 CP loss: 1/648; 0.15% (std error 0.15)
>500 CP loss: 1/648; 0.15% (std error 0.15)
CP loss mean 5.85, std deviation 23.06
LOSING POSITIONS
Positions: 6
T1: 2/3; 66.67% (std error 27.22)
T2: 2/3; 66.67% (std error 27.22)
T3: 1/1; 100.00% (std error 0.00)
>0 CP loss: 2/6; 33.33% (std error 19.25)
>10 CP loss: 2/6; 33.33% (std error 19.25)
>25 CP loss: 2/6; 33.33% (std error 19.25)
>50 CP loss: 2/6; 33.33% (std error 19.25)
>100 CP loss: 2/6; 33.33% (std error 19.25)
>200 CP loss: 1/6; 16.67% (std error 15.21)
>500 CP loss: 0/6; 0.00% (std error 0.00)
CP loss mean 70.50, std deviation 112.03
Of course, sample size isn't huge with a limited number of games."
Those are the kind of 'Super-GM' stats that are usually used to provide a benchmark for unassisted/honest play.
d-d's T3 figures are certainly well within those of Carlsen/Karjakin & suggest a good human level of play:-
Positions: 746
T1: 168/472; 35.59% (std error 2.20)
T2: 184/312; 58.97% (std error 2.78)
T3: 196/277; 70.76% (std error 2.73)
My own:-
UNDECIDED POSITIONS
Positions: 758
T1: 205/551; 37.21% (std error 2.06)
T2: 243/441; 55.10% (std error 2.37)
T3: 257/395; 65.06% (std error 2.40)
Carlsen/Karjakin (2016):-
T1: 53.7%
T2: 72.9%
T3: 83.5%
Those figures seem to have quite large margins however!
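For what it's worth, the "std error" values quoted above appear to match the ordinary binomial formula sqrt(p(1-p)/n). That is an inference from the numbers, not something the source of the stats states. A minimal sketch, with hypothetical function names, that recomputes two of the quoted T3 figures and expresses the gap between them in combined standard errors:

```python
from math import sqrt


def match_rate(matches, positions):
    """Return (rate, standard error), both in percent.

    Uses the binomial standard error sqrt(p * (1 - p) / n). Treating every
    move as an independent trial is a simplifying assumption.
    """
    p = matches / positions
    return 100 * p, 100 * sqrt(p * (1 - p) / positions)


def gap_in_std_errors(a, b):
    """Difference between two (rate, std error) pairs, in combined standard errors."""
    (rate_a, se_a), (rate_b, se_b) = a, b
    return (rate_a - rate_b) / sqrt(se_a ** 2 + se_b ** 2)


dd_t3 = match_rate(196, 277)        # d-d, T3 (counts quoted above)
wcc_2014_t3 = match_rate(308, 379)  # 2014 WCC, T3 (counts quoted above)

print(dd_t3)
print(wcc_2014_t3)
print(gap_in_std_errors(dd_t3, wcc_2014_t3))
```

A rough 95% margin is about two standard errors either side of each rate, so with only a few hundred positions the margins are several percentage points wide. The 2016 Carlsen/Karjakin figures are quoted without counts, so the same calculation cannot be done for them here.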
On the subject of 'blunder-checking', notice the number of mistakes/blunders in moves that involve an engine-evaluated loss of more than 200 CP (centipawns, i.e. hundredths of a pawn). d-d has a total of 2 moves out of 746 with a loss of over 200 CP (i.e. 2 pawns), but look at the equivalent for WC players.
A comment by one person from the same source:-
"I wouldn't consider those numbers particularly suspicious for a strong untitled player, although the lack of serious blunders is interesting and could suggest blunder-checking.
However, one wonders why a strong untitled player would be so much weaker under fast time controls.
One possible explanation is that he plays 3|0 blitz but plays at a 10|0 pace; his chess understanding would then be decent (or at least better than his blitz/bullet ratings would indicate) but his time management abominable....."

Hi SJFG and thanks for your input. It's good to know that the three of us agree about the importance of not branding people as cheats if they aren't.
My observation about cheats in sports, including in my beloved sport of Athletics (Track and Field), is that cheating destroys the sport not so much because some athletes cheat, but because cheating becomes an obsession with those that don't. In the end nobody enjoys it any more. Compare that with some sports where they don't try too hard to catch cheats, and the sport thrives. (Baseball is supposed to be a case in point, though I have no idea whether that is true.) In cycling now they run infra-red tests on the cyclists' bikes to make sure that they haven't got an electric motor in the bike! And I believe they have caught some! It's truly a sad state of affairs! The cheats are to blame, of course, but for my part I find that I sleep better if I don't let it get to me too much. At the same time I don't do hero-worship of sports stars any more. Too many disappointments.
Coming back to our specific task, there has to be a distinction made between suspicion and evidence. Yes it's suspicious that someone who is good at long format chess is considerably worse at fast chess, but you cannot ban someone from all forms of chess because they happen to be relatively poor at one type of chess. My guess is that either he makes a lot more blunders in fast chess through the absence of his usual blunder checks, or that he performs his usual blunder checks and consequently loses on time. Or perhaps he just doesn't take blitz chess all that seriously and just plays for fun. All of these things are verifiable if you have the data but I don't see that hicetnunc and others are carrying out any such analysis. His use of dd's TT rating as "evidence" of cheating was laughable and I would need to know a lot more about the OTB rating before I drew any conclusions from that; it could be that he obtained that rating when he was 12 years old and hasn't played OTB since. We don't know and we can't consider these data points as valid until we do know.
Trying to find a way forward... Would chess.com share the full data with us, move by move, so that we can perform our own analysis? I note that they classify moves as "undecided" and "losing" etc., but do they distinguish between moves where a well-prepared player might still be within his opening book (or using Game Explorer to help) and moves in the middle and end games? dd seems the type to have a really comprehensive opening book.
Maybe I've done too much data analysis in my time and I've seen too many very bad examples of the art. In the end I don't trust any analysis until I have had a chance to look at the data. And usually when I look at the data I find that I have a load of questions about how the data was obtained.
Joe

BTW, about the Ra2 move in our current game: it's a bit of a boring move but I saw it as one of those where, in the absence of anything better, you try to improve your worst piece. Once or twice in my own games I have found that these turn out to have been great moves. More usually, the piece is still there at the end of the game having conspicuously failed to contribute very much.

I looked at our recently finished VC game against Obsessive Chess Disorder with an engine and read all our comments, especially noting d-d's. Here are the things that seemed most questionable:
Move 5: "Their own opening play has been not quite optimum." [White was playing a standard book opening. Perhaps he was just voicing his opinion about it?]
On moves 5, 10, 11, and 13 d-d wrote some lines that he liked and/or about what would happen in case of certain moves. These were generally the computer's first choice. During the rest of the game, most of the moves/lines he suggested were liked by the engine too. However, the moves do seem like the type of moves a human could certainly come up with.
Move 13: "I've done a quick blunder/mistake check. a and b look fine (to me, at least). One issue with Bxd3 is that we could drop a pawn, if they play Nxb7 (13. ... Bxd3 14. Nxb7 Bxe2 15. Qxe2 Qb6 [or another queen move] 16. Nxd6 ). If we are to keep Bxd3 on the list, we need to assess this position: (whether we get our pawn back or have sufficient compensation)." [Not too suspicious as he did post a thinking process a few moves later and it included a blunder check, but I thought I'd post it here as his lack of blunders is somewhat concerning.]
On move 13 d-d initially liked ...Bxc5 much better than ...Bxf4 but then wrote: "I am becoming more convinced of the case for Bxf4 over Bxc5. Or at least, I now think there is little enough between the moves that I could now go for either." [The longer the engine looks at the position, the less difference it sees between the two moves.]
On move 17 d-d suggested ...b6 (top engine move, but it is logical) and first analyzed the line where White grabs a pawn (which he said looks fine for us; the engine agrees, but I think a human could evaluate this too). The thing that's somewhat questionable on move 17 is that in response to my question about what to do after ...b6 18. exd5 exd5 19. Nd4, he initially suggested Ne7 or Rc8, both of which seem to have problems that aren't hard to see. He then commented that he'd looked more and thought Ne7 and Rc8 were bad, and we'd have to play Nxd4 (engine agrees). Perhaps he just quickly commented without much thought though, and then thought more, or perhaps he initially used his human reasoning and then pulled out the engine.
In my opinion there's nothing that's obviously suspicious, but a few things that are slightly concerning. Thoughts?
BTW, while we're on the subject of the last game, the only other comment that seemed a bit strange to me was when Traumerr wrote: "I would suggest playing 23...Ng4 (threatening mate and an uncomfortable pin on e3)." [It's clearly a fork, not a pin. At 1900 he should know that, although it's possible to write the wrong word.]
One last note: On move 22 d-d had already suggested that in response to 23. Qc2, Ng4 would win, so it really wasn't traumerr who saw the move first.

Mmm. Good work Stephen. More grounds for suspicion, certainly, but still nothing that I think would count as definitive evidence. This is really a horrible position to be in! There must come a point where the level of suspicion becomes such that you would rather do without the player in the team. But if chess.com won't ban him on the basis of his individual games, it is really tough for us to ban him based on the less tangible, less easily quantifiable evidence of his contributions to VC games.
I'm pretty sure that Traumerr knows the difference between a pin and a fork. He must have just made a mistake. And remember English is not his first language (though his English is pretty good). As for who spotted the knight move first, I thought I remembered looking at it before, so maybe it was in response to dd's comment on the previous move. Just when I was giving myself credit for having spotted it on my own.
BTW, when I wrote that about Traumerr having seen the winning move first, I wasn't trying to single him out for praise, just to include him in the general congratulations, because he had written a strange note saying "Congratulations to you all", or something along those lines.
I think we should keep dd's contributions to our games under close surveillance. Maybe look at other games of ours he has played in. There ought to be some way we can compare engine matchup rates in our games with dd to games before he joined the group. But the problem is we had other cheats in the group back then. Yuck!
Stephen JFG. Is your analysis of the recent game in a form that it could be published in the main FVF forum?

Joe, no, I just put the game into the engine and I'd go from one move to another as I read through the archive.
Re English not being Traumerr's first language: Doh! I hadn't thought about that.

Made me realise that I don't myself know a lot of chess terms in Spanish, even though it was my Spanish grandfather that taught me the basics when I was young. I looked up the Spanish for pin and fork (chess terms). A pin (in chess) is un clavo which is a nail. A fork (in chess) apparently is una horquilla which is a hairpin (also a pitchfork or a garden fork and the fork on a bicycle). So maybe the confusion was because the word for a fork in Spanish is a word that means a type of pin! Isn't language wonderful!

Stephen, useful information - thanks for that. Frankly, I don't expect anything conclusive in this case & we can only continue to watch & wait. If he is dishonest in VC then providing a completely relaxed environment may tempt him into revealing himself?
Just as possibly, he's an extremely careful player, which explains his very low blunder count & we're suspecting an entirely honest & very conscientious member of our team - I just can't decide.

This is in confidence but just to illustrate how even Laurent's judgement is affected when he's dealing with someone he knows & likes, this piece of analysis appeared today (I've left it anonymised):-
Positions: 562
T1: 324/562; 57.65% (std error 2.08)
T2: 494/562; 87.90% (std error 1.38)
T3: 529/562; 94.13% (std error 0.99)
Admittedly that's unfiltered so the higher baselines apply but even allowing for that, the T3 figure is beyond anything a super-GM could ever achieve. Beyond all question an engine user. Normally Laurent would post an emphatic '100% cheater' & that would be the end of it for him but instead he posted:-
"Well, he was my chess teacher when I was a student, so if he is cheating, I will be very disappointed :-("
(My highlighting in red) Yet with our player, where the evidence isn't anything like as strong, he's practically made up his mind that d-d is a cheat. I don't like it when people use double standards.
At least when I challenged him he had the grace to admit that the guy above wasn't clean & when the filtered analysis came in, that he was a cheat, no doubt.
It seems he's allowed to give someone the benefit of the doubt but we shouldn't.

Another group I belong to is the Fair Play Board. It's where a lot of the cheat detectors carry out their work. It tries not to draw too much attention to itself, so that's why I may post some material 'in confidence'.

I've just written to dearg-doom to ask him about the huge discrepancy between his Daily & Blitz/Bullet ratings. The reason I don't play Blitz or Bullet is because I'm so poor at very short time control, so it's puzzling to me why d-d seems to persist, despite being poor at them?
Daily: 2119, Blitz: 1320, Bullet: 729
This is his reply:-
"...... Ratings in daily chess are notoriously much higher than OTB ratings, so not a reliable guide to playing strength. I would actually see myself as a 1700 player (much higher than my blitz rating but much lower than my daily chess rating- by about equal amounts). That figure is based on OTB ratings, the only ones I take seriously. I have two of these due to issues (now about to end) between the Yorkshire Chess Association and the ECF. My ECF grading since I joined 18 months ago has been a bit higher (until this year it was based on fewer games, these were on a longer time control which suit me more, and there is a small difference in methodology) – it is now 131 (down from 135 – which I’ll need to update).
To access these gradings go to http://www.chessnuts.org.uk/ny5/ and search for Johnstone. You’ll be able to see that my OTB rating improved dramatically in 2016-17 (up from 101 to 121), resulting in me being the most improved player in the entire Yorkshire Chess Association (which covers three counties). Moreover, it has continued to rise this season – and it’s now 129. You’ll be able to see that I’ve beaten and drawn against a few players with grades in the 140s. On the top left of the screen there is a link to my ECF grade (now 131). My performance in the recent Hull Chess Congress (my first), where I came 8th out of 28 in the intermediate section, can be seen at http://www.ecfgrading.org.uk/new/menu.php?PlayerCode=296421B&file=player#
This progress reflects the work I’ve put into my game over the last 2-3 years (as the YCA site makes clear, I only returned to playing chess a few years ago, and it’s in the last 2-3 years that I’ve really started studying it).
As regards Blitz: it’s clearly not my strength (in club blitz nights I usually come last) and I only really play it lightly, to test out new/unfamiliar openings & ideas. I treat daily differently, using it as a learning tool. I’ll often spend half an hour analysing one move, in a process similar to that which I use on vote chess (where, distinctively, I always verbalise the explanations of moves I propose or reject).
I can see why people might ask. So, to be quite categorical I never have and never will breach the rules. Apart from the ethics of the matter, I couldn’t see the point. But equally, I’d need it confirmed that there is confidence in my integrity as a player amongst team members if I am to continue to contribute.
Gerry"
The first part of that seems reasonable enough but he becomes quite defensive towards the end. I'll have to ask around to find out if an ECF rating around 130 is comparable to his chess.com Daily rating.
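As a rough aid for that comparison: a commonly quoted rule of thumb for turning an old three-digit ECF grade into an Elo-style rating is Elo ≈ 7.5 × grade + 700, with an older version using 8 × grade + 600. Both sets of constants are assumptions here and should be checked against the ECF's own guidance. The result is an OTB estimate, so it says nothing directly about how a chess.com Daily rating should compare. A minimal sketch with a hypothetical function name:

```python
def ecf_grade_to_elo(grade, multiplier=7.5, offset=700):
    """Approximate Elo-style rating from a three-digit ECF grade.

    The default constants (7.5 and 700) are an assumed rule of thumb,
    not an official figure taken from this correspondence.
    """
    return multiplier * grade + offset


# Grades mentioned in Gerry's reply above.
for grade in (129, 131, 135):
    print(grade, ecf_grade_to_elo(grade), ecf_grade_to_elo(grade, multiplier=8, offset=600))
```

Under either version of the rule, a grade of 131 lands within about 50 points of the 1700 he gives as his own estimate of his strength.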
Other things apart, I'm still slightly mystified why a player like him plays Blitz & Bullet, at which he admits to being weak, in rated games. My rapid playing is poor too but I wouldn't dream of playing rated games until I'd managed to improve. That's a question mark in my mind.
More broadly, is it reasonable for any player to perform so well in Daily, unassisted, but so badly in rapid time control games?