dalephilly - Chess Forums

Aug 30, 2017

#1

This is the engine-match analysis done by one of the guys in the Fair Play group, beginning on 14 August:

This exceeds established benchmarks for 2300+ clean players here:
ChessAnalyse report:
Total moves: 1674
Top 1 match: 907 / 54.2%
Top 2 match: 295 / 71.8%
Top 3 match: 179 / 82.5%
Doing a 50-game scan now; will update this thread with new results later today. If someone could run his games through PGN Spy or, even better, ChessBase, that would be cool, to clear up any doubts.
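
The percentages in the ChessAnalyse report appear to be cumulative: the Top 2 figure counts moves that matched either the engine's first or second choice, and so on. A minimal Python sketch that reproduces the three percentages from the raw counts, assuming that reading is correct (the function name is made up for illustration and is not part of ChessAnalyse):

```python
def cumulative_match_rates(counts, total_moves):
    """Return cumulative top-N match percentages, rounded to one decimal place."""
    rates = []
    running = 0
    for count in counts:
        running += count  # moves matching any of the top-N engine choices so far
        rates.append(round(100 * running / total_moves, 1))
    return rates


# Counts from the report: moves matching the engine's 1st, 2nd and 3rd choice,
# out of 1674 moves in total.
print(cumulative_match_rates([907, 295, 179], 1674))  # [54.2, 71.8, 82.5]
```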

On a larger sample, the numbers are still impressive:

LOL what ? ? ?

50 games, 15 book moves, 20 ply, 10/20 seconds, SF 7:

UNDECIDED POSITIONS
Positions: 660
T1: 179/425; 42.12% (std error 2.40)
T2: 198/308; 64.29% (std error 2.73)
T3: 194/257; 75.49% (std error 2.68)
>0 CP loss: 244/660; 36.97% (std error 1.88)
>10 CP loss: 167/660; 25.30% (std error 1.69)
>25 CP loss: 101/660; 15.30% (std error 1.40)
>50 CP loss: 44/660; 6.67% (std error 0.97)
>100 CP loss: 16/660; 2.42% (std error 0.60)
>200 CP loss: 2/660; 0.30% (std error 0.21)
>500 CP loss: 0/660; 0.00% (std error 0.00)
CP loss mean 12.03, std deviation 27.82

LOSING POSITIONS
Positions: 24
T1: 3/11; 27.27% (std error 13.43)
T2: 3/7; 42.86% (std error 18.70)
T3: 3/7; 42.86% (std error 18.70)
>0 CP loss: 12/24; 50.00% (std error 10.21)
>10 CP loss: 11/24; 45.83% (std error 10.17)
>25 CP loss: 9/24; 37.50% (std error 9.88)
>50 CP loss: 9/24; 37.50% (std error 9.88)
>100 CP loss: 6/24; 25.00% (std error 8.84)
>200 CP loss: 2/24; 8.33% (std error 5.64)
>500 CP loss: 2/24; 8.33% (std error 5.64)
CP loss mean 417.13, std deviation 1259.77
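
The "std error" figures in this output are consistent with the usual standard error of a binomial proportion, sqrt(p(1-p)/n), expressed in percentage points. A short Python check, assuming that is the formula the tool uses (the helper name is illustrative, not part of PGN Spy):

```python
import math


def proportion_with_std_error(hits, n):
    """Return (percentage, standard error in percentage points) for hits out of n."""
    p = hits / n
    std_error = math.sqrt(p * (1 - p) / n)
    return round(100 * p, 2), round(100 * std_error, 2)


print(proportion_with_std_error(179, 425))  # undecided positions, T1
print(proportion_with_std_error(244, 660))  # undecided positions, >0 CP loss
print(proportion_with_std_error(3, 11))     # losing positions, T1
```

The three calls give 42.12 ± 2.40, 36.97 ± 1.88 and 27.27 ± 13.43, matching the lines above. With only 11 qualifying losing positions the standard error is over 13 percentage points, so the losing-position percentages say very little on their own.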

A clear (and instructive) case of a selective cheater.

Let's compare some data now:

1900 to 2100 Range:
Reserved for PGN Spy

111 games

UNDECIDED POSITIONS
Positions: 4541
T1: 1046/2896; 36.12% (std error 0.89)
T2: 995/1763; 56.44% (std error 1.18)
T3: 988/1405; 70.32% (std error 1.22)

CP loss mean 24.37, std deviation 125.22

@dalephilly (2000+ rated):

Positions: 660
T1: 179/425; 42.12% (std error 2.40)
T2: 198/308; 64.29% (std error 2.73)
T3: 194/257; 75.49% (std error 2.68)

CP loss mean 12.03, std deviation 27.82

Way above his rating range, where he has sat for a while now. So this is what "targeted" cheating looks like.
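
One rough way to put a number on "way above" is a two-proportion z-score for each of the T1/T2/T3 rates against the 1900 to 2100 benchmark. The Python sketch below is only an illustration under strong assumptions: it treats every position as an independent trial (moves within a game are clearly not independent), it uses the unpooled standard error, and it is not the method used by the Fair Play group or by the site. The function name is made up for this example.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z-score for the difference between two proportions (unpooled standard error)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    std_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / std_error


# (player hits, player n, benchmark hits, benchmark n), copied from the figures above
comparisons = {
    "T1": (179, 425, 1046, 2896),
    "T2": (198, 308, 995, 1763),
    "T3": (194, 257, 988, 1405),
}

for label, args in comparisons.items():
    print(f"{label}: z = {two_proportion_z(*args):.2f}")
```

On these figures the z-scores come out at roughly 2.3 for T1, 2.6 for T2 and 1.8 for T3. Because the three rates are computed from overlapping positions, they are not three independent pieces of evidence.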

9 games against IM @Jungleman82, scoring better than the IM:

@dalephilly, 9 games

UNDECIDED POSITIONS
Positions: 165
T1: 46/101; 45.54% (std error 4.96)
T2: 47/60; 78.33% (std error 5.32)
T3: 37/43; 86.05% (std error 5.28)

CP loss mean 16.96, std deviation 54.19

Compared to:

IM @jungleman82, 9 games

UNDECIDED POSITIONS
Positions: 164
T1: 38/89; 42.70% (std error 5.24)
T2: 32/50; 64.00% (std error 6.79)
T3: 27/34; 79.41% (std error 6.93)

CP loss mean 13.54, std deviation 51.01

A 25-ply-deep analysis reveals the real numbers:

Last 12 games, unfiltered analysis with PGN Spy:

White	Black	Result	Undecided positions	T1 moves	T1%	T2 moves	T2%	T3 moves	T3%
zorglub53	dalephilly	1-0	26	16	61.54%	21	80.77%	22	84.62%
dalephilly	PanMath1947	1-0	20	14	70.00%	17	85.00%	17	85.00%
dalephilly	wsharabati	0-1	14	6	42.86%	10	71.43%	12	85.71%
dalephilly	PanMath1947	1-0	39	23	58.97%	32	82.05%	34	87.18%
dalephilly	PanMath1947	1/2-1/2	19	13	68.42%	17	89.47%	17	89.47%
dalephilly	Spectrall	0-1	20	6	30.00%	17	85.00%	18	90.00%
Spectrall	dalephilly	1/2-1/2	16	10	62.50%	15	93.75%	15	93.75%
HomeIsRelative	dalephilly	0-1	28	22	78.57%	26	92.86%	27	96.43%
dalephilly	Gater-Nation	1-0	14	10	71.43%	13	92.86%	14	100.00%
Gater-Nation	dalephilly	1/2-1/2	17	14	82.35%	17	100.00%	17	100.00%
dalephilly	HomeIsRelative	1-0	34	21	61.76%	31	91.18%	34	100.00%
dalephilly	PanMath1947	1-0	25	17	68.00%	25	100.00%	25	100.00%
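
Per-game percentages built on 14 to 39 positions swing a lot from game to game, so it can help to pool the 12 games into a single rate. A minimal Python sketch, assuming each row reports dalephilly's moves in that game and that all three rates share the "undecided positions" denominator, as the per-row percentages suggest (the names `GAMES` and `pooled_rates` are made up for this example):

```python
# One tuple per game, copied from the table above:
# (undecided positions, T1 moves, T2 moves, T3 moves)
GAMES = [
    (26, 16, 21, 22),
    (20, 14, 17, 17),
    (14, 6, 10, 12),
    (39, 23, 32, 34),
    (19, 13, 17, 17),
    (20, 6, 17, 18),
    (16, 10, 15, 15),
    (28, 22, 26, 27),
    (14, 10, 13, 14),
    (17, 14, 17, 17),
    (34, 21, 31, 34),
    (25, 17, 25, 25),
]


def pooled_rates(games):
    """Pool all games and return the overall T1/T2/T3 percentages."""
    positions = sum(g[0] for g in games)
    return {
        label: round(100 * sum(g[i] for g in games) / positions, 2)
        for i, label in enumerate(("T1", "T2", "T3"), start=1)
    }


print(pooled_rates(GAMES))  # {'T1': 63.24, 'T2': 88.6, 'T3': 92.65}
```

The pooled sample is 272 positions. These games were analysed at 25 ply and unfiltered, so the pooled figures are not directly comparable with the 20-ply T1/T2/T3 numbers earlier in the post.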

stephen_33

Aug 30, 2017

#2

There's quite a lot there to sink your teeth into!

SJFG

Aug 30, 2017

#3

On first look, it does look like his play was a little bit too good.

As for our finished games, I looked at most of the moves (especially the more critical ones) and I didn't see any serious red flags.

In general, I think that if we are going to catch cheaters it won't likely be by their suggestions in our VC games, unless it's obviously from an engine (in hindsight I have to say Boletus' comments and some of keighley's suggestions should have triggered red flags; now we know for the future). dalephilly's comments, for example, seemed to be from a person; it's his games that are questionable.

This, of course, is a lot of extra work, although perhaps running PGN Spy might not be so hard (I've not used it, so I don't know yet).

One thing it does make me wonder (yet again) is how d-d's blitz and bullet ratings are so low. Yet in our games he has written things that I'm 100% certain only someone with actual chess understanding could write.

As for action, I plan to download PGN Spy and see how much time it takes to use (hopefully just downloading games and then letting it run), and go from there, perhaps running members' games every so often.

stephen_33

Aug 30, 2017

#4

My laptop is quite a modest, 2-core one & I've only ever used ChessAnalyse on our single, finished games. I don't know how long it would take to analyse batches of 20 or more games, but I've heard some of the cheat detectors say that they often leave their machines running overnight. I think 8 hours to process 20+ games is normal, even on quad-core, high-spec machines.

I've collected links to a few threads on PGN Spy...

https://www.chess.com/clubs/forum/view/think-i-just-found-a-free-game-analyser?page=1

https://www.chess.com/clubs/forum/view/pgn-spy

Those topics are in the Cheating Forum & the other ones I have are from the Fair Play Board group, which isn't quite so easy to get into. I'm still not sure why they let me in.

Josechu

Aug 31, 2017

#5

First impressions only as I've only looked at this for a couple of minutes. And, as usual, I can only really comment on the procedural / statistical aspects rather than on specifically chess issues.
1) Time per move is a factor in performance: I can't see anything in the data that says what the format of the games is. You can't compare bullet or blitz (whether OTB or internet) with online correspondence games where the player has access to opening databases and videos and to the analysis board. In my correspondence games (as well as in VC) I rely very, very heavily on these tools. I'm getting a bit better (I hope) at visualisation, but I still make terrible errors along the way. Just yesterday I spent ages calculating a tactic that looked very strong. When I tried the line on the analysis board (thank goodness I did!) it turned out that a pawn move early on was impossible because the pawn was pinned to my king! So the comparison between dalephilly's games and the control data may not be completely fair. Even if the control data is also for correspondence games, it's impossible to tell whether dalephilly's "above rating" performance is not just down to him taking longer on each move than his opponent.
2) Circular argument: Dalephilly's playing standard is too good for someone of his rating. But his rating depends upon his playing standard. If he is cheating in order to win, why has his rating not gone up dramatically? The only answer I can think of is that his opponents are cheating more than he is. Unless there is some sort of objective, cheat-free standard to measure him against, how can we know what his true playing strength is? From what I have seen of the anti-cheating group, they like it if a player has an official USCF rating or equivalent, so that they can compare OTB playing strength against online playing strength, and draw conclusions from that. I agree that this is a good indicator that something is wrong (though not conclusive proof, as the two things are not identical, as mentioned above). The problem is, how do you know what someone’s true playing strength is if they do not have a “real world” rating? If you are not careful you get into that circular argument that I described, and that seems to be happening here.
In my opinion, for these purposes (cheat detection in chess) statistical analysis should only be used as an indicator that something may be amiss. For a case such as dalephilly’s, I think the kind of analysis that Stephen (SJFG) is doing (looking at specific moves made and forming an opinion about whether, from the options available, they look like engine-inspired moves or not) is probably of much more value in finding cheats. I really hope that this is what chess.com does before closing somebody’s account.
I’m going to leave it at that before you both start to think that I am an apologist for cheats, because nothing could be further from the truth. My conclusion: I'm not satisfied that the anti-cheating group has a system that definitely detects cheats, though it certainly should work as a first cut to identify possible culprits.

Josechu

Aug 31, 2017

#6

Having said all that and having had time to look at the data, I think we have to accept that he has probably been cheating in his individual games, though it would have been good to hear what he had to say about it. (I have emailed you both the report from my cheating model.)
It’s another blow for FVF, and we will need to think about what we do with regard to our ongoing games and the results of some of our completed games. Does anybody know whether other top VC teams are suffering a similar level of ‘casualties’? If it’s just us then I would be in favour of opening a discussion with the whole team about whether some sort of big gesture (resigning in our KO Cup final game? Or inviting the anti-cheating group to look at our discussions and dalephilly's contributions to them?) would be an appropriate way to establish our clean credentials. Basically, we cannot prevent cheats joining our team, but we can try to make our team a place to which cheats will not automatically be drawn.

This is all terribly disappointing!

stephen_33

Aug 31, 2017

#7

Joe, I can answer a few of the questions & issues that you bring up straightaway. The games analysed will be long time control/CC (i.e. Daily) & where Live games are analysed, this will be explicitly stated.

Another thing to remember is that the site works according to different criteria to the guys doing analysis with PGN Spy. Sometimes the Fair Play Board (FPB) guys will be convinced that a member is cheating but the site leaves their account open, sometimes for months. At other times, when the FPB guys disagree about the certainty of someone cheating, the site closes the account very quickly.

There's a lot of grumbling in the FPB group about that but the staff don't reveal their own detection methods & that's that. In the case of dalephilly, the site seems to have been satisfied very quickly that he was cheating - case closed.

But we need to remember that just because someone is cheating in their personal games, it doesn't mean they were doing so in any of their VC ones. I think this is probably true of our games because none of us can recall ever having any suspicions over any of his suggestions.

I think FVF still has one of the lowest 'casualty' rates of any good calibre VC team. This is the first instance of a banned member in our group for a very long time (three years I believe). I can point you to groups, admittedly a lot larger, that suffer one or more banned members almost every week, so let's not beat ourselves up too badly. It's disappointing certainly but compared to many other groups, we're very clean.

* To give you a not untypical example of what some groups tolerate when it comes to dishonest members, The Tactical Edge comes to mind. It got so bad in the Knockout competitions I run with Danny that I told the SA of The Tactical Edge that they'd have to leave until they improved.

Take a look at this match & count up the number of Closed: Fair Play icons for that team & that's only for the last 12 months! That's out of a group of 147 members with an SA who's a moderator (laugh!) & supposedly dedicated to stamping out cheating.

https://www.chess.com/club/matches/the-tactical-edge/652220/games

You have to assume that some/many of those players would have represented the group in VC as well.

Josechu

Sep 1, 2017

#8

Thanks Stephen. It's comforting to know that we are not alone. That Tactical Edge match is incredible. It makes me feel a bit more comfortable about FVF, but not about the state of online chess in general. There are an awful lot of cheats about!

Another question: at the top of the block of data on dalephilly's 50 games, it says "15 book moves". Does that mean that the analysis automatically assumes that the first 15 moves are always book moves, and therefore excludes them? I can't believe that only 15 moves in 50 games were book moves, so that is the only other explanation I can think of. If I'm right then it's a bit of a blunt instrument, don't you think?

stephen_33

Sep 1, 2017

#9

And there're much worse groups than Tactical Edge! On the subject of excluding book moves, some analytical programs do that automatically but others have to be adjusted manually. I'm not sure what the person carrying out the analysis was doing there. I agree that excluding the first 15 moves uniformly across all games in the set makes little sense.

Frankly, the analysis some of those guys produce needs to be treated with care - sometimes it's filtered, sometimes not & that's not always made clear but greatly affects the results & the benchmarks to be used. But of course it's not the data that's used to decide if someone should be banned - that's down to the site's detection methods & benchmarks which they don't reveal.

Josechu

Sep 1, 2017

#10

It's completely understandable that the site keeps its criteria secret. We just have to hope that it's a bit more scientific than what I've seen from the FPB. It's encouraging that the site hasn't banned dd but has banned dalephilly. My instinct from looking at the FPB data is that that is probably about right. But none of it is really verifiable unless you have some comparison data on each player's playing strength in the same format (i.e. using the chess.com tools and, effectively, unlimited time) and where you know 100% that they are not cheating. Maybe there ought to be a tournament in the real world where those rules would apply, to see how much "better" people can play under those conditions. Very hard to organise though!