Ivanov speaks out!

jesterville

Yes, his play dropped and he lost that game also.

Polar_Bear
FirebrandX wrote:
Polar_Bear wrote:

Getting 286 or more top-3 hits out of 314 moves, with an 83% per-move probability, happens about once in 73677 tries, but this is biased a bit, because the distribution is not ideally binomial.

The only problem I have with this figure is the 314. I'd like to know who decided what 314 moves to use. The reason I take issue with this is as follows:

1. There was one game in the tournament where he clearly was not cheating (either got scared and stopped using the device, or felt a random 'human' game might throw off a little of the suspicion).

2. In the 2 drawn games and the other game he lost, he only deviated from Houdini when he ran out of time and had to just wing it at the end. This is especially true of the game he lost, where the game was a marathon of cheating right up until he ran out of time and played an instantly losing blunder when he couldn't get a move from Houdini in time.

3. Opening book. I just want to know who decided what was still a book move and what was not.

I have all these concerns, because when I reviewed the games he cheated in, EVERY move matched Houdini's top three with the exception of the obvious time-trouble moves where he simply couldn't cheat in time (and of course book moves). If you chop both those ends off, the remainder is a perfect 100% match.

Those are goldendog's figures. 314 is the number of non-book moves made by Ivanov in the tournament (he received unfair help from the computer's opening book too, of course). Goldendog used a fixed time and a fixed interval of depths for the analysis, and downloaded a computer opening book to cut off the opening phase. Although he usually checks against MegaDB for Chess.com CC games, in this particular case he didn't, because OTB players aren't expected to memorize millions of games.

To obtain a 100% match we would have to reproduce the conditions exactly, move by move. We don't know the actual algorithm used by Ivanov's accomplice sitting behind the computer screen. Watching the engine's top-3 lines change during analysis, spotting the actual move appearing and disappearing there, and then declaring a 100% match is not trustworthy analysis.
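
If anyone wants to check that once-in-73,677 figure, here is a minimal Python sketch of the plain binomial calculation, assuming the 83% per-move top-3 probability and the 314 non-book moves quoted above (scipy required); as I said, the distribution is not ideally binomial, so treat it as a rough check only.

# Tail probability of 286 or more top-3 matches in 314 moves, if each move
# independently matched the engine's top 3 with probability 0.83.
# The 0.83 baseline and the 314 moves are taken from the posts above.
from scipy.stats import binom

n, k, p = 314, 286, 0.83
tail = binom.sf(k - 1, n, p)   # P(X >= 286)
print(f"P(X >= {k}) = {tail:.3g}, i.e. roughly 1 in {1 / tail:,.0f}")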

Polar_Bear
FirebrandX wrote:

So tossing out the ONE human game, the percentage rises greatly.

Yes, but not THAT greatly:

Top 3 Match: 286/314 ( 91.1% )

Top 3 Match: 265/287 ( 92.3% )
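
For the record, the arithmetic behind those two lines (my own breakdown, nothing new): the excluded game accounts for 314 - 287 = 27 non-book moves, of which 286 - 265 = 21 matched, about 78% on its own, which is why dropping it only lifts 91.1% to 92.3%.

# Why excluding the one "human" game barely moves the overall rate:
# that game's own top-3 match rate was still fairly high.
all_games = (286, 314)    # matches, non-book moves, all 9 games
trimmed   = (265, 287)    # matches, non-book moves, without the excluded game

game_moves   = all_games[1] - trimmed[1]    # 27
game_matches = all_games[0] - trimmed[0]    # 21
print(f"excluded game: {game_matches}/{game_moves} = {game_matches / game_moves:.1%}")
print(f"all games:     {all_games[0]}/{all_games[1]} = {all_games[0] / all_games[1]:.1%}")
print(f"without it:    {trimmed[0]}/{trimmed[1]} = {trimmed[0] / trimmed[1]:.1%}")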

Kingpatzer
FirebrandX wrote:

That's not correct; the first number shouldn't go down by nearly the same amount.

Anyway, I guess I'm going to have to do my own breakdown of the Houdini comparison, including detailing the circumstances on the moves that didn't match. This will give a more accurate picture than blanket-covering the entire tournament.

You really can't do it that way. That is post-hoc justification of data culling, and is a no-no in a scientific investigation. You can't decide that he "clearly didn't get engine support" and throw out that game when the very question at hand is "did he get engine support?"

waffllemaster

Well I don't see the problem with throwing out whole games... as long as the remaining games are a large enough sample set.

Cutting off the end of a game because "at this point he wasn't cheating anymore" should be a big no no though.

Kingpatzer
waffllemaster wrote:

Well I don't see the problem with throwing out whole games... as long as the remaining games are a large enough sample set.

Cutting off the end of a game because "at this point he wasn't cheating anymore" should be a big no no though.

The point of a statistical analysis is to show that he used assistance within the tournament. By picking which games, and which moves within games, to analyze after the fact, based on the analyst's subjective interpretation, you are biasing the sample. It's a bad protocol for an experiment.

Either randomly select some games from the tournament, or analyze all the games from the tournament. But after the tournament happens, looking at the games and saying "this one looks like a good candidate to screw him, so we'll keep it" and "this one looks like it might exonerate him so we'll toss it" is not a proper investigation.

Now, it would be fair to note the scores of individual games along with the overall result. And it would be fair game to note the top quintile matches and bottom quintile matches as part of the analysis. And doing so might well show that he did not use an engine in some games.

But picking the games to analyze based on expert judgement of the probable outcome is not a proper scientific protocol. Post-hoc data selection is bad, biases outcomes, and would never pass peer review.
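
To see how strong that biasing effect is, here is a toy simulation (made-up numbers, nothing to do with Ivanov's actual games): a "player" whose moves match the engine's top 3 at a fixed, innocent rate still shows an inflated match percentage once you throw away the weakest games after looking at them.

# Simulate 9 games of 35 non-book moves, each move matching the engine's
# top 3 with probability 0.55 (an arbitrary, non-cheating rate).
# Dropping the two weakest games after the fact inflates the apparent rate.
import random

random.seed(1)
games = [sum(random.random() < 0.55 for _ in range(35)) for _ in range(9)]

all_rate = sum(games) / (9 * 35)
kept = sorted(games)[2:]                   # post-hoc culling of the worst games
culled_rate = sum(kept) / (len(kept) * 35)

print(f"all games kept: {all_rate:.1%}")
print(f"after culling:  {culled_rate:.1%}")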

waffllemaster
Kingpatzer wrote:
waffllemaster wrote:

Well I don't see the problem with throwing out whole games... as long as the remaining games are a large enough sample set.

Cutting off the end of a game because "at this point he wasn't cheating anymore" should be a big no no though.

The point of a statistical analysis is to show that he used assistance within the tournament. By picking which games, and which moves within games, to analyze after the fact, based on the analyst's subjective interpretation, you are biasing the sample. It's a bad protocol for an experiment.

Either randomly select some games from the tournament, or analyze all the games from the tournament. But after the tournament happens, looking at the games and saying "this one looks like a good candidate to screw him, so we'll keep it" and "this one looks like it might exonerate him so we'll toss it" is not a proper investigation.

Now, it would be fair to note the scores of individual games along with the overall result. And it would be fair game to note the top quintile matches and bottom quintile matches as part of the analysis. And doing so might well show that he did not use an engine in some games.

But picking the games to analyze based on expert judgement of the probable outcome is not a proper scientific protocol. Post-hoc data selection is bad, biases outcomes, and would never pass peer review.

Ok, this makes sense.

Scottrf

He's not trying to present his analysis for peer review, though; an article has already been written for that.

Finding that he matches engine moves for full games, even selected ones, is pretty interesting in its own right.

Kingpatzer
Scottrf wrote:

He's not trying to present his analysis for peer review, though; an article has already been written for that.

Finding that he matches engine moves for full games, even selected ones, is pretty interesting in its own right.

I understand he's not trying to present the analysis for peer review. He is, however, trying to improve upon the analysis. Introducing confirmation bias into the experiment does not do that.

Analyzing individual games, and the full set of individual moves, will surely be enlightening. And after doing so, talking about the specific individual games and moves will surely demonstrate clear cheating; this isn't, after all, a borderline case.

But the notion that post-hoc data selection improves the experiment is flat-out wrong. It does not.

Polar_Bear

@FirebrandX

The point is not to cheat during the detection analysis itself. You must set fixed conditions and stick to them, e.g. fixed time or fixed depth, and ignore a move's early or late appearances in the engine output.

That's how human games are analyzed to obtain reference data, which is then compared with the suspect's games.

Actually you are doing it wrong: you see the move appear in the top 3 during infinite analysis and you say "Ha! Bingo!" The move may later disappear and be left out of the final result, or only show up a bit too late. That's quite amateurish and can't constitute proof, because you have no reference data and such analysis can't be reproduced.
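
If you want a check that can be reproduced, script it with fixed conditions. Below is one way to do that in Python with the python-chess library, using a fixed depth and MultiPV=3 for every position; the engine path and depth are placeholders, not the settings goldendog or IM Regan actually used, and for brevity it neither excludes book moves nor restricts itself to the suspect's own moves.

# Fixed-condition top-3 check: same depth and MultiPV for every position,
# so anyone can re-run the analysis and get the same reference lines.
import chess
import chess.engine
import chess.pgn

def top3_matches(pgn_path, engine_path="houdini", depth=20):
    """Count how many played moves appear in the engine's fixed-depth top 3."""
    matches, total = 0, 0
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)  # placeholder path
    with open(pgn_path) as f:
        game = chess.pgn.read_game(f)
    board = game.board()
    for move in game.mainline_moves():
        infos = engine.analyse(board, chess.engine.Limit(depth=depth), multipv=3)
        top3 = [info["pv"][0] for info in infos if "pv" in info]
        total += 1
        matches += move in top3
        board.push(move)
    engine.quit()
    return matches, total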

Crazychessplaya

Moses2792796 wrote:

If you analyse individual games and it's clear that there are some where he was cheating and others where he wasn't, then this is enough.  There's no reason why all the games from that tournament have to constitute a single sample, although I do agree that you shouldn't make a statistical proof based on hand-picked games.

+1. In the eighth round, the organizers turned off the live transmission. He lost miserably, as expected.

ChessisGood

Wow...I didn't really think he had cheated until I read the interview. Now, however...

Kingpatzer
Moses2792796 wrote:

If you analyse individual games and it's clear that there are some where he was cheating and others where he wasn't, then this is enough.  There's no reason why all the games from that tournament have to constitute a single sample, although I do agree that you shouldn't make a statistical proof based on hand-picked games.

Understand that the side discussion about how to do the analysis is precisely about what a good protocol for scientifically establishing the fact of his cheating looks like. There is no debate about the fact of his cheating: no one interested in doing the statistical experiment on his games thinks he might really have just gotten lucky this tournament; we know he cheated already.

And that, btw, is what makes establishing a good protocol and following it hard. Because the truth in this case is so bloody obvious that even non-titled players can pick out the cheating versus non-cheating games by eye,  it is tempting to structure a "proof" that isn't scientifically valid -- though it would still be more than sufficient for most people anyway. 

Polar_Bear

FirebrandX,

I agree Ivanov is a blatant cheater, but you should have prepared better arguments than "I have looked at his games with Houdini in UCI and he matches it nearly 100%". Conclusions based only on such weak arguments get attacked by denialists, various self-styled skeptics, cheaters, trolls, morons, ignoramuses, chatterboxes ...

johnyoudell

Juries convict on circumstantial evidence from time to time. Often enough direct evidence subsequently emerges which demonstrates that the jury was wrong.

Kingpatzer
johnyoudell wrote:

Juries convict on circumstantial evidence from time to time. Often enough direct evidence subsequently emerges which demonstrates that the jury was wrong.

The evidence here is not circumstantial. The data is clear and condemning. There is no debate over whether he cheated from anyone who knows what computer cheating looks like. The only debate is about how to do and present the analysis that confirms what is already known.

goldendog
Polar_Bear wrote:
FirebrandX wrote:

So tossing out the ONE human game, the percentage rises greatly.

Yes, but not THAT greatly:

Top 3 Match: 286/314 ( 91.1% )

Top 3 Match: 265/287 ( 92.3% )

 

True enough.

I know PB is fully aware of the below, but I'll state it for the general audience:

IM Regan had these results (for all games):

"For matches to the top 3 moves of Houdini 3 the test shows 91.6%"

and a top-1 match  in MPV (using Rybka 3) of:

"69.0%"

and top-1 in MPV using Houdini 3:

"64.1%"

I got 71.0% top-1 and 91.1% top-3 for all 9 games (Houdini103a).

He excludes more book moves than I did in my analysis, and I abbreviated my usual depth in comparison to IM Regan, but the results are pretty close.

He says he also computed some confidence numbers for a batch excluding games 8 and/or 9; he doesn't state percentages, just notes that the z-scores went up (and he gives those numbers).

If analysis with a later version of Houdini consistently gives higher numbers, this may well indicate fidelity to that particular engine.

One can cherry-pick games from a tournament, but you should also analyze the whole batch and give those results too; otherwise you risk the appearance of human bias.

That Ivanov shows as a very likely cheat based on multi-engine analysis is damning enough, and gives credibility to the idea of unearthing engine users even if the analyst uses a different engine than the suspect.

That's an important takeaway from all this.
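
To make the "whole batch plus per-game numbers" idea concrete, a trivial sketch (the counts in the example call are placeholders, not my actual figures): report the overall rate and every game's own rate side by side, so nothing gets silently dropped.

# Print the whole-batch top-3 match rate alongside each game's own rate,
# so weak games are visible rather than quietly excluded.
def report(per_game):
    """per_game: list of (top3_matches, nonbook_moves) pairs, one per game."""
    total_m = sum(m for m, n in per_game)
    total_n = sum(n for m, n in per_game)
    print(f"all games: {total_m}/{total_n} = {total_m / total_n:.1%}")
    for i, (m, n) in enumerate(per_game, 1):
        print(f"  game {i}: {m}/{n} = {m / n:.1%}")

report([(33, 35), (21, 27)])   # placeholder counts only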

johnyoudell

The people with the greatest experience of detecting cheating at chess may well be this site and its clones. I wonder if Mr Ivanov would be branded a cheat by their methods? Although my suspicion is that a much larger sample of games would be required.

goldendog
johnyoudell wrote:

The people with the greatest experience of detecting cheating at chess may well be this site and its clones. I wonder if Mr Ivanov would be branded a cheat by their methods? Although my suspicion is that a much larger sample of games would be required.

9 games is pretty skimpy, for vanilla, classical T3.

Error bars would be "wide" with such a small sample, for those deriving SDs and odds and such.
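
As a rough illustration, using nothing fancier than the plain binomial standard error (my own arithmetic, not IM Regan's model): the uncertainty band around a match rate shrinks only slowly as the move count grows.

# Simple binomial standard error of an observed match proportion:
# with ~314 scored moves the 2-SE band is still a few percentage points wide.
import math

def std_err(p, n):
    return math.sqrt(p * (1 - p) / n)

for n in (314, 1000, 5000):
    se = std_err(0.911, n)
    print(f"n = {n}: 91.1% +/- {2 * se:.1%} (2 SE)")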

Polar_Bear
goldendog wrote:

9 games is pretty skimpy, for vanilla, classical T3.

Error bars would be "wide" with such a small sample, for those deriving SDs and odds and such.

However, 3.4 SD is pretty convicting.

In other fields (e.g. analytical chemistry), 2 SD always constitutes a serious alert and 3 SD requires immediate intervention.
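
For scale, the one-sided normal tail probabilities behind those thresholds (my own quick calculation, scipy assumed): 2 SD is roughly 1 in 44, 3 SD roughly 1 in 740, and 3.4 SD roughly 1 in 3,000 under the null hypothesis of honest play.

# One-sided tail probability of a z-score under the normal approximation:
# how often an honest player would land this far above the reference mean.
from scipy.stats import norm

for z in (2.0, 3.0, 3.4):
    p = norm.sf(z)                       # P(Z >= z)
    print(f"{z} SD: p = {p:.2e}  (about 1 in {1 / p:,.0f})")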