Ken Thompson’s 100% / 0% performance rating is flawed

Tocriv

I am very surprised by Ken Thompson’s proposed solution for calculating the performance rating of a 100% or 0% score, and by its endorsement by Albert Frank as well as those at Chessbase (see http://www.chessbase.com/newsdetail.asp?newsid=6316 and http://www.chessbase.com/newsdetail.asp?newsid=6325).

                                            

The Thompson solution is logically, mathematically and practically flawed.

 

To understand where I’m coming from, please keep the following in mind:

  1. Performance rating, as intended by Elo, measures the performance of a particular player relative to the other players in a particular pool.
  2. Assuming a player’s performance from the player’s own rating is incorrect (the Elo table does not do this).
  3. Rewarding lower-rated players less and higher-rated players more is contradictory (it should be the other way round).
  4. The procedure adjusts the data to fit the table (wrong) rather than using the table to explain the data (correct).
  5. The Elo estimated probabilities come from a normal curve (mathematicians and statisticians will be aware that a normal curve is obtained from a large set of data), so a 100% or 0% performance is an extreme event on that curve which, when it happens, suggests a vast rating difference between the player and the other players in the same pool.

 

Point 1.

Introducing a player’s own rating into the calculation of performance is wrong, because the player did not play against himself. It goes against the very logic of performance rating, which is the performance of that player against other players. It also implicitly says that if an opponent has the same rating as the player, the result will be a draw every time (again wrong, as all players and organizers should know). Similarly, one wouldn’t conclude that a higher-rated player would win every time when facing a lower-rated opponent. The 50% result one would expect from a player facing another player of equal strength comes from multiple games in the long run; that is to say, if the two play, say, 50 games, the points will be split roughly in half. It is not based on a single game, as Thompson’s procedure proposes.

 

Point 2 and 3.

This again goes back to the fact that performance rating is about the performance of a player against other players i.e. it is benchmarked against other players’ ratings. Using the example provided previously (assume that the player’s “result” against himself is included):

 

Navara’s rating: 2718

Opponents’ ratings: 2303, 2401, 2479, 2489, 2419, 2518, 2480, 2718 (the last entry is Navara himself)

Average rating of opponents: 2475.875

Score: 7.5/8 (+435, using the Elo table)

Performance: 2911

 

If Navara’s rating were 2400, then the new average rating of opponents would be 2436.125 and, with the same score of 7.5/8, the performance would be 2436 + 435 = 2871.
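The two calculations above can be reproduced with a short Python sketch. The function name `performance_with_self` is made up for illustration, and the +435 is simply taken from the figures quoted above rather than looked up in a table by the code.

```python
# Ratings of the seven real opponents, from the example above.
OPPONENTS = [2303, 2401, 2479, 2489, 2419, 2518, 2480]

# Rating difference for 7.5/8, as quoted in the text (assumed, not recomputed).
DP_FOR_SCORE = 435


def performance_with_self(own_rating, opponents=OPPONENTS, dp=DP_FOR_SCORE):
    """Performance when the player's own rating is added as a dummy opponent."""
    pool = opponents + [own_rating]
    average = sum(pool) / len(pool)
    return average, round(average + dp)


print(performance_with_self(2718))  # (2475.875, 2911)
print(performance_with_self(2400))  # (2436.125, 2871)
```

The same seven results give a performance 40 points higher merely because the player started with a higher rating, which is the contradiction discussed under Points 2 and 3.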

 

This means that a player rated well above his opponents’ average is credited with a stronger performance than a player rated below that average, even though both achieved exactly the same results. Shouldn’t it be the weaker player, who performed far above his nominal rating, who receives the higher performance rating? This is clearly not right, and therefore Thompson’s assertion is unacceptable. Remember that this situation would not arise if the player’s “result” against himself were not included.

 

Point 4.

In statistics/mathematics it is just unacceptable to manipulate the data to try to fit it into something “so that the data makes sense”, which is clearly what is happening here through the introduction of the dummy player. 

 

Point 5.

Again, it should be noted that the Elo estimated probabilities table is based on a normal curve, a curve which is obtained from a large set of data. That is to say, if 100 games (for example) have been played and a player has a 75% success rate, then his performance rating is 193 points above the average of his opponents. In practice, a 100% or 0% result over many games, say 100 games, would be almost impossible (though not strictly impossible). That is why, when this happens, the Elo estimated probabilities table rates the player as having performed at a “ridiculous” level, a sort of out-of-this-world level that was capable of the said result. This in no way contradicts what is stated in the Elo estimated probabilities table. The outrageous values obtained for a tournament arise simply because the sample is too small for the table to be meaningful. (Remember that the Elo estimated probabilities table is meant for large data sets, not for the 7-9 games that are common in tournaments.)
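The behaviour of the normal curve near 100% can be illustrated with a small Python sketch. It assumes the table can be approximated by a normal distribution with standard deviation 2000/7; that value is an assumption chosen here because it reproduces the +193 quoted above for a 75% score, not something stated in this post. The function name `rating_difference` is likewise made up for illustration.

```python
from statistics import NormalDist, StatisticsError

# Assumed spread of the curve (an approximation, not taken from the post).
SIGMA = 2000 / 7
curve = NormalDist(mu=0.0, sigma=SIGMA)


def rating_difference(score_fraction):
    """Rating difference implied by a score fraction strictly between 0 and 1."""
    return curve.inv_cdf(score_fraction)


print(round(rating_difference(0.75)))  # 193 with the assumed sigma

# The implied difference keeps growing as the score approaches 100%.
for fraction in (0.99, 0.999, 0.9999):
    print(fraction, round(rating_difference(fraction)))

# At exactly 100% the curve has no finite answer at all.
try:
    rating_difference(1.0)
except StatisticsError as err:
    print("no finite rating difference for 100%:", err)
```

Under this assumption a perfect score simply has no finite rating difference, which is consistent with reading 100% as an extreme event rather than as something the data should be adjusted to avoid.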

  

Kung-Ming Tiong

University of Nottingham Malaysia Campus

panandh

Who cares?

nkekere

 Ken Thompson's solution seems to be flawed at least practically, if not mathematically.

1. The player didn’t play with himself/herself.

2. The player can never expect to draw against himself/herself. The initial advantage (which is the advantage of having White) decreases with increasing Elo. (This is evident from practice.) We see that at the very top level this advantage means very little. Theoretically this advantage can be made to equal zero with a high enough Elo. But just as we approach infinity in mathematics without reaching it, this point can't be reached. This means a player will always beat himself/herself when playing White, and always lose to himself/herself when playing Black. This should have been apparent in Elo’s math if it hadn’t been biased against the higher rated player instead of the darker colored one.

Nkekere Tommy

Nigeria.

AMcHarg

I had recently heard about this proposal and as stated by the OP, for very valid reasons, it's totally wrong.

Better would be to simply state that a player's TPR is >x, where x = his TPR in the same pool of players after dropping only 0.5 (in the event of 100%) or

With 100% you can't know the TPR of a player, because against a pool with an average rating of 1200 the player could be a 2300 or a supercomputer at 3150, both of which will probably score 100%. So we would only know that it's greater than x, not the true value.

I would also state that in such a case the TPR can never be calculated to be lower than the rating of the player, so where x is below the player's rating, the TPR should be given as >= the player's rating.

A
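The lower-bound idea in the comment above, for the 100% case, could be sketched in Python roughly as follows. This is only one reading of the proposal: the function name `tpr_lower_bound` is hypothetical, and the normal curve with standard deviation 2000/7 is an assumed approximation of the Elo table, not something given in the thread.

```python
from statistics import NormalDist

# Assumed approximation of the Elo table (not specified in the thread).
SIGMA = 2000 / 7


def tpr_lower_bound(opponents, own_rating, sigma=SIGMA):
    """Lower bound for the TPR of a player who scored 100% against `opponents`.

    x is the TPR the player would have had after dropping half a point;
    the bound is never allowed to fall below the player's own rating.
    """
    games = len(opponents)
    average = sum(opponents) / games
    reduced_score = (games - 0.5) / games
    x = average + NormalDist(0.0, sigma).inv_cdf(reduced_score)
    return max(x, own_rating)


opponents = [2303, 2401, 2479, 2489, 2419, 2518, 2480]
print(round(tpr_lower_bound(opponents, own_rating=2718)))  # reported as "TPR > this value"
```

The result is reported as a bound ("greater than") rather than as a point estimate, which is the distinction the comment is drawing.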