How correct is the chess.com rating

luckisK

Wrong again: (w-l)/(w+d+l), which is equivalent to (w+0.5d)/(w+d+l), says more than w/(w+d+l).

https://www.youtube.com/watch?v=fhnrrLxQEVQ

247.625<<4739. The surely correct rating is far more hellish to compute than Elo etc. I cannot be sure, but most probably I proved that Elo, Glicko and TrueSkill are wrong.
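
As a side note, here is a minimal Python sketch (the example records are made up) of why (w-l)/(w+d+l) and (w+0.5d)/(w+d+l) are equivalent for ranking, while w/(w+d+l) throws away the draw/loss split: the second score is just 0.5 + 0.5 times the first.

```python
# Minimal sketch: the two scores differ only by an affine transform,
# so they always order players identically. Example records are made up.
records = [
    (10, 5, 5),   # (wins, draws, losses)
    (10, 0, 10),
    (3, 14, 3),
]

for w, d, l in records:
    n = w + d + l
    margin = (w - l) / n          # (w - l) / (w + d + l)
    score = (w + 0.5 * d) / n     # (w + 0.5d) / (w + d + l)
    win_rate = w / n              # plain w / (w + d + l), loses draw info
    assert abs(score - (0.5 + 0.5 * margin)) < 1e-12
    print(f"w={w:2} d={d:2} l={l:2}  margin={margin:+.3f}  score={score:.3f}  win_rate={win_rate:.3f}")
```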

uri65
luckisK wrote:

Wrong again: (w-l)/(w+d+l), which is equivalent to (w+0.5d)/(w+d+l), says more than w/(w+d+l).

https://www.youtube.com/watch?v=fhnrrLxQEVQ

247.625<<4739. The correct rating is far more hellish to compute. I cannot be sure, but most probably I proved that Elo, Glicko and TrueSkill are wrong.

The only thing you proved is that your formula gives different results from Elo/Glicko. It does not mean that your formula is correct or Elo/Glicko is incorrect.

Two more problems with your approach:

1. All games are treated equally. A game from 10 years ago has the same weight as a game played today. At best your formula gives an average performance since one started playing chess, which makes no sense. In contrast, Elo/Glicko reflect recent performance.

2. Playing opponents with a much higher/lower rating can't show your progress of 50-100 Elo points; only playing similarly rated opponents can.

luckisK

What you guys should answer is that I most probably proved the wrongness of Elo etc., but that (w+0.5d)/(w+d+l) is even more wrong:

1.) When the number of games a player has played is small, e.g. if a player's results are just one win, then it says he is the best player, with the highest possible rating of 1/1, hahaha.

2.) That (w+0.5d)/(w+d+l) doesn't give more weight to the more recent results (although I do not know whether Glicko and TrueSkill do).

And "where are your modifications to the (w+0.5d)/(w+d+l) that correct these problems"? Well, the first problem is very difficult to grasp and solve, however today I found a modification that by almost intuition is close to the unknown correct Bayesian inference modification. Also, today I found a most probably correct modification to give more weight to the more recent results, that is relevantly easy.

 

Of course intuition is just bollocks. The correct answer is needed. For example, see the drawings of the bells that show the probability of each skill (to be the real skill):

http://www.moserware.com/2010/03/computing-your-skill.html

First, in very small samples the bells do not apply, e.g. if the sample is 1 win in 1 game. Second, in large enough samples, if you use the bells to calculate

(real skill) = (probability the skill is a)*(skill a) + (probability the skill is b)*(skill b) + and so on,

then the real skill is equal to the skill of the sample. Isn't that so? Do they mean something else?
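
On the 1-win-in-1-game problem, one standard Bayesian treatment, sketched here purely for illustration and not necessarily the "correct modification" discussed above, is to put a Beta prior on the expected score and report the posterior mean, which pulls small samples toward the prior.

```python
# Sketch of one standard Bayesian shrinkage, not necessarily the
# "correct modification" discussed above: treat the expected game score
# as an unknown p with a Beta(a, b) prior and report the posterior mean.

def shrunk_score(wins, draws, losses, prior_games=10, prior_mean=0.5):
    """Posterior-mean estimate of the score (w + 0.5d)/(w + d + l),
    pulled toward prior_mean when there are few games."""
    a = prior_mean * prior_games          # pseudo-wins
    b = (1 - prior_mean) * prior_games    # pseudo-losses
    score_points = wins + 0.5 * draws
    games = wins + draws + losses
    return (score_points + a) / (games + a + b)

print(shrunk_score(1, 0, 0))        # 1 win in 1 game -> ~0.55, not 1.0
print(shrunk_score(200, 0, 0))      # 200 wins, 0 losses -> ~0.98
print(shrunk_score(1200, 0, 1000))  # 1200 wins, 1000 losses -> ~0.55
```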

 

luckisK

Is anybody here an expert in Bayesian inference who could help me find the solution?

luckisK

A player has (wins+0.5draws)/(wins+draws+losses) = (200+0.5*0)/(200+0+0) and another has (1200+0.5*0)/(1200+0+1000). This rating correctly shows that the first player is much stronger than the second. Whereas Elo, Glicko and TrueSkill rate (roughly) the first at 800+200*8-0*8 = 2400 and the second at 800+1200*8-1000*8 = 2400, if the rating they both started with is 800. In essence, what Elo etc. count is wins-losses, whereas the correct rating is (wins-losses)/(wins+draws+losses), which is equivalent to (wins+0.5draws)/(wins+draws+losses). They fail to show that the first player is much stronger than the second; they say the two are of the same strength! It is WRONG and it needs to be replaced. Correct me if I am wrong, e.g. that these are not the points Glicko gives to these two players (then how many points does it give?), but am I wrong enough that the point I mentioned does not apply?
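
A quick sketch of the comparison using the post's own numbers and its simplified "+8 per win, -8 per loss" tally (an approximation; real Elo/Glicko updates depend on the opponents' ratings):

```python
# The two players from the post, under the post's own simplified
# "+8 per win, -8 per loss" tally (real Elo/Glicko updates depend on
# the opponents' ratings, so this is only the poster's approximation).

players = {
    "A": (200, 0, 0),      # (wins, draws, losses)
    "B": (1200, 0, 1000),
}

START, PER_GAME = 800, 8

for name, (w, d, l) in players.items():
    score = (w + 0.5 * d) / (w + d + l)
    simple_tally = START + PER_GAME * w - PER_GAME * l
    print(f"{name}: score={score:.3f}  simplified tally={simple_tally}")

# A: score=1.000  simplified tally=2400
# B: score=0.545  simplified tally=2400
```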

veryrabbit

Hikaru Nakamura
2736 FIDE
3063 chess.com

+327

:)

uri65
luckisK wrote:

A player has (wins+0.5draws)/(wins+draws+losses) = (200+0.5*0)/(200+0+0) and another has (10200+0.5*0)/(10200+0+10000). This rating correctly shows that the first is much better than the second. Whereas Elo, Glicko and TrueSkill rate (roughly) the first at 800+200*8-0*8 = 2400 and the second at 800+10200*8-10000*8 = 2400, if the rating they started with is 800. It fails to show that the first is much better than the second; it says they are of the same strength! It is WRONG and it needs to be replaced. Correct me if I am wrong, e.g. that these are not the points Glicko gives to these 2 players (then how many points does it give?), but am I wrong enough that the point I mentioned does not apply?

https://en.wikipedia.org/wiki/Elo_rating_system#Theory

https://en.wikipedia.org/wiki/Glicko_rating_system#The_algorithm_of_Glicko

 

veryrabbit
goldenbeer wrote:
@veryrabbit, you should compare his blitz rating with the FIDE blitz rating, not the standard rating. They differ by around 100.

So can we say the difference is roughly between 100 and 300?

luckisK

On average it is +8 points for a win and -8 points for a loss. Thus the Glicko rating is wins-losses. Whereas the correct rating is (wins-losses)/(wins+draws+losses) which is equivalent to (wins+0.5draws)/(wins+draws+losses).

uri65
luckisK wrote:

On average it is +8 points for a win and -8 points for a loss. Thus the Glicko rating is wins-losses. Whereas the correct rating is (wins-losses)/(wins+draws+losses) which is equivalent to (wins+0.5draws)/(wins+draws+losses).

Your example from post #46, 800+200*8-0*8=2400, is not an average at all. This can only happen if the player always plays opponents of exactly the same rating.

The formula (wins+0.5draws)/(wins+draws+losses) can only be applied when everyone plays everyone the same number of games, like a round-robin tourney; otherwise it's nonsense.
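
To make this concrete, here is the standard Elo expected-score update from the Wikipedia article linked above (K=16 is just an example K-factor): the points gained or lost depend on the rating gap, so a flat +8/-8 only holds against equally rated opponents.

```python
# Standard Elo update (see the Wikipedia link above). K=16 is just an
# example K-factor; the gain/loss depends on the rating difference,
# not a fixed +/-8.

def expected_score(r_player, r_opponent):
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))

def elo_update(r_player, r_opponent, score, k=16):
    """score: 1 for a win, 0.5 for a draw, 0 for a loss."""
    return r_player + k * (score - expected_score(r_player, r_opponent))

print(elo_update(1500, 1500, 1))  # beat an equal:        +8   -> 1508.0
print(elo_update(1500, 1900, 1))  # beat a much stronger: +14.5 or so
print(elo_update(1500, 1100, 1))  # beat a much weaker:   +1.5 or so
```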

candycanemiss

i feel like the chess.com system is wrong, since when I am playing and the game ends with me losing

i am deducted -20 points, why?

luckisK

There is no need for "everyone plays everyone": if the opponents are selected randomly (thus from all ranges of strengths), it is like everyone plays everyone once the number of games becomes large. It is only for a small number of games that (w+0.5d)/(w+d+l) needs to be modified, and I have already explained why.

uri65
candycanemiss wrote:

i feel like the chess.com system is wrong, since when I am playing and the game ends with me losing

i am deducted -20 points, why?

It's correct and calculated according to the Glicko formula. Why do you feel -20 is wrong?

uri65
luckisK wrote:

There is no need for "everyone plays everyone": if the opponents are selected randomly (thus from all ranges of strengths), it is like everyone plays everyone once the number of games becomes large. It is only for a small number of games that (w+0.5d)/(w+d+l) needs to be modified, and I have already explained why.

Nobody will agree to play only random opponents selected from all ranges of strengths. End of story.

Glicko works fine. Did you understand the math in the Wikipedia article? You should before you criticize it.

SpeckledGrill

7

luckisK

And if you are curious whether I am happy with my present Glicko rating: I just do not know whether I would have a better or worse rating with the surely correct system I propose. That's because, e.g., my wins, draws and losses did not come from a random selection of opponents (random, thus from all ranges of strengths), but from opponents of similar Glicko rating to mine. I am unhappy with THAT: that the real strength of each player is unknown with the system used.

At least, if they had chosen the opponents randomly and used Elo etc., then (w-l)/(w+d+l) <=> (w+0.5d)/(w+d+l), or even (Glicko ~ initial rating + 8*wins - 8*losses)/(number of games), would show something for sure. Hey, I just realised this now: they do not need to change the rating they use, they only have to select the opponents randomly, and then the surely correct rating would be there: (w-l)/(w+d+l) <=> (w+0.5d)/(w+d+l). Both my rating and the Glicko rating would be there for everyone to see, and perhaps one could judge whether Glicko etc. are as correct as (w-l)/(w+d+l) <=> (w+0.5d)/(w+d+l) is. I am impressed with the checkmate I have just delivered to you.

However, the rating I propose needs some modifications, e.g. when the results are only 1 win in 1 game, the player has the best rating possible while almost certainly he is not that strong a player. I have a solution, but it is based on intuition; the correct solution is extremely difficult to find, it is based on Bayesian inference and most probably it needs a matrix of past results. And, e.g., I would give more weight to the more recent results.

ChessOpeningTrapsYoutube

I think what we're missing here is that the current system allows equally strong players to play against each other. The (Wins+0.5Draws)/(Wins+Draws+Losses) system assumes you have played random people (so from really low to really high ratings).

But that doesn't work if you want to match up players using their strength. Picture this:

A really good player is playing other really good players, and so he wins about as much as he loses. So the formula would give a 0.5 score.
A really bad player is playing other really bad players, and so he also wins about as much as he loses. So the formula will again give a 0.5 score.

This means the above-mentioned players would have the same score according to your system. So if you were to use this system to match players with equally strong opponents, you can't. This system only works when you use random match-ups (and, as mentioned, the accuracy is low when someone hasn't played a lot of games).
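
A rough simulation sketch of this scenario, with win probabilities modelled by the Elo logistic curve (an illustrative assumption, draws ignored): both players score about 0.5 against their own pools despite being far apart in strength.

```python
# Simulation sketch of the scenario above. Win probability is modeled
# with the Elo logistic curve (an illustrative assumption; draws are
# ignored for simplicity).
import random

def play(r_a, r_b):
    """Return 1 if the player rated r_a beats the player rated r_b."""
    p_win = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    return 1 if random.random() < p_win else 0

def pool_score(player_rating, pool_ratings, games=10_000):
    wins = sum(play(player_rating, random.choice(pool_ratings))
               for _ in range(games))
    return wins / games   # the (w + 0.5d)/(w + d + l) score, with d = 0

random.seed(0)
strong_pool = [2400, 2450, 2500, 2550, 2600]
weak_pool = [900, 950, 1000, 1050, 1100]

print(pool_score(2500, strong_pool))  # strong player, strong pool: ~0.5
print(pool_score(1000, weak_pool))    # weak player, weak pool:     ~0.5
print(pool_score(1000, strong_pool))  # weak player, strong pool:   ~0.0
```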

 

kahnd4

We are all patzers; who cares about ratings, just have fun.

luckisK

https://www.physicsforums.com/threads/elo-glicko-and-truskill-rating-systems-are-most-probably-wrong.999641/

I wrote about it there as well, and they locked the thread because "I do not admit my mistake".