Chess.com Online rating system

ichabod801

Statistical estimates based on different measurements of a subject interacting with different populations in different ways are not directly comparable. That's basic statistics.

876543Z1

OK, so perhaps you have no reference material. Can we then go through the points one at a time, beginning with the different populations: how do they differ between chess.com and the USCF, etc.?

ichabod801

The set of chess.com players is not equal to the set of USCF players.

876543Z1

Crikey, is that the basis of your stats? Can you elaborate, please?

ichabod801

No, that's the answer to your question. Which part don't you understand?

Nytik

You know, if these threads are going to pop up every week, I'd rather chess.com used a completely different rating system. Maybe have a system where the top players are currently around 10,000. That would stop people thinking that the two ratings correlate in any way... Shame we can't change the system.

TheGrobe

I don't think that it would help -- you'd still have people looking for a conversion rate when it's just not possible to establish one that gives you any kind of reliable indication of how any one person in one pool will stack up in the other.

No, unless we endeavour to teach everyone basic statistics I think this confusion will persist.

Nytik

I don't think it will eliminate every problem, but it will get rid of one: people thinking their chess.com rating is their OTB rating. (I'm sure you've seen at least one case of this on the forums...)

We can't teach everyone basic statistics; more and more people come to the site, and then they just argue with all the explanations...

ichabod801

Maybe we should draft one detailed explanation for the non-statisticians, and reference that whenever the subject comes up.

TheGrobe

It would actually be ideal, I think, if we could get Erik to update this article with a brief explanation of why they cannot be compared. That way when it's referenced, there should be little question because it's from someone in a position of authority. If someone else writes it, it will be called into question every time it's referenced.

http://www.chess.com/article/view/chess-ratings---how-they-work

ichabod801

Okay, I redid my calculations on the distribution of chess.com ratings:

http://blog.chess.com/view/ratings-distribution-redux

The odd thing is that the median is increasing much faster than the average. Indeed, the 20th percentile is increasing faster than the average. I don't really understand it.

876543Z1
ichabod801 wrote:

No, that's the answer to your question. Which part don't you understand?


Well no science, maths or references to support your one line thesis so I suppose that's the best effort I'm going to get from you.

>:)

876543Z1
TheGrobe wrote:

It would actually be ideal, I think, if we could get Erik to update this article with a brief explanation of why they cannot be compared. That way when it's referenced, there should be little question because it's from someone in a position of authority. If someone else writes it, it will be called into question every time it's referenced.

http://www.chess.com/article/view/chess-ratings---how-they-work


Erik and others have already stated that their chess.com and OTB grades are comparable. I'm not saying, though, that Erik's views, past, present or future, would hold any authority on this topic.

>:)

ichabod801
87654321 wrote:
ichabod801 wrote:

No, that's the answer to your question. Which part don't you understand?


Well no science, maths or references to support your one line thesis so I suppose that's the best effort I'm going to get from you.

>:)


What thesis? You asked a question about one part of my conclusion. I answered you with an obvious truth. Why do I need a reference for a verifiable fact about reality? Or are you arguing that everyone who plays in the USCF plays on chess.com, and everyone on chess.com plays in the USCF?

TheGrobe

I'm sorry, but do you have a reference for that proclamation? I'm pretty sure that the staff here understand how the ratings work and the implications of trying to compare them between pools of players, so unless I see it first hand I'm going to have a hard time believing it.

Baseballfan
87654321 wrote:
TheGrobe wrote:

It would actually be ideal, I think, if we could get Erik to update this article with a brief explanation of why they cannot be compared. That way when it's referenced, there should be little question because it's from someone in a position of authority. If someone else writes it, it will be called into question every time it's referenced.

http://www.chess.com/article/view/chess-ratings---how-they-work


Erik and others have already stated that their chess.com and OTB grades are comparable. I'm not saying, though, that Erik's views, past, present or future, would hold any authority on this topic.

>:)


If someone's ratings are comparable between two organizations (say, between chess.com and the USCF), it is purely a coincidence. There is NO reliable way of saying "If my rating is xxx at this place, it will be xxx at this other place", none. The sets of people involved are different (there are a lot more masters and titled players playing regularly in the USCF pool than there are here, for example), and the math has zero meaning outside of the group of people you are using the math on. All ratings will tell you is what your chess strength is relative to the other people in the same pool. That is all, nothing more, nothing less.

Additionally, most organizations don't even use the same rating method we use here. I think the Australian chess organization (sorry, I don't remember the exact name) uses Glicko like us, but most use Elo, which doesn't take into account things like how long it's been since your last game (and your opponent's last game) was played. This changes how much is gained and lost in each game, and makes it even less reliable to go from one set of players to another.
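To make the difference concrete, here is a minimal Python sketch of a single-game update under each system. It follows the published Elo and Glicko-1 formulas; the K-factor of 32, the constant c = 35, the example ratings and the function names are illustrative assumptions, not the parameters chess.com or any federation actually uses.

```python
import math

Q = math.log(10) / 400  # scaling constant from the Glicko-1 description


def elo_update(r, r_opp, score, k=32):
    """Elo: the step size depends only on K and the rating difference."""
    expected = 1 / (1 + 10 ** ((r_opp - r) / 400))
    return r + k * (score - expected)


def g(rd):
    """Glicko-1 factor that discounts a result against an uncertain opponent."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def inflate_rd(rd, periods_idle, c=35.0, rd_max=350.0):
    """Rating deviation (RD) grows with inactivity. c is an assumed value."""
    return min(math.sqrt(rd**2 + c**2 * periods_idle), rd_max)


def glicko_update(r, rd, r_opp, rd_opp, score):
    """Glicko-1 update for one game: returns (new rating, new RD)."""
    g_opp = g(rd_opp)
    expected = 1 / (1 + 10 ** (-g_opp * (r - r_opp) / 400))
    d2 = 1 / (Q**2 * g_opp**2 * expected * (1 - expected))
    denom = 1 / rd**2 + 1 / d2
    new_r = r + (Q / denom) * g_opp * (score - expected)
    return new_r, math.sqrt(1 / denom)


# Same game (a win between two 1500s) under three scenarios.
elo_new = elo_update(1500, 1500, 1)
active_new, _ = glicko_update(1500, 50, 1500, 50, 1)
idle_new, _ = glicko_update(1500, inflate_rd(50, 30), 1500, 50, 1)

print("Elo:", elo_new)
print("Glicko, active player:", round(active_new, 1))
print("Glicko, player idle for 30 periods:", round(idle_new, 1))
```

The Elo step is the same no matter how stale either rating is, while the Glicko step grows with the player's own uncertainty, so identical results move the two kinds of rating by different amounts.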

876543Z1

One consequence of rating systems is bringing together miscellaneous sets of players from e.g. Texas, Yorkshire, Tasmania etc. & even dot.com land under a single recognisable umbrella brand.

Many lead a double life alternating pools dot.com to otb.

If we needed to have identical people & systems within each grouping to enable meaningful comparison then any stats class could be given by a four year old. 

>:) 

Baseballfan
87654321 wrote:

One consequence of rating systems is bringing together miscellaneous sets of players from e.g. Texas, Yorkshire, Tasmania etc. & even dot.com land under a single recognisable umbrella brand.

Many lead a double life alternating pools dot.com to otb.

If we needed to have identical people & systems within each grouping to enable meaningful comparison then any stats class could be given by a four year old. 

>:) 


The problem is that chess ratings aren't tangible. They aren't a measure of a "chess IQ". They aren't a measure of chess strength (except relative to your opponent). They have no real meaning. They are simply a matter of performance in relation to the other players. If ratings were a measure of something more real, then sure, one could conclude that there ought to be a way to accurately determine one's rating from one group to the next, but since a chess rating measures nothing outside its own pool, no such formula exists or could exist.
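A small sketch of why the numbers only mean something relative to the pool: under the standard Elo expected-score formula (Glicko uses a closely related curve), adding the same constant to every rating in a pool changes no prediction at all. The pool, the player names and the 8000-point shift are made up for illustration.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score for player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


# A hypothetical three-player pool.
pool = {"A": 1400, "B": 1600, "C": 1850}

# The same pool with every rating shifted by an arbitrary constant.
shifted = {name: rating + 8000 for name, rating in pool.items()}

for x, y in [("A", "B"), ("B", "C"), ("A", "C")]:
    before = expected_score(pool[x], pool[y])
    after = expected_score(shifted[x], shifted[y])
    print(x, "vs", y, round(before, 3), round(after, 3))
```

Only rating differences enter the formula, so where the scale sits is set by the pool's own history and membership, which is why two separate pools have no common reference point.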

ichabod801
87654321 wrote:

If we needed to have identical people & systems within each grouping to enable meaningful comparison then any stats class could be given by a four year old. 


If we were doing a simple comparison of a simple population statistic this might be true. But this is a much more complicated situation. The Glicko system performs an estimation of the individual values that would be used to calculate such a population statistic. It does this by comparing pairs of individuals within the population. It doesn't use all pairs of individuals, and the comparison it makes is based on an estimation in and of itself. If that's not enough, we're using different measurements of the difference between those pairs of individuals. Furthermore, it doesn't use the best method of analyzing the comparisons, because that would be too computationally intensive. All of this makes the individuals involved, and which pairs of individuals are used, very important.

Trying to compare two populations with different interactions in this way is equivalent to using two different sampling methods for pulling from two populations. It's like trying to compare the salaries of people in New York and California using a sample of rich people from New York and poor people from California. If you take into account the different measuring systems, it's like trying to compare the wealth of people in NY and CA by using the spending of rich people in NY and the salaries of poor people in CA. Considering the computational issues involved, you would be using the free calculator you got from your insurance agent.
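A toy simulation of the salary analogy, assuming two populations drawn from the same made-up lognormal distribution. The state labels, distribution parameters and 10% cutoffs are hypothetical; the sketch only illustrates how the sampling rule, rather than the population, can drive the comparison.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable


def draw_population(n):
    """Salaries from one assumed lognormal distribution (made-up parameters)."""
    return [rng.lognormvariate(10.8, 0.6) for _ in range(n)]


def top_decile(values):
    """The richest 10% of a population."""
    ordered = sorted(values)
    cut = len(ordered) // 10
    return ordered[-cut:]


def bottom_decile(values):
    """The poorest 10% of a population."""
    ordered = sorted(values)
    cut = len(ordered) // 10
    return ordered[:cut]


# Two populations with the same underlying distribution.
ny = draw_population(10_000)
ca = draw_population(10_000)

print("Population means:", round(mean(ny)), round(mean(ca)))
print("Rich NY sample mean:", round(mean(top_decile(ny))))
print("Poor CA sample mean:", round(mean(bottom_decile(ca))))
```

The two populations are statistically the same, yet the biased samples suggest a large gap. Comparing ratings across pools built from different opponents and different pairings runs into the same kind of problem.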

For the details of the Glicko system (with information on the pitfalls of its applications under differing dynamics), I refer you to "Parameter Estimation in Large Dynamic Paired Comparison Experiments", Glickman, Applied Statistics (48, pp. 377-394). A comparison of the sample ratings of the players in that article to their actual ratings in tournament play is also quite illuminating. For details on the different measurements being used, I refer you to the U.S. Chess Federation's Rules of Chess, 5th Edition (Just and Burg, eds.) and the rules of play for this site (available in the help section, especially http://support.chess.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=17).

876543Z1
ichabod801 & Baseballfan

Thanks for preparing the text and also taking the time to dig out the references. I found the first reference of merit, though not entirely relevant, and the second and third less worthy.

In absolute terms I agree with both of you.

In reasonable terms I would tend to disagree; it's perhaps a question of what margin of error in any comparison would make it credible to the man on the Clapham omnibus.

With increasing net use, pools are getting tattered around the edges, and sites like chess.com may by default be contributing information towards better managing the variables in a future comparison covering active players.

>:)