Elo to Glicko: Your Rating Explained

Most chess rating calculations originate with the ideas of the Hungarian Arpad Elo (not pictured here).  A physics professor in the U.S., Elo devised a method for calculating ratings from simple statistical concepts.  His fundamental idea was that a player's chess skill conforms to what is called a ‘normal’ distribution.  A normal distribution is shaped roughly like the outline of a bell, as shown here.

 

Figure 1:  Bell Curve

This assumption that a given player's skill is normally distributed means that on any given day that player may perform either better or worse, but given enough games the player's level of play will be distributed normally.  As it turns out, player skills in general on chess.com are also roughly distributed in the same fashion as the bell curve above.

In this idealized distribution, the middle value on the x axis is zero, but if you plot player ratings on the x axis, you will have low scores on the left and high scores on the right with the height of the curve corresponding to the number of players having each such rating.  There is an overall average skill level which, on a perfect normal distribution, corresponds to the x-value of the highest y-value (in the middle of the bell).  There are more people whose skill clusters around that average, while there are fewer people who have lower skill levels, and of course (much to our collective envy) another small group of people who have very high levels. 

You can see the current chess.com ratings curve if you click here, and indeed you will notice that it does resemble a bell.

When you play a game, you will earn points if you win and lose points if your opponent wins.  If you draw a higher rated player, you will earn a smaller number of points.  Elo’s idea was to derive a computation based on this assumption of a normal distribution of player strengths, using the rating as a representation of strength.  

Suppose you play a number of games in a tournament.  You would be expected to defeat players with smaller ratings than yourself.  Awarding +1 for a win, -1 for a loss, and 1/2 for a draw, if you play 4 games against weaker players, 3 against stronger players, and 2 against opponents the same strength as yourself, you would be expected to accumulate 4 – 3 + 1/2 + 1/2 = 2 points.  However, suppose you actually won 5 games, and lost only 2, and still drew two games.  Your actual points would then be 5 – 2 + 1 = 4.

The basic computation to adjust your rating in Elo’s system is an equation of the form:

New Rating = Old Rating + k(actual points – expected points), where ‘k’ is some constant number, e.g. 32. 

In our example, if your old rating was 1500, then your new rating would be computed as follows:

1500 + (32 (4 actual points – 2 expected points)) = 1500 + 64 = 1564.
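
For readers who prefer code to arithmetic, the example above can be sketched in a few lines of Python.  This is only an illustration of the simplified formula in this article; the function names and the default k of 32 are assumptions made for the example, not anything taken from the USCF, FIDE, or chess.com.

```python
def tournament_points(wins, losses, draws):
    # Scoring used in this article: +1 per win, -1 per loss, +1/2 per draw.
    return wins - losses + 0.5 * draws


def elo_update(old_rating, actual_points, expected_points, k=32):
    # New Rating = Old Rating + k * (actual points - expected points)
    return old_rating + k * (actual_points - expected_points)


expected = tournament_points(wins=4, losses=3, draws=2)  # 2.0
actual = tournament_points(wins=5, losses=2, draws=2)    # 4.0
print(elo_update(1500, actual, expected))                # 1564.0
```

Had you scored exactly the expected 2 points, the bracket would be zero and your rating would stay at 1500.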

The US Chess Federation (USCF) adopted essentially this formula in 1960 and FIDE adopted it in 1970. 

However, this is not the system used by either organization today, nor is it exactly the system used by chess.com.  In the 1980s a bright young Statistics major at Princeton University began to study chess ratings, and wrote his senior thesis on the topic.  After speaking to the USCF President about his work, he was invited to join the USCF ratings committee, later becoming its chairman, a post which he holds to this day.

 Mark Glickman (pictured above) was this young student’s name, and today he is referred to as Professor Glickman by his own students at Boston University.  Glickman wrote his Harvard doctoral dissertation on what he viewed as deficiencies with the Elo ratings system, and devised a replacement, which he dubbed the “Glicko” system, in what I can only regard as a humorous tribute to his predecessor Professor Elo.  (I love clever people.)

It is the Glicko system that chess.com uses to calculate your rating.

One of Glickman’s innovations was to recognize that your rating is only an estimate of your true strength, and that there is uncertainty regarding your rating.  This uncertainty is represented by what has been dubbed the Rating Deviation.  This is merely chess talk for what a statistician calls the Standard Deviation: a number that measures this uncertainty.  The larger the number, the more uncertainty surrounding your rating.

In a normal distribution, the average value along the x axis plus-or-minus 2 such rating deviations gives an interval within which there is 95% confidence that your true strength lies.  If you don't know or don't care about statistics, then just regard this as a religious axiom and accept it on faith.

If you refer again to Figure 1 above, you'll see the 95% confidence interval between the +2 and -2 standard deviations.
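
As a small sketch of that interval, the snippet below computes the band of rating plus-or-minus 2 rating deviations.  The function name and the sample rating and deviation are made up for illustration, and the width of 2 is the rounded figure used in this article.

```python
def confidence_interval(rating, rating_deviation, width=2):
    # Band of rating +/- width * RD; width=2 gives roughly 95% confidence.
    low = rating - width * rating_deviation
    high = rating + width * rating_deviation
    return low, high


# Hypothetical player: rating 1500 with a Rating Deviation of 50.
print(confidence_interval(1500, 50))  # (1400, 1600)
```

The same rating with a Rating Deviation of 150 would give a band from 1200 to 1800, which is a much vaguer statement about the player's strength.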

 Another innovation of Glickman’s was in his observation that a player’s rating is actually less reliable as a measure of true strength if that player has not played any games after some period of time.  Suppose your rating is 1301 (the current average for chess.com members).  That rating was computed from your games against others.  It is not your true strength, which can never be truly known except perhaps by the Deity, but even Kasparov probably doesn’t know it.  Your rating is only an estimate of your true strength.   And what if you haven’t played any rated games in the past 6 months?  Do we trust your 1301 rating as much as the same rating by another player who has played 20 games in the past 3 days? 

Glickman thinks not, so he built a time factor into his equations that lets your Rating Deviation grow with the passage of time.  That is, after a period without rated games your Rating Deviation will take on a larger value, representing the fact that we are less certain about your rating's accuracy than we were when you were playing regularly.
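
In Glickman's published description of Glicko, this growth takes the form new RD = min(sqrt(RD² + c²·t), 350), where t is the number of rating periods since you last played and c is a constant chosen by whoever runs the system.  The sketch below assumes c = 34.6 (a value that returns a typical player's RD to 350 after about 100 idle periods); the value chess.com actually uses, and the length of its rating period, are not stated here, so treat both as placeholders.

```python
import math

MAX_RD = 350.0  # RD of a brand-new, unrated player in Glickman's description
C = 34.6        # assumed growth constant; not chess.com's actual value


def inflate_rd(rd, periods_inactive, c=C, max_rd=MAX_RD):
    # Uncertainty grows with inactivity but never exceeds that of a newcomer.
    return min(math.sqrt(rd ** 2 + c ** 2 * periods_inactive), max_rd)


for t in (0, 1, 10, 100):
    print(t, round(inflate_rd(50, t), 1))
```

The rating itself does not change while you are away; only the uncertainty around it does.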

Yet a third innovation in the Glicko system is that the equations to recompute your rating depend not only upon your own rating and rating deviation, but they also depend upon your opponents’ ratings and deviations.  For this reason, when you gain 31 points, your opponent may lose either more or fewer than 31 points, depending upon your respective ratings and rating deviations.
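
To see why the two players' changes need not match, here is a minimal sketch of the rating update from Glickman's published Glicko description.  Note that it scores a win, draw, and loss as 1, 1/2, and 0, unlike the simplified +1/-1 scheme used earlier in this article.  The function names and the two sample players are invented for the example, and nothing here is claimed to be chess.com's exact implementation.

```python
import math

Q = math.log(10) / 400  # scaling constant from Glickman's description


def g(rd):
    # Discount factor: results against uncertain opponents count for less.
    return 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected_score(r, opp_r, opp_rd):
    return 1 / (1 + 10 ** (-g(opp_rd) * (r - opp_r) / 400))


def glicko_update(r, rd, results):
    # results: list of (opp_rating, opp_rd, score) with score 1, 0.5 or 0.
    info = 0.0   # accumulates the sum that becomes 1 / d^2
    total = 0.0  # weighted sum of (actual score - expected score)
    for opp_r, opp_rd, score in results:
        e = expected_score(r, opp_r, opp_rd)
        info += g(opp_rd) ** 2 * e * (1 - e)
        total += g(opp_rd) * (score - e)
    denom = 1 / rd ** 2 + Q ** 2 * info
    return r + (Q / denom) * total, math.sqrt(1 / denom)


# Same rating, different certainty, one game between them.
newcomer = (1500, 200)  # uncertain rating
regular = (1500, 50)    # well-established rating
new_r, new_rd = glicko_update(*newcomer, [(*regular, 1)])  # newcomer wins
reg_r, reg_rd = glicko_update(*regular, [(*newcomer, 0)])  # regular loses
print(round(new_r - 1500, 1), round(reg_r - 1500, 1))
```

In this sketch the newcomer gains far more points than the regular loses, because the system is much less sure of the newcomer's rating and therefore moves it further.  Both players' rating deviations also shrink after the game.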

I will not reprint the Glicko equations here because they are much more complex than the Elo equation above, but for the mathematically curious an overview that includes the equations can be found here.

For the true math geeks out there, you can read Glickman’s full technical article that was published in 1999 in the journal Applied Statistics by clicking here.  Improvements to Glicko can be found in the Glicko-2 system.

Finally, I will point you to Erik’s own article on the Glicko system used here on chess.com.  If you don't already know Erik, he is Mr. Chess.com.  I have tried to supplement, rather than duplicate, Erik’s description.

I hope you enjoyed this brief overview of how ratings are determined, and that the next time you peek to see how many points you stand to gain or lose when you begin your new game you will appreciate the work that went into providing you the answer.  As you strive to become a better player and person, just remember to choose your move carefully, in chess as in life.

 [Postscript:  I would like to thank Prof. Mark Glickman for correcting an inaccuracy regarding Elo's assumption on normal distributions in the original post. -KG]

Comments


  • 3 months ago

    Ranx0r0x

    The use of parametric statistics requires that one have interval or ratio scaling, and the won/loss/draw tuple doesn't fall into that category.  Chess game metrics are an ordinal scale.  The temptation to use the power of parametric statistics on non-parametric data is hard to resist.  Non-parametric statistics don't detect significance or patterns as granularly or readily. 

    I think this is easier to understand in the context of an example.  If I measure a temperature of 50C and then a temperature of 100C I know that the "meaning" of 1 degree C is the same between 0 to 50 as it is from 50 to 100.  It is a standard interval.  However, 100C isn't twice as hot as 50C.  In other words, it is not a ratio scale. To get that we'd have to convert to kelvins by adding 273.15 to each.  Our 50C becomes 323.15K and our 100C becomes 373.15K.  Now we can figure out what the ratio is between them.

    In chess a 0, 1/2, 1 scale doesn't have the expressive power of degrees C, much less of kelvins.

    Further, if I look at 30 games played between 2 players and say that player A won 20 and lost 10, does that mean he's twice as strong as player B?  Probably not.  It is more likely that they are much closer in strength. Now what if player A beats player C by the same 20:10 ratio?  Does that mean that player B and player C are equal in strength? If the scale is interval or ratio it would mean exactly that. If that were the case we could use parametric statistics.

    And I didn't even get into tossing half a point into that mix.  We have to get away from using parametric statistics if we ever want to get a halfway meaningful rating measurement.

    One assumption that may be very wrong is that chess players fall into a random normal distribution.  It may very well be that the curve is skewed. Stamping a normal distribution on top of it doesn't make the actual strength of the players a normal distribution.  It only means that the performance numbers will always be out of whack and not work correctly.

  • 5 months ago

    WishMaster_89

    Where could I see my Rd in chess.com?

  • 6 months ago

    MinimusMax

    I realize this is an old thread, but I just wanted to clear up the last comment from Zapranoth8.

     

    With the point about gaining a smaller number of points from winning against a higher rated player, the "smaller" is relative to winning against that same player, not relative to drawing a lower rated player.

  • 17 months ago

    Zapranoth8

    In the fifth paragraph, you state, "If you draw a higher rated player, you will earn a smaller number of points."

    This strikes me as counterintuitive.  Shouldn't one get more rewards for de-fanging a bigger snake?  (metaphorically speaking of course!)  I seem to remember something about ELO's method that spoke of gaining the difference between the two players times a constant (+ or - "32", say or whatever.)

  • 3 years ago

    tom26

    I know this is old and I'm being pedantic buuutttt....

    " Awarding +1 for a win, -1 for a loss, and 1/2 for a draw"

    Does that really make sense? A draw is 3/4 of the way from a loss to a win?

    Once again, I apologise for pedantry, and thanks for the column, very nice :)

  • 4 years ago

    ZBicyclist

    Currently the Kaggle competition to see if a bunch of forecasters can improve on the system shows the Elo benchmark in 94th place, just barely on the first screen of 3 screens of teams.

    http://kaggle.com/chess?viewtype=leaderboard

    This doesn't necessarily mean the other 93 systems are better, but it certainly provides a few candidates.

  • 4 years ago

    chessroboto

    It is time to bring this article back to light as the Glicko systems will be the subject of discussion and scrutiny again.

    Kaggle and Jeff Sonas have recently opened a contest for statisticians to improve upon the much-debated ELO system, so we should see some very interesting suggestions soon.

    The first prize which is an autographed copy of Fritz 11 weighs like a retirement fund already.

  • 4 years ago

    istrebitel21

    It is true that any player's rating is just an estimate of the maximum strength of their actual potential. I guess there is no system available to calculate exactly the true strength of any player. The reason is simple: chess players are humans, not computers, and are affected by so many factors outside the playing conditions. If any rating system were an exact calculation of strength, then obviously two players with the same rating would just have perpetual draws in their games...

  • 4 years ago

    kurtgodden

    @BenFrantzDale

    Good idea, I did change the normal curve image.  Thanks for the suggestion!

  • 4 years ago

    BenFrantzDale

    The image of the normal distribution shows up as the #1 or #2 Google Images hit for "normal distribution". However, the image itself is rather inaccurate: The top is too round, the inflection points aren't at one standard deviation, and the curvature doesn't vary continuously.

     

    For the sake of accuracy on the internet, you might consider changing that image for a more accurate one like those found on Wikipedia: http://en.wikipedia.org/wiki/Normal_distribution

  • 4 years ago

    cookie3

    i have to agree w/coach777.  how can the top ratings continue to rise?  Couldn't a player manipulate the ratings by always playing lower rated players that it is pretty well assured of defeating?  As for the ratings deviation, are players being punished for playing games in a system where ELO is used?(ie,playing games or tourneys on a rival site).  How long between games should NOT be a part of ratings calculations; you would be making calculations based on an assumption.  I understand there is no perfect system, however, accuracy is most important.  Then again, does it really matter?  If a person is playing in FIDE sponsored events, then they have probably played often in recent history; if not, does the rating really matter?  Oh well, thank you for the interesting article!  Time to go play!!!

  • 5 years ago

    Math_magician

    i enjoyed the math portion as well as the chess...  good article

  • 5 years ago

    Dauntless07

    "That rating was computed from your games against others.  It is not your true strength, which can never be truly known except perhaps by the Deity, but even Kasparov probably doesn’t know it."

    haha, lol

  • 5 years ago

    anon166

    Arpad Elo was a genius. Mark Glickman is merely clever. There are many things screwed up in chess, but the rating system is one of the worst. Does anyone really believe that there are many players of today who would be rated higher than Tal? One of the most important reasons for a rating system is to be able to have one-to-one comparisons between different countries and different eras. But FIDE, USCF, and historical ratings are now wildly different and getting worse every day. 2600 was referred to as supergrandmaster. It once meant you were in the top 10 players in the world. Check the latest FIDE list. You have to be 2640 to be in the top 100!! And it gets higher every list! Again, all you have to do is play over their games to see that all these players of today are clearly NOT better than the true giants of the past.

  • 6 years ago

    pprmnt13

    Hah, DOE. I guess it really does have a use in the real world. =]

    Anyways, it's nice to finally understand the ratings.


  • 6 years ago

    kurtgodden

    Joe, in answer to your question about calculating the std dev.

    Just do this:

    For each observed data value, take the square of its difference from the mean of all data.  Now sum those up and divide by n-1 where n is the number of observations.  Now take the square root of that whole thing.

    See?  Nothing could be simpler.  An alternative, albeit manual, technique is:

    use your scientific calculator.   :-)


  • 6 years ago

    CuzImTNT

    very interesting. i was wondering about the small loss to huge gains or vice versa when you play different ratings. this article brings back DOE. i never quite got how to calculate standard deviation (not that i want to know =)