Elo to Glicko: Your Rating Explained

May 11, 2008, 11:59 AM

Most chess rating calculations originate with the ideas of the Hungarian Arpad Elo (not pictured here).  A physics professor in the U.S., Elo devised a method for calculating ratings from simple statistical concepts.  His fundamental idea was that a player's chess skill conforms to what is called a ‘normal’ distribution.  A normal distribution is shaped roughly like the outline of a bell, as shown in Figure 1.


Figure 1: Bell Curve

This assumption that a given player's skill is normally distributed means that on any given day that player may perform either better or worse, but given enough games the player's level of play will be distributed normally.  As it turns out, player skills in general on chess.com are also roughly distributed in the same fashion as the bell curve above.

In this idealized distribution, the middle value on the x axis is zero, but if you plot player ratings on the x axis, you will have low scores on the left and high scores on the right with the height of the curve corresponding to the number of players having each such rating.  There is an overall average skill level which, on a perfect normal distribution, corresponds to the x-value of the highest y-value (in the middle of the bell).  There are more people whose skill clusters around that average, while there are fewer people who have lower skill levels, and of course (much to our collective envy) another small group of people who have very high levels. 

You can see the current chess.com ratings curve if you click here, and indeed you will notice that it does resemble a bell.

When you play a game, you will earn points if you win and lose points if your opponent wins.  If you draw against a higher-rated player, you will still earn points, though fewer than you would for a win.  Elo’s idea was to derive a computation based on this assumption of a normal distribution of player strengths, using the rating as a representation of strength.

Suppose you play a number of games in a tournament.  You would be expected to defeat players with lower ratings than your own, lose to players with higher ratings, and draw with players of equal rating.  Awarding +1 for a win, -1 for a loss, and 1/2 for a draw, if you play 4 games against weaker players, 3 against stronger players, and 2 against opponents of the same strength as yourself, you would be expected to accumulate 4 – 3 + 1/2 + 1/2 = 2 points.  However, suppose you actually won 5 games, lost only 2, and drew the other two.  Your actual points would then be 5 – 2 + 1 = 4.

The basic computation to adjust your rating in Elo’s system is an equation of the form:

New Rating = Old Rating + k(actual points – expected points), where ‘k’ is some constant number, e.g. 32. 

In our example, if your old rating was 1500, then your new rating would be computed as follows:

1500 + (32 (4 actual points – 2 expected points)) = 1500 + 64 = 1564.
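The arithmetic above can be sketched in a few lines of Python.  The function names are made up for illustration.  The first function is just the update rule as stated in the text.  The second is an extra that is not part of this article's example: published descriptions of Elo's system usually score a win as 1, a draw as 1/2, and a loss as 0, and estimate the expected score for a single game from the rating gap with a logistic curve, so treat it as a sketch of that commonly quoted variant.

```python
def elo_update(old_rating, actual_points, expected_points, k=32):
    """New Rating = Old Rating + k * (actual points - expected points)."""
    return old_rating + k * (actual_points - expected_points)


def logistic_expected_score(rating, opponent_rating):
    """Expected score for one game on the 1 / 0.5 / 0 scale (assumed variant)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


# The tournament from the text: 4 actual points against 2 expected points.
print(elo_update(1500, 4, 2))  # 1564

# Two equally rated players each expect half a point from a single game.
print(logistic_expected_score(1500, 1500))  # 0.5
```

Note that the size of every adjustment scales directly with k, which is why the choice of that constant matters so much in practice.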

The US Chess Federation (USCF) adopted essentially this formula in 1960 and FIDE adopted it in 1970. 

However, this is not the system used by either organization today, nor is it exactly the system used by chess.com.  In the 1980s a bright young statistics major at Princeton University began to study chess ratings, and wrote his senior thesis on the topic.  After speaking to the USCF President about his work, he was invited to join the USCF ratings committee, later becoming its chairman, a post which he holds to this day.

 Mark Glickman (pictured above) was this young student’s name, and today he is referred to as Professor Glickman by his own students at Boston University.  Glickman wrote his Harvard doctoral dissertation on what he viewed as deficiencies with the Elo ratings system, and devised a replacement, which he dubbed the “Glicko” system, in what I can only regard as a humorous tribute to his predecessor Professor Elo.  (I love clever people.)

It is the Glicko system that chess.com uses to calculate your rating.

One of Glickman’s innovations was to recognize that your rating is only an estimate of your true strength, and that there is uncertainty regarding your rating.  This uncertainty is represented by what has been dubbed the Rating Deviation.  This is merely chess talk for what a statistician calls the standard deviation: a number that measures this uncertainty.  The larger the number, the more uncertainty surrounding your rating.

In a normal distribution, the average value along the x axis plus-or-minus 2 such rating deviations gives an interval within which there is roughly 95% confidence that your true strength lies.  If you don't know or don't care about statistics, then just regard this as a religious axiom and accept it on faith.

If you refer again to Figure 1 above, you'll see the 95% confidence interval between the +2 and -2 standard deviations.
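As a quick illustration of that interval, here is a minimal sketch.  The function name and the sample Rating Deviation values are invented for the example and are not chess.com figures.

```python
def confidence_interval_95(rating, rating_deviation):
    """Return (low, high): the rating plus or minus two rating deviations."""
    low = rating - 2 * rating_deviation
    high = rating + 2 * rating_deviation
    return low, high


# Same rating, different uncertainty (both deviations are hypothetical).
print(confidence_interval_95(1500, 50))   # (1400, 1600)
print(confidence_interval_95(1500, 150))  # (1200, 1800)
```

The player with the larger deviation has the same rating but a much wider range of plausible true strengths.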

Another innovation of Glickman’s was his observation that a player’s rating is actually less reliable as a measure of true strength if that player has not played any games for some period of time.  Suppose your rating is 1301 (the current average for chess.com members).  That rating was computed from your games against others.  It is not your true strength, which can never be truly known, except perhaps by the Deity; even Kasparov probably doesn’t know it.  Your rating is only an estimate of your true strength.  And what if you haven’t played any rated games in the past 6 months?  Do we trust your 1301 rating as much as the same rating held by another player who has played 20 games in the past 3 days?

Glickman thinks not, so he built a time factor into his equations that increases your Rating Deviation as time passes without play.  That is, after a period of inactivity your Rating Deviation will take on a larger value, representing the fact that we are less certain about the accuracy of your rating than we were when you were playing regularly.
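In Glickman's published description of the original Glicko system, this growth is written as RD = min(sqrt(RD_old^2 + c^2 * t), 350), where t counts rating periods without play and c is a constant chosen by whoever runs the rating pool.  The sketch below follows that formula.  The value of c and the length of a rating period are assumptions for illustration only; the settings chess.com actually uses are not given here.

```python
import math

MAX_RD = 350.0  # deviation of an unrated player in Glickman's description


def inflate_rd(rd, periods_inactive, c):
    """Grow the Rating Deviation with inactivity, capped at MAX_RD."""
    return min(math.sqrt(rd ** 2 + c ** 2 * periods_inactive), MAX_RD)


# Assumed calibration: an RD of 50 should drift back to 350 after
# 100 idle rating periods, which gives c of about 34.6.
c = math.sqrt((350.0 ** 2 - 50.0 ** 2) / 100)

for idle in (0, 6, 1000):
    print(idle, round(inflate_rd(50.0, idle, c), 1))
```

With this calibration the deviation is unchanged after zero idle periods, grows steadily after that, and never exceeds the cap no matter how long the absence.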

Yet a third innovation in the Glicko system is that the equations to recompute your rating depend not only upon your own rating and rating deviation, but they also depend upon your opponents’ ratings and deviations.  For this reason, when you gain 31 points, your opponent may lose either more or fewer than 31 points, depending upon your respective ratings and rating deviations.
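To make that dependence concrete, here is a minimal sketch of the update step from Glickman's published description of the original Glicko system.  It is not a confirmed copy of the chess.com implementation, and the function and variable names are my own.  Scores are 1 for a win, 0.5 for a draw, and 0 for a loss.

```python
import math

Q = math.log(10) / 400.0


def g(rd):
    """Discount factor: shrinks toward 0 as the opponent's RD grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected_score(rating, opp_rating, opp_rd):
    """Expected score against one opponent, damped by that opponent's RD."""
    return 1.0 / (1.0 + 10 ** (-g(opp_rd) * (rating - opp_rating) / 400.0))


def glicko_update(rating, rd, results):
    """results: list of (opponent_rating, opponent_rd, score) tuples."""
    d2_inv = 0.0  # accumulates 1 / d^2
    delta = 0.0   # accumulates g(RD_j) * (score_j - expected_j)
    for opp_rating, opp_rd, score in results:
        weight = g(opp_rd)
        e = expected_score(rating, opp_rating, opp_rd)
        d2_inv += Q ** 2 * weight ** 2 * e * (1.0 - e)
        delta += weight * (score - e)
    precision = 1.0 / rd ** 2 + d2_inv
    return rating + (Q / precision) * delta, math.sqrt(1.0 / precision)


# Worked example from Glickman's write-up: a 1500 player with RD 200 beats
# a 1400 (RD 30) opponent, then loses to 1550 (RD 100) and 1700 (RD 300).
new_rating, new_rd = glicko_update(
    1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
)
print(round(new_rating), round(new_rd, 1))  # approximately 1464 and 151.4

# One game between equally rated players with different deviations:
# the uncertain player's gain is larger than the settled player's loss.
winner_rating, _ = glicko_update(1500, 200, [(1500, 50, 1)])
loser_rating, _ = glicko_update(1500, 50, [(1500, 200, 0)])
print(winner_rating - 1500 > 1500 - loser_rating)  # True
```

The last two lines show the asymmetry described above: because each player's adjustment is scaled by their own deviation, the points one player gains need not equal the points the other loses.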

I will not reprint the Glicko equations here because they are much more complex than the Elo equation above, but for the mathematically curious an overview that includes the equations can be found here.

For the true math geeks out there, you can read Glickman’s full technical article that was published in 1999 in the journal Applied Statistics by clicking here.  Improvements to Glicko can be found in the Glicko-2 system.

Finally, I will point you to Erik’s own article on the Glicko system used here on chess.com.  If you don't already know Erik, he is Mr. Chess.com.  I have tried to supplement, rather than duplicate, Erik’s description.

I hope you enjoyed this brief overview of how ratings are determined, and that the next time you peek to see how many points you stand to gain or lose when you begin your new game you will appreciate the work that went into providing you the answer.  As you strive to become a better player and person, just remember to choose your move carefully, in chess as in life.

 [Postscript:  I would like to thank Prof. Mark Glickman for correcting an inaccuracy regarding Elo's assumption on normal distributions in the original post. -KG]

