Deflation of worldwide ratings?
There is a recent article by Walter Wolf on Chessbase about a possible decline of worldwide ratings. This longish post contains my thoughts on the issue.
Introduction
In the last decades there has almost been a 'consensus' that we are seeing rating inflation - with more and more players breaking the 2700 and 2800 barriers. In his article, Wolf provides numbers showing that, apart from the very top, ratings are decreasing worldwide!
My own experience kinda agrees with this. However, I think Wolf's attempt to explain the observations with the FIDE K-factor alone is less than half of the story.
Instead, my take is this: the total chess improvement worldwide exceeds the number of rating points added to the worldwide pool, and new players are systematically underrated.
This is due to two reasons:
- The variability in K factor does not create enough new points.
- The initial rating formula is designed to underrate players.
This blogpost will focus mostly on the second point, but first a few words about the first one.
About K factor impact
Wolf suggests the fixed K factors as the culprit for the 'underrating'. This is not wrong per se, but it is not the whole story. To keep a long story short: my take is that the FIDE rating system assumes an environment where tournaments have many established players with accurate ratings, and only a few youngsters entering the player pool. In that case the youngsters with K=40 could, over time, generate enough new points to account for their improvement.
But in emerging chess nations there are very few players with established ratings. Instead, the few FIDE-rated tournaments are played mostly by kids with K=40, facing other kids with K=40. No new rating points are generated this way. Instead, the kids who improve faster than their peers might soak up a few rating points, and kids who improve, but not as fast, will even lose rating!
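The zero-sum nature of this is easy to see from the standard Elo update itself. Here is a minimal sketch (my own illustration, not FIDE's code) showing that when two players with the same K play each other, one player's gain is exactly the other's loss, so no new points ever enter the pool:

```python
# Standard Elo update. When both players have the same K, the rating
# changes are exact negatives of each other: points only move around
# inside the pool, none are created.

def expected_score(r_a, r_b):
    """Expected score of player A against player B (logistic Elo curve)."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=40):
    """Return both new ratings after one game; score_a is 1, 0.5 or 0."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta  # B's change is the exact negative

# Two K=40 kids, both rated 1500; the stronger one wins:
a, b = elo_update(1500, 1500, 1.0)
print(a, b, a + b)  # 1520.0 1480.0 -- total pool unchanged: 3000.0
```

So an improving kid can only take points from peers; the pool as a whole cannot rise to match the improvement.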
Initial rating
In my view the calculation of initial ratings is a major contributor to rating deflation. The FIDE initial rating system is designed to award initial ratings that are below player levels, possibly to avoid anyone getting too high a rating, erring instead on the side of 'caution'. The heavy dependence on opponents' ratings causes further deflation when new players receive an initial rating from playing 'underrated' opponents.
The FIDE initial rating is calculated by:
initial_rating = average_opponent_rating + performance_modifier
Where average_opponent_rating is just the average rating of all rated opponents, and performance_modifier is a fixed number of points awarded (or deducted) depending on the % scored in the event.
For example: the performance_modifier for a 50% performance (like 2.5 points out of 5 in a tournament) is zero, and for a 20% performance (scoring 1 out of 5) it is -242 rating points. The performance modifier is independent of the opponents' ratings.
When it comes to scoring more than 50%, in theory one should be able to mirror the performance_modifier: scoring 80% (like 4 wins out of 5 games) should give +242 points above the opponent average.
But FIDE decided that doing so could let a good performance produce a high initial rating, and instead awards every half point above a 50% score with 20 rating points. In practice, scoring 80% results in a performance_modifier of just +60 points -- a far cry from the +242 the actual performance would suggest!
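The asymmetry can be sketched in a few lines of Python. This is my own illustration, not FIDE's official code: I hard-code only the two data points mentioned above (20% maps to -242, and +20 per half point above 50%); the real FIDE table has many more rows.

```python
# Asymmetric performance modifier, as described in the post:
# full (negative) table value below 50%, capped +20 per half point above.

def performance_modifier(points, games):
    pct = points / games
    if pct == 0.5:
        return 0
    if pct > 0.5:
        # capped reward: +20 per half point above 50%
        half_points_above = (points - games / 2) / 0.5
        return 20 * half_points_above
    # below 50%: full table value; only one example entry reproduced here
    table = {0.2: -242}  # from the post; the real table covers all scores
    return table[pct]

avg_opp = 1800  # hypothetical opponent average
init_good = avg_opp + performance_modifier(4, 5)  # 80%: 1800 + 60
init_bad = avg_opp + performance_modifier(1, 5)   # 20%: 1800 - 242
print(init_good, init_bad)  # 1860.0 1558
```

An 80% score gains 60 points over the opponent average while a 20% score loses 242 -- the punishment dwarfs the reward.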
So the first flaw is that the 'reward' for a good tournament performance is reduced, while the 'punishment' for a bad tournament is in full effect! If we assume that roughly equal numbers of players do well and do badly, initial ratings end up (on average) below the 'level' of the players. Hence players start out underrated (on average).
The second point is that the initial rating is heavily dependent on the average opponent rating. This leads to many quirks, like the paradoxical fact that to get a high initial rating it is often more important to face high-rated opponents than to win the games! There are bizarre cases where it can be beneficial for the initial rating not to show up at the board: a win against a low-rated opponent often does not compensate for the hit from lowering the average rating of the opponents!
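A concrete example makes the paradox vivid. All the ratings below are hypothetical numbers I made up, and I use the simplified +20-per-half-point modifier described above:

```python
# A player scores 2/4 against four 2000-rated opponents, then ALSO
# beats a 1200-rated player in a fifth game. Does the extra win help?

four_opps = [2000] * 4
initial_without = sum(four_opps) / 4 + 0        # 50% score -> modifier 0
# -> 2000.0

five_opps = four_opps + [1200]
avg = sum(five_opps) / 5                        # 1840.0
modifier = 20 * ((3 - 5 / 2) / 0.5)             # 3/5 = 60% -> +20
initial_with = avg + modifier
# -> 1860.0: the extra WIN lowered the initial rating by 140 points!
print(initial_without, initial_with)
```

Skipping the fifth game entirely would have yielded a higher initial rating than winning it.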
The situation in emerging chess nations
In many emerging chess nations, there are very few FIDE-rated events, and almost no players with established ratings. There might be one rated tournament a year, and very few players play abroad. The majority only play the annual local event, receiving an initial rating maybe some 100 points below their level. That is not so bad. Yet.
A year later the kids who got initial ratings 100 points below their ability have improved maybe 50-100 points, and play the same event again. Since it is the same pool of players, no new rating is added; the underratedness of the pool remains. But a new wave of kids receives initial ratings! Since the player pool as a whole is already some 150 points below 'their level', the next wave will receive (on average) initial ratings some 200 points below what they would elsewhere. This keeps perpetuating, year after year. In many places in the world we now see teenagers rated 1600-1700 beating 2000s with ease when abroad. There are several youngsters here in Taiwan with 1300-1500 FIDE ratings but well over 2000 online, like here on chess.com.
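The spiral can be caricatured in a toy simulation. Every number here is invented purely for illustration: assume each new cohort's 'true' level is 1600, and that the initial-rating formula hands out ratings roughly 100 points below whatever the existing pool averages.

```python
# Toy model of compounding underrating: each year's new cohort is
# anchored ~100 points below the (already deflated) local pool.

true_level = 1600
pool_avg = true_level        # year 0: imagine an accurately rated pool
underrating_per_wave = 100   # gap introduced by the initial-rating formula

for year in range(1, 6):
    new_cohort_rating = pool_avg - underrating_per_wave
    pool_avg = new_cohort_rating  # the pool is dominated by the newest kids
    print(f"year {year}: new cohort enters at {pool_avg}, "
          f"{true_level - pool_avg} points under their level")
```

After five waves the newest kids enter 500 points below their level: the gap grows every year because each wave is anchored to an already-deflated pool, exactly the pattern behind the 1300-rated kids who play like 2000s.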
There is also something to be said about FIDE rating floors and about only five games being enough for an initial rating, but this post is far too long already.
Conclusions and Remedies
Maybe you noticed that words like level or underrated have been used in quotes above? The reason is that there is no such thing as a 1600 or a 2200 level; it is all relative. Constructing a rating system where a certain ability results in a certain rating (over time) is very hard, if not impossible. Small details matter a lot: both the rating change formulas and the initial rating need to be chosen very carefully so as not to cause too much or too little variability. Rating systems where a certain outcome is enforced will end up being manually tweaked in horrible ways.
What little can be done is to not deliberately underrate initial ratings, and to use more than five games to calculate them. Similar to Wolf's suggestion, adaptive K values might be a way to go -- which is what rating systems like Glicko and Glicko-2 already do.
Another idea is to use opponents' performance ratings to calculate initial ratings, but that only establishes a more accurate rating relative to the pool and cannot compensate for underrated pools.
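To show the idea (and only the idea -- this is a crude sketch of mine, not actual Glicko, which tracks a full rating-deviation parameter): one way to get adaptive K is to let it shrink as a player accumulates games, so new, uncertain ratings move fast while established ratings move slowly. All the constants below are arbitrary choices for illustration.

```python
# Crude adaptive-K sketch: interpolate from a high K for brand-new
# players down to a low K for established ones.

def adaptive_k(games_played, k_new=60, k_established=10, taper=30):
    """Linearly taper K from k_new to k_established over `taper` games."""
    if games_played >= taper:
        return k_established
    frac = games_played / taper
    return k_new + frac * (k_established - k_new)

print(adaptive_k(0), adaptive_k(15), adaptive_k(30))  # 60.0 35.0 10
```

A newcomer's rating then converges quickly toward their real strength, while established ratings are insulated from noise -- roughly what Glicko achieves, more rigorously, via its rating deviation.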
Hmm. Maybe the URS guys are onto something?