FIDE rating changes: Are they working so far?

vladyx92

Updated: Oct 24, 2024, 10:41 AM

March 2024 brought a compression of the minimum rating to 1400 and some calculation improvements. Let's discuss their impact, and some alternative ratings too!

This article first appeared on my Substack. It's an 11-minute read, and it's best done in a single sitting to unpack the full info at once.
*Update: Regrettably, Chess.com support could not help me with the rendering issues. If the graphs below don't render for you on this page, I kindly ask you to read the original on Substack. As far as I can tell so far, this is a problem with parsing .png embeds.

Let me start by saying that I am not affiliated with FIDE or any other entity that seeks to govern the complex machinations in the world of chess. I am just a chess coach (official one here on Chess.com, check my profile!), licensed as a FIDE National Instructor. I am also a freelance journalist, with no current affiliation in the chess world. This is an independent report, made out of interest in numbers - specifically, FIDE ratings in standard chess, and how well they express true strength. Now that we've got that out of the way, some limitations are in order. While I have been a sounding board for Jeff Sonas in the past, I like running my own numbers and hypotheses. Additionally, the changes are still nascent, and long-term trends should not be inferred from the 7 months of activity since their implementation. That is to say, “take any big statements that I make with an equally large grain of Himalayan salt!”

There are two families of data sets that I will reference throughout this analysis. One is the FIDE Ratings Download page, which currently extends all the way back to the February 2015 monthly rating list. This allows for meaningful longitudinal analysis. Another important resource is the Universal Rating System (URS), an independent rating system commonly used in the Grand Chess Tour. As of the time of this writing, their Downloads page is not indexed properly, but I still managed to download monthly lists using a little Python coding. The URS serves as an independent check on FIDE ratings...more on this a bit later.

The structure of the article will be as follows, just so you know what you’re going to be reading ahead of time:

FIDE Standard ratings distribution
Effectiveness of calculation improvements on new player ratings
Geographical disparities
Some factors that explain the disparities
Summary - Odds and Ends

You don’t have to be a data nerd in order to enjoy the pictures and graphs. However, familiarity with basic statistics at high school level is implied throughout the discussion. A bit of knowledge about the way FIDE ratings work would also help in parsing the material, and so would familiarity with the main discussion points of this Jeff Sonas article.

1. FIDE Standard ratings distribution

The above graph is a snapshot of all people with a FIDE Standard rating on the October 2024 list (nearly half a million players). The decision to cut off the graph at 2400 Elo and 80 years old was made consciously, as the categories above those cutoffs are so sparsely populated that they wouldn’t provide any contrasting points. It should be clear that teenagers are dominating the chess world nowadays, by sheer presence. Their accumulation in the 1400-1500 Elo range should be particularly alarming for adults facing them, as they tend to be extremely underrated. This graph should be interpreted qualitatively - it's not saying much, just showing a heatmap of where the players accumulate (age + rating).

In the Sonas article (on page 8), three age categories are introduced: the under-19 improvers, the stable players aged 20-38, and the 39+ decliners. I have sought to verify that this delineation makes sense post-compression as well. Here’s what the distribution looked like in March, immediately after the compression.

You can observe the sudden drop-off in the number of players introduced at 2000 Elo due to the very nature of the formula. Remember, only players rated below 2000 gained points “for free.” This behavior is natural, like crowding more people into a confined space.

And here is what it looks like now, in October 2024.

An astute reader will sum up the percentages and point out that they add up to less than 100%. That is correct: the FIDE players database contains entries without a birth year...
The improvers pool is steadily gaining representation (+2.6%) and rating (+2 Elo), as expected.
The stable pool hasn’t been that stable lately, losing both representation (-1.1%) and rating (-3 Elo) in the 7-month interval.
The decliners pool is steadily losing representation (-1.5%) and rating (-7 Elo), as expected.
Overall, deflationary pressures on the system have remained, albeit quite mild compared to previous years.

If anything, we should be extremely happy that the 3-fold segmentation into age categories seems not only accurate, but also descriptively named. This is a trend that should be followed years into the future. At the current rate of growth, I expect the number of improvers to exceed the number of decliners in early 2027 at the latest. That is when the deflationary pressure on the system should subside, which brings me to my next point...

2. Effectiveness of calculation improvements on new player ratings

As explained in the FIDE Handbook on Rating Regulations, a player needs to face at least 5 opponents with FIDE ratings and perform adequately in order to obtain their first official rating. Article 8.2.2. contains one of the calculation improvements, which is to add two virtual draws against 1800 opponents for any new player in the system. The main reason behind it was to avoid players entering close to the rating floor, and being underrated compared to their actual playing strength. It is time to evaluate the impact of this rule change, as it pertains to classical ratings only. In order to do so, the methodology is as follows:

identify players in the Oct 2024 Standard FRL who were not present in the Mar 2024 iteration and perform summary statistics (N=28529).
similarly look at players on the Dec 2023 FRL who were not present in the Mar 2023 iteration. The interval was chosen specifically to represent a roughly equal sample of newly rated players (N=26989).
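The two steps above boil down to a set difference on FIDE IDs between two rating lists. Here is a minimal sketch of that idea, not my exact script: the CSV layout, the column names (`fide_id`, `rating`) and the helper names are assumptions made for illustration, and the sample rows are made up (the real FIDE downloads come in their own format and need parsing first).

```python
import csv
import io
from statistics import mean, median


def load_ratings(csv_text):
    """Parse a rating list into {fide_id: rating}. Column names are assumed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["fide_id"]: int(row["rating"]) for row in reader}


def newly_rated(earlier, later):
    """Keep players who appear on the later list but not on the earlier one."""
    return {pid: rating for pid, rating in later.items() if pid not in earlier}


# Made-up miniature lists, for illustration only (not real FIDE data).
MAR_LIST = "fide_id,rating\n1001,1650\n1002,2100\n"
OCT_LIST = "fide_id,rating\n1001,1662\n1002,2095\n1003,1480\n1004,1575\n"

new_players = newly_rated(load_ratings(MAR_LIST), load_ratings(OCT_LIST))
new_ratings = list(new_players.values())
print(len(new_ratings), mean(new_ratings), median(new_ratings))
```

Note that a player counted as "new" this way may already have played several rated events between the two lists, so the rating captured is not necessarily the very first published one.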

Lots to unpack here! If anything, it appears that players are injected CLOSER to the rating floor when accounting for the 40% compression. The average Elo of 1331 on the left-hand side would correspond to a compressed Elo of 1599 on the right-hand side, when in reality the average value is 1571. This is quite shocking, as the goal of the two virtual draws against 1800 opponents was to push this injection point further away from the rating floor. There is a caveat, however...bonus points if you write a comment detailing why this methodology is a bit hand-wavy.
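For reference, the 1331 to 1599 conversion comes from the one-time March 2024 adjustment, which gave every player rated below 2000 an extra 40% of their distance to 2000. A small sketch of that arithmetic follows; the function name is mine, and rounding to the nearest integer is an assumption on my part.

```python
def compress(old_rating, pivot=2000, factor=0.40):
    """One-time March 2024 adjustment: add 40% of the gap to 2000.

    Players rated 2000 or above are left unchanged.
    Rounding to the nearest integer is an assumption, not a quoted rule.
    """
    if old_rating >= pivot:
        return old_rating
    return round(old_rating + factor * (pivot - old_rating))


print(compress(1331))  # pre-compression average of the 2023 newcomers
print(compress(1000))  # the old rating floor
print(compress(2100))  # above the pivot, so unchanged
```

The same formula explains the new minimum rating: the old floor of 1000 lands on 1400.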

What if we look at segmentation by age category, could that show the culprit?

This is decidedly worse! If anything, the injection of new players into the system followed a more pleasantly shaped bell curve in 2023. The U19 players are still injected too close to the floor, and the main effect of the calculation improvements so far has been to add noise. I am not sure that such a panicky conclusion is truly worth broadcasting after only 7 months, but someone may want to forward this article to the FIDE QC, because things are not going as expected. If I were them, I would monitor this trend, and keep detailed statistics of where people are injected into the system. Since I have already started the task independently, I will share some results here. In my opinion, the culprit is not the calculation formula itself, but...

3. Geographical disparities

It is no secret that some countries are more underrated than others. Ask any GM how they feel about facing young 2300s coming from countries such as India, China, Kazakhstan, Armenia, etc. Their reaction should tell you everything there is to know. The reason for this disparity is mostly the nature of the rating pool in each of these countries. For the most part, it is a closed system, with few players traveling abroad and mixing with other federations. I shall try to establish this fact by looking at the injection point of new players, then by comparing the FIDE ratings to the independently maintained URS database. Hopefully some trends will begin to emerge - I have already posted some findings on my Twitter, which were received well, including by someone named Anish Giri.

We can infer that some countries are more deflated than others using two separate methods. First, let’s look at the injection point of new players into the system. This data also captures the rating evolution of some of these new players between the March list (when they were not present in the FIDE list) and October, so it does not necessarily represent the initial rating of each player. Still, I will ask you to take a leap of faith with me, and trust that the current ratings of these players are very close to the initial ratings. To keep the data meaningful, I also retained only federations that have added at least 100 players with standard ratings during this 7-month interval. That’s a whopping 56 federations!
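The federation filter described above can be sketched in a few lines. This is an illustration under assumptions: the input is a list of (federation, rating) pairs for newly rated players, the function names are hypothetical, and the federation codes and ratings in the sample are made up.

```python
from collections import defaultdict
from statistics import mean


def group_by_federation(players):
    """Collect ratings per federation from (federation, rating) pairs."""
    groups = defaultdict(list)
    for federation, rating in players:
        groups[federation].append(rating)
    return groups


def injection_points(players, min_players=100):
    """Average rating of new players per federation, lowest first.

    Federations with fewer than `min_players` new players are dropped.
    """
    groups = group_by_federation(players)
    kept = {
        fed: mean(ratings)
        for fed, ratings in groups.items()
        if len(ratings) >= min_players
    }
    return dict(sorted(kept.items(), key=lambda item: item[1]))


# Made-up sample with a tiny threshold so the filter is visible.
sample = [("AAA", 1500), ("AAA", 1600), ("BBB", 1450), ("BBB", 1470), ("CCC", 1900)]
print(injection_points(sample, min_players=2))
```

With the real data, `min_players=100` is the threshold that leaves 56 federations.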

The box plot with whiskers shows a clear marker representing the average near the middle of each box. 95% of the entire distribution is contained between the ends of the whiskers, and the data points beyond that are big outliers. Some of the usual suspects are present here - India, Armenia, Uzbekistan, etc. If we assume that FIDE Elo is truly a universal measure of skill and that geographical disparities do not exist, we would expect all of these boxes to be roughly aligned with each other. After all, there’s no reason why a new OTB player in the Netherlands should be 300 points stronger, on average, than someone in Sri Lanka taking up active OTB play. Clearly, the geographical disparity is a pronounced effect.

What if I told you there was a way to compare the validity of FIDE ratings by using an independent rating system?

We are taking a quick detour here...enter the Universal Rating System, referenced in the introduction. I won’t bore you with technical details, but if I were to summarize its inner workings, I would describe it as an iterative performance rating over the past 6 years of activity, taking into account FIDE-rated games in all time controls. There is also an exponential decay curve, which weights recent games more heavily than older games.

If you are still skeptical, let me superimpose the current rating distribution in both systems, as of now. The dataset after the intersection contains roughly 220k player entries, so it’s a significant sample.

If this were a beauty contest, the winner should be clear. The URS rating distribution is nearly a true Gaussian. It’s so beautiful that sometimes I wonder why it hasn’t been met with widespread adoption among chess players. Old habits die hard, I suppose... Where am I going with this? Buckle up, as the next graph is gonna blow your mind for sure. In the previous section, we have already established that U19 players are injected too low in the current distribution. Let’s see, maybe URS can capture the true strength of U19 players better (new and old to the system, inasmuch as someone U19 can be called "old")?!

“HOLD UP. You are telling me there’s a more accurate rating that I should check whenever I am paired to a junior opponent?” Yes! By orders of magnitude better...and sorry to inform you that the Indian 1600 kid you are facing is actually rated 2000 URS. My condolences to your Elo - life’s tough...

By now, I have convinced you that one rating system seems to capture the true strength of juniors better. Next, let’s show which countries (on the whole, not just the U19 players) are underrated, and which ones are overrated. The methodology here is easy to follow. Download the URS rating list, then merge it with the FIDE rating list with a simple matching along the FideID column. I grouped everything by federation, then kept only federations that have at least 500 players in that 220k merged sample - 69 federations total. This excludes countries with a low level of chess activity.
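The merge-and-filter step just described can be sketched as follows. This is a simplified stand-in rather than my actual script: the dictionary layout, the function names and the sample numbers are all assumptions for illustration, with the FIDE ID used as the join key.

```python
from collections import defaultdict


def merge_on_fide_id(fide, urs):
    """Inner join on FIDE ID: keep only players present in both lists.

    fide: {fide_id: (federation, fide_rating)}
    urs:  {fide_id: urs_rating}
    """
    return {
        pid: (federation, fide_rating, urs[pid])
        for pid, (federation, fide_rating) in fide.items()
        if pid in urs
    }


def gaps_by_federation(merged, min_players=500):
    """URS minus FIDE per player, grouped by federation, small groups dropped."""
    gaps = defaultdict(list)
    for federation, fide_rating, urs_rating in merged.values():
        gaps[federation].append(urs_rating - fide_rating)
    return {fed: g for fed, g in gaps.items() if len(g) >= min_players}


# Made-up miniature lists; a tiny threshold keeps the example readable.
fide_list = {"1": ("AAA", 1600), "2": ("AAA", 1700), "3": ("BBB", 2100), "4": ("BBB", 2000)}
urs_list = {"1": 1900, "2": 1950, "3": 2080}

merged = merge_on_fide_id(fide_list, urs_list)
print(gaps_by_federation(merged, min_players=2))
```

A positive gap means the player is rated higher by URS than by FIDE, which is how "underrated" is read in this section.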

Players from countries at the top are significantly underrated, while those at the bottom would be great destinations for some chess tourism and “Elo farming.” The central tick represents the average of the distribution, while the edges of the box are set at 1 standard deviation. If you paid attention in your Stat class, this should capture roughly the middle 68% of the distribution - the samples are not truly normally distributed here, but that's a story for another day. An easy sanity check is to look at the asymmetrical point where the average is marked - for some federations it's not even close to the middle of the bar.
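If you want to check how far a sample strays from the 68% rule of thumb, a quick sketch like the one below does the job. The numbers are made up and deliberately skewed, and the function name is hypothetical; it only shows the mean, the standard deviation, and the share of players who actually fall within one standard deviation of the mean.

```python
from statistics import mean, pstdev


def one_sd_summary(gaps):
    """Return (mean, standard deviation, share of values within 1 SD of the mean)."""
    m = mean(gaps)
    sd = pstdev(gaps)
    low, high = m - sd, m + sd
    inside = sum(low <= g <= high for g in gaps)
    return m, sd, inside / len(gaps)


# Made-up, deliberately skewed URS-minus-FIDE gaps for one federation.
sample_gaps = [0, 10, 20, 30, 400]
avg, sd, share = one_sd_summary(sample_gaps)
print(f"mean={avg:.0f}, sd={sd:.0f}, share within 1 SD={share:.0%}")
```

For a truly normal sample the share would sit near 68%; a single large outlier is enough to push it well away from that.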

Let’s recap:

The countries with the lowest injection point in their FIDE rating are: Sri Lanka, India, Georgia, Bolivia, Armenia, Peru, Uzbekistan, Azerbaijan, Kazakhstan.
The most underrated countries (by the difference between URS and FIDE in the graph above) are: Sri Lanka, India, Vietnam, Uzbekistan, Iran, Uganda, Bolivia, Kazakhstan.

This is great! Two completely independent methods show similar results, giving confidence that the geographical disparity is not just a perceived effect, but a real one.

4. Some factors that explain the disparities

A natural question could then be, “What is the main predictor of this deviation between FIDE ratings and URS ratings in a specific country?” I asked both Twitter and ChatGPT to do a bit of feature engineering, and am happy to report that Maurits van der Meer got the best answer of the bunch. If nothing else, this simplistic scatter plot should convince you of the fact.

That’s the winner, by far. Still, some other features are important and could explain part of the variation. Here’s a summary that I am happy with. It sits at the intersection of “I have thought enough about this” and “Doing more is overkill.”

*Please note: As these are R-squared values, they don’t capture the proper relationship (reminder: the square of a negative number is a positive number). A high average rating in a country’s population is negatively correlated with the level of “underratedness.”
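To see why R-squared hides the direction of a relationship, here is a small sketch with made-up numbers (not the real federation data). It computes the Pearson correlation by hand, then squares it; the function and variable names are mine.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up federation-level values, for illustration only.
avg_rating = [1500, 1600, 1700, 1800]
underratedness = [300, 220, 150, 60]  # URS minus FIDE, in Elo

r = pearson_r(avg_rating, underratedness)
print(f"r = {r:.3f}, R-squared = {r * r:.3f}")
```

The correlation is negative, but R-squared comes out positive, and it would be identical if the relationship ran in the opposite direction.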

5. Summary - Odds and Ends

Young players (defined as U19, per the Sonas methodology) are joining the rating pool at an accelerated pace. By 2027, they will exceed the number of aging, declining players.
U19 players are sapping rating points away from older, more established players, putting a deflationary pressure on the FIDE rating system overall. The uneven K-factors may need to be revisited in the upcoming years.
The “calculation improvements” implemented by FIDE so far have not necessarily been an improvement, but have rather introduced extra noise in the distribution (it's been only 7 months, a caveat I mentioned in the intro - it could stabilize in the future!).
The URS distribution is remarkably smooth and captures a more accurate playing strength, especially for countries situated at the edge of the underrated/overrated range.
The geographical disparities in rating can mostly be attributed to the percentage of youth players in the overall chess playing population of each specific country.
You now have a list of countries to visit (and a list of countries to avoid), if your sole intention is to gain Elo.

And that’s it from me today, folks! If you want to delve deeper into the matter, I welcome your input, though I cannot make any promises that this research will continue, other than strictly from a hobby standpoint. And, as always, thanks for taking the time to read my work. I am fortunate to be part of a community that seems to appreciate quality long-form posts. Until next time, ciao!

Best ways to support my work, and independent journalism by extension:

I am primarily a chess coach. I give private tailored online lessons to anyone rated under 1800 FIDE, but specialize in working with adults. You can read more and book a call here.
"Like" this article and comment on it to make it more visible to others who may enjoy quality writing
Subscribe (it's free!) to my Substack
Follow my Twitter
Donate to my PayPal
