Battle of the Nations

LionChessLtd

I'm back. After many many (too many!) years, I finally did it. The comparison of countries is finally here!

My apologies that it took so long. I tried to make a start a few times but things always got in the way, then it was postponed for another day and eventually the whole analysis ended up at the bottom of my to-do list. Anyway. Here we are.

As we sadly live in a world in which people are increasingly easy to offend by virtually anything, let's address the elephant in the room first: This analysis is purely looking at the data and discusses some findings. It's not a metric of how intelligent the population of a country is or which ethnicity is the smartest, etc. That wouldn't even be possible by only looking at chess, since it isn't uniformly popular around the world and thus not a good metric to judge someone's intelligence. So let's not take this too seriously but instead have some fun with numbers. Lots of them!

One of the reasons why I didn't get the analysis done earlier was that I previously used a spreadsheet to process all the data. It was doable with simple divisions, like male / female, and only a few thousand data points. It was slightly more challenging with the age analysis. Now we have over 100 countries to look at and, if I'm not mistaken, the number of FIDE registered players has also exploded; we now have more than 1.3 million entries! Too much for spreadsheets to handle efficiently, if at all!

So I turned to Python, which is a great tool for such tasks. The downside is that Python is a programming language and it is therefore all too easy to make a dumb (human) mistake. A computer is (normally) doing what you are telling it to do and any errors are almost always the user's fault. In this case we can put the blame on the programmer. Which is me. Thus, there is the risk of some subtle mistakes in the analysis. I did carry out spot checks and did some calculations by hand and I could not find any obvious errors. Since we are not designing space rockets, let's be content with the fact that some inconsistencies may be present and move on.

One final caveat: I was not able to reliably identify all the countries from the three-letter codes used by FIDE. Also, FIDE has a lot of (strong) players pooled into a "country" named FIDE and this does distort the results a bit. Someone may know the reason behind this (stateless players perhaps?), but I don't know why FIDE decided to declare independence.

A table of the country codes that are used in this article is provided at the end.

Also, some sports do have a rather strange notion of what a country is: While Mexico is Mexico and Angola is Angola, the United Kingdom is... well... non-existent... Instead, it is split up between Scotland, England and Wales (where is Northern Ireland?). I did consider pooling them up under the United Kingdom, but there are other similar cases too and we probably end up offending someone... So let's stick with whatever data we have at hand - courtesy of FIDE - and not complicate things further...

Before making a start, I'd like to be transparent about how the analysis was conducted. Statistics can be manipulative: It's often not too hard to massage the numbers and/or methodology a bit to get the results to fit a given narrative. Norway ranks too high for your liking? Let's not include a country's strongest player then. A country benefits from a lot of young talented players? Let's pool teenagers into a youth league so they can be excluded and see whether that better fits our idea of the truth. "Do not trust any statistics you did not fake yourself" - a famous quote often attributed to Winston Churchill comes to mind here.

I do my best to explain what has been done and why I made certain decisions. So here is what I did:

I downloaded the January 2024 ratings list from FIDE. 1.3 million entries! Perfect for data analysis. After filtering, the "Pro" and "Amateur" categories each had "only" a bit over 86 thousand players left. More on that distinction later. At first, I only looked at the list of Pro players, but realised that might be too biased. So I also crunched the numbers for the Amateurs and the combined list. In all cases, however, players with the following attributes were excluded:

  • Inactive players; the idea is to get a current picture of which countries dominate the chess world. Also, "rating inflation" could distort the picture;
  • Players who have no birthday specified; while we don't care about the age of a player, I found it odd that this was not given, so I excluded such cases;
  • Players who have no Standard rating (Blitz and Rapid ratings were ignored).
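
The three exclusion rules above can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for this article: the `Player` record and its field names (`active`, `birth_year`, `standard_rating`, `k_factor`) are assumptions made for the example and do not match FIDE's actual column names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Player:
    # Hypothetical record layout; the real FIDE list uses different column names.
    name: str
    country: str
    active: bool
    birth_year: Optional[int]       # None if no birthday is given
    standard_rating: Optional[int]  # None if the player has no Standard rating
    k_factor: int


def keep_player(p: Player) -> bool:
    """Apply the three exclusion rules: inactive, no birthday, no Standard rating."""
    return p.active and p.birth_year is not None and p.standard_rating is not None


# Made-up entries, one per rule.
players = [
    Player("A", "NED", True, 1990, 2100, 20),
    Player("B", "NED", False, 1985, 2300, 10),   # inactive -> dropped
    Player("C", "IND", True, None, 1800, 40),    # no birthday -> dropped
    Player("D", "IND", True, 2005, None, 40),    # no Standard rating -> dropped
]

filtered = [p for p in players if keep_player(p)]
print([p.name for p in filtered])
```

Keeping the rules in one small function makes it easy to switch a single criterion off and see how much the results depend on it.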

What was left after the above criteria had been applied was then put into three lists:

  1. A list containing all active players (total of 173,168);
  2. A list of all "Pro" players (total of 86,292);
  3. and a list of all "Amateur" players (total of 86,876);

Now about those "Pro" and "Amateur" players. I confess it is not good terminology, but I could not think of anything better. I simply defined a "Pro" player as anyone with a K-factor of less than 40 (normally either 20 or 10), which broadly means someone who has played at least 30 rated games in tournaments. The "K-factor" is used in the Elo calculations. In short: anyone who has played a reasonable number of games. Amateurs are then all the active players left, i.e. those with a K-factor of 40. (FIDE also keeps the K-factor at 40 for under-18 players rated below 2300, so some experienced juniors end up in the Amateur list as well.) Of course, that doesn't necessarily mean that a "Pro" player is playing at a higher level than an "Amateur"... In fact, the highest rating for the amateurs was a whopping 2338 - that doesn't look like a beginner to me!
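
The split itself boils down to a single rule on the K-factor. Here is a rough sketch, assuming each player is a simple `(name, rating, k_factor)` tuple; that layout and the values are made up for illustration.

```python
# Each player is a (name, rating, k_factor) tuple; layout and values are illustrative only.
players = [
    ("A", 2450, 10),
    ("B", 1900, 20),
    ("C", 2338, 40),
    ("D", 1450, 40),
]

AMATEUR_K = 40  # K-factor used in this article to mark the "Amateur" group

pros = [p for p in players if p[2] < AMATEUR_K]
amateurs = [p for p in players if p[2] == AMATEUR_K]

# Every player ends up in exactly one of the two lists.
assert len(pros) + len(amateurs) == len(players)

print("Pro:", [p[0] for p in pros])
print("Amateur:", [p[0] for p in amateurs])
```

Note that the rule only looks at the K-factor, never at the rating, which is why a strong player such as "C" can still land in the Amateur list.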

The final question to answer is: how do we even measure the performance of a country? By each country's top player? A bit boring and you can do this yourself by looking at the live ratings list. How about the Top 10? Top 100? Average of all players? Average of only the players in our Pro list? Should Blitz and Rapid ratings be included too? Should men and women be separated? There is no easy answer...

On the final point of men and women, the data set does not specify in which tournaments a person has played. Obviously, chess is not a physical sport and men and women can compete as equals. Yet we do have women's championships and this can distort the figures. The Elo system is great but has its flaws. With separate pools of players, like separate chess federations or leagues, the ratings do diverge. It is possible to have a FIDE rating of, say, 1800 while in your regional league you may score a nice 2000.

So what is a good metric?

The more I thought about it, the trickier it became. Are averages good enough? Certainly not for a small sample size (like the top 10), I think. Consider the top three players of two countries: Country A's players have ratings of 2900, 2300 and 2300, respectively. Country B's top players are all rated at 2490. Who is better? The average of country A is higher (2500 vs 2490), but only due to one very strong player. In a competition, country B would (probably) win 2 out of 3 games.
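
To put some numbers on this example, here is a small sketch. It compares the plain averages with the expected points from the standard Elo expected-score formula, E = 1 / (1 + 10^((R_opponent - R_player) / 400)). Pairing the players board by board in rating order is an assumption of the example, not something that comes from the FIDE data.

```python
from statistics import mean


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


country_a = [2900, 2300, 2300]
country_b = [2490, 2490, 2490]

avg_a = mean(country_a)
avg_b = mean(country_b)

# Assumption: board 1 plays board 1, board 2 plays board 2, and so on.
points_b = sum(expected_score(b, a) for a, b in zip(country_a, country_b))

print(f"Average A: {avg_a}, average B: {avg_b}")
print(f"Expected points for B over {len(country_b)} boards: {points_b:.2f}")
```

Under these assumptions country B is expected to take a little more than half of the three points even though its average rating is lower, which is exactly the weakness of a plain average on a tiny sample.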

Since this is really just a fun analysis, let's stick with averages though and let's not get into a debate about the right methodology - otherwise this article would end up as a PhD thesis and we'd better let statisticians worry about such details.

Enough of the preamble; let the numbers do the talking now!

The results

Number of Top Players included
Rank  Top 1       Top 2       Top 3       Top 5       Top 10
1     NOR (2830)  USA (2796)  USA (2783)  USA (2765)  USA (2725)
2     USA (2804)  CHN (2760)  RUS (2750)  IND (2739)  IND (2710)
3     CHN (2780)  RUS (2760)  CHN (2747)  RUS (2734)  RUS (2686)
4     RUS (2769)  FRA (2747)  IND (2744)  CHN (2723)  CHN (2685)
5     FRA (2759)  IND (2746)  FRA (2725)  FRA (2691)  AZE (2646)
6     NED (2749)  NOR (2730)  UZB (2705)  AZE (2680)  FID (2642)
7     IND (2748)  AZE (2725)  AZE (2702)  UZB (2675)  FRA (2641)
8     GER (2743)  IRI (2718)  NOR (2694)  FID (2672)  GER (2638)
9     IRI (2740)  NED (2716)  POL (2688)  UKR (2665)  UKR (2638)
10    POL (2732)  UZB (2714)  NED (2686)  GER (2662)  ARM (2637)

First, let's look at the number of top players per country and their averaged ratings (listed in the brackets). Going with only the top player is rather boring as it reflects the live ratings lists. So I also included the cases of the top 2 players of each country, top 3, top 5 and top 10. What is interesting to see here is that the US is consistently strong across all the cases considered.

China and Russia are performing consistently well too and remain in the top 5 places throughout the cases considered in the above table.

India seems to have a large pool of very strong players but appears to lack enough elite players, as it ranks higher the more players you consider. In contrast, Norway benefits hugely from Carlsen (the world's strongest player), but then quickly disappears from the table.

(One caveat: If a country did not have enough players for a given number of top players, it was not included. For example, we needed at least 10 players to analyse the average of the top 10 players. Otherwise the results would be skewed in favour of countries with only a handful of very strong players. I doubt it would be a problem for the above list of elite players, but you will see shortly why I mention this regardless.)
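
The top-N averaging, including the rule of skipping countries with too few players, could look roughly like the sketch below. The `(country, rating)` pair layout, the function name `top_n_averages` and the sample ratings are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def top_n_averages(players, n):
    """Average rating of each country's n best players.

    `players` is an iterable of (country, rating) pairs (an assumed layout).
    Countries with fewer than n players are left out entirely.
    """
    by_country = defaultdict(list)
    for country, rating in players:
        by_country[country].append(rating)

    result = {}
    for country, ratings in by_country.items():
        if len(ratings) < n:
            continue  # not enough players for a top-n average
        best = sorted(ratings, reverse=True)[:n]
        result[country] = mean(best)
    return result


# Made-up ratings and country codes, purely for illustration.
sample = [
    ("AAA", 2800), ("AAA", 2500), ("AAA", 2400),
    ("BBB", 2700), ("BBB", 2690),
    ("CCC", 2750),
]

top2 = top_n_averages(sample, 2)
ranking = sorted(top2.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

In this toy data "CCC" has a very strong single player but is dropped from the top-2 ranking because it cannot field two players.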

How about averages of all registered players across the three lists? Sure thing, here is the table:

Average of whole list
Rank  Pro List    Amateur List  Combined List
1     YEM (2209)  RWA (1809)    BIH (1911)
2     CHN (2199)  SOL (1757)    KOS (1885)
3     MLI (2163)  MYA (1755)    MYA (1882)
4     FID (2104)  NGR (1745)    FIN (1875)
5     BDI (2092)  KOS (1738)    LIE (1870)
6     INA (2083)  ETH (1730)    SRB (1855)
7     CAM (2080)  SSD (1718)    NED (1849)
8     ARM (2052)  NED (1686)    NGR (1822)
9     UKR (2050)  BIH (1685)    AUT (1810)
10    MGL (2050)  NRU (1683)    MNE (1809)

The average rating of each country is again listed in the brackets and the averages are for the entire lists (Pro, Amateur and All Players).

The big surprise here was that with this method, Yemen - yes, war-torn and famine-ravaged Yemen - comes out as the top country for the pro players! At first, I thought I had made a mistake, so I dug deeper. It turns out that the reason for Yemen's top performance is that it only has five players registered with FIDE.

Rwanda is an even more extreme example: It dominates the Amateur list. Number of players? One. Same goes for the Solomon Islands (SOL). This does not allow for an overly fair comparison of course, but it was still an interesting find. Starting to prune data by some arbitrary value didn't seem fair either, so I left the results as they are. After all, it's just a fun analysis.

Now look out for the US - not even in the list any more when averages over the whole list are considered. Only a few countries show some sort of consistency when comparing the averages of all players and only the top players of each list. It's also interesting to see China taking the number two spot in the pro list (with 226 players), yet it does not make an appearance in the amateur or combined lists. Somehow, it is nice to see that a lot of smaller countries and nations that do not stand out for being "chess obsessed" dominate those lists.

In all fairness, this analysis does not consider the population size of a country, which will have a huge impact too. If you can draw from a pool of over a billion people, you will find lots of talent. San Marino's population of 33 thousand and a bit will make it harder to find a lot of very good chess players.

Speaking of population size: While I did not include the number of people living in a country in the analysis, I did make a count (well, my PC did) of how many registered players there are by country. And the results turned out to be a bit surprising, at least for me. So I included those findings too.

Again, the same lists of Pros only, Amateurs only and a combined list of all registered players are used. Let's start with the Amateur list. The labels are only printed for countries with more than 1,000 players to keep the plot somewhat tidy.

[Figure: Rating vs Player Number, Amateur List]

Perhaps not surprisingly, India takes the top spot due to its population size. China? Disappearing somewhere in the red sea (of the plot, not geographically speaking). If you were to require a minimum of a thousand registered players for a country to be considered in our rankings list, then the Netherlands (1686) would dominate the Amateur rankings, followed by the USA (1613) and Germany (1575). (The average rating is given in the brackets, rounded to the nearest whole number.)
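
Such a minimum-player threshold is easy to try out. The sketch below averages the ratings per country and optionally drops countries with too few players. The `(country, rating)` layout, the function name `country_averages` and the sample numbers are assumptions for illustration, not the real data.

```python
from collections import defaultdict
from statistics import mean


def country_averages(players, min_players=1):
    """Return {country: (average rating, player count)}.

    `players` is an iterable of (country, rating) pairs (an assumed layout).
    Countries with fewer than `min_players` entries are skipped.
    """
    by_country = defaultdict(list)
    for country, rating in players:
        by_country[country].append(rating)

    return {
        country: (round(mean(ratings)), len(ratings))
        for country, ratings in by_country.items()
        if len(ratings) >= min_players
    }


# Made-up data: a single player from XXX, three players from YYY.
sample = [("XXX", 1809), ("YYY", 1700), ("YYY", 1650), ("YYY", 1690)]

print(country_averages(sample))                  # XXX leads with a single player
print(country_averages(sample, min_players=3))   # XXX drops out
```

Returning the player count next to the average makes single-player "champions" easy to spot, whichever threshold you settle on.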

Going over to the Pro list:

[Figure: Ranking vs Player Number, Pro List]

India drops down the list by quite a bit and ranks last when considering only countries with at least a thousand players. It was interesting to see how Spain, France and Germany are towering over the other countries in terms of player numbers. In terms of ranking? It's hard to see in the rumble for the top spot, but it's Serbia that takes the crown (1968), with the Netherlands (1961) a close second. While the USA (1939) is still performing well in third place, the number of players is a bit disappointing given the size of its population of approx. 335 million; more than Spain, France and Germany combined.

Finally, the combined list:

[Figure: Ranking vs Player Number, Combined List]

As you would expect, the results are a mix of the pro and amateur lists and it's therefore no surprise that Spain, France, Germany and India have the largest pools of players again, with each of those nations having over 10,000 active players registered with FIDE. The rankings are again dominated by Serbia (1855) and the Netherlands (1849), with Austria (1810) being the only other country that achieves an average of over 1800 points.

That concludes our lengthy discussion of which country dominates in chess, although we haven't established a clear winner here, as it depends on how you define the winning criteria. Feel free to leave a (respectful) comment on who is the winner in your opinion.

What's next?

At first, I struggled to see what I could analyse next. We did consider age already, looked at men vs women and now we have an analysis of the chess performance of countries. Not much more data to process in the FIDE data set. We could re-run all the above cases and consider the Blitz or Rapid chess ratings instead, but it's pretty much the same old stuff again.

So that's it? Lucky for us, not really!

When I started working on this analysis, I thought it would be rather quick and easy before I realised the complexity of this project. Especially when I saw how fast the results changed when only a few players are considered. Who would have thought that Yemen would dominate the average list for Pro players?

Glancing over the various plots and tables, I realised that it would be interesting to add some other variables to the analysis. As we saw, the USA could easily be seen as the top nation when it comes to chess as it stands. But is this fair? It's rather easy to play competitive chess in your free time if you live in a rich nation where you have more spare time on your hands, even have access to chess coaches and benefit from round-the-clock high-speed internet access. How many chess prodigies were never discovered because children have to work on their parents' farm?

So that gave me some ideas:

What if we were to take the population size of a country into account? What about Elo rating in correlation to a country's GDP, leading to something like Elo/$? Perhaps it's completely nonsensical, but that's the fun of it: to just have a look and see what crazy correlations can be discovered. Any such findings wouldn't change the world but who knows? Perhaps there is a glimmer of truth out there waiting to be unearthed.

Now as to the When? Honestly, no idea. I expect it to be quite some work just to find the necessary data, sanitise it and fill in the blanks, then do the various calculations and plots. Also, it was our family's small business that generated enough income to allow me to spend some time on doing an analysis such as this. As much fun as it is, it sadly does not pay the bills. Due to some external factors, our business is now pretty much done for and my priorities are elsewhere. Like putting food on the table and paying said bills. In fact, it is a little miracle that I managed to even finish this project off. So please be patient with me and I hope to be back with more crazy statistics once another income source for our family has been found.

Thanks for reading and you have my respect if you made it this far.

Happy Chess,

Guenther

Lion Chess

Table of Country Codes and Country Names

Country Code  Country Name            Country Code  Country Name
ARM           Armenia                 MGL           Mongolia
AUT           Austria                 MNE           Montenegro
AZE           Azerbaijan              MYA           Myanmar
BIH           Bosnia and Herzegovina  NRU           Nauru
BDI           Burundi                 NED           Netherlands
CAM           Cambodia                NGR           Nigeria
CHN           China                   NOR           Norway
ETH           Ethiopia                POL           Poland
FID           FIDE                    RUS           Russia
FIN           Finland                 RWA           Rwanda
FRA           France                  SRB           Serbia
GER           Germany                 SOL           Solomon Islands
IND           India                   SSD           South Sudan
INA           Indonesia               UKR           Ukraine
IRI           Iran                    USA           USA
KOS           Kosovo                  UZB           Uzbekistan
LIE           Liechtenstein           YEM           Yemen
MLI           Mali