I would like to create an Excel sheet to find out the FIDE rating distribution, something similar to the picture below.
Data that should be used can be found here: Download full list of players (not rated included) STD, RPD, BLZ combined(XML) (Updated: 30 Nov 2012, Size: 8 536 363 bytes)
Here is a topic with something similar, but the data is from 2009: rating distribution topic. The author was kind enough to provide a link to the Excel file that he used.
The problem is that Excel allows only 65k rows, and since there are more chess players than that, I don't know how to put all the data from the .txt file mentioned above into an Excel file.
I'm not that good at Excel, so I would appreciate your help.
Uhm, there must be something wrong in that chart.
I read on ChessBase that being USCF 2000 means being in the 99th percentile. It's not possible that the FIDE mean is around 2000, as shown in the chart.
Think about it: it's not possible that there's the same number of 1600s and 2400s... just walk into any chess club.
No, it could be true - I have no personal knowledge that it is, but it could be.
FIDE doesn't rate everyone like USCF does, their pool is much more narrowly drawn. Lower-rated players in events with dual rating don't earn a FIDE rating until they have faced enough FIDE-rated players AND earned a point against them, so the lower end of ratings would drop off very quickly.
Guys, please read my message again. Let's make our own chart and after that we can discuss :))
Anyone with strong Excel skills?
Recent versions of OpenOffice (also) support 1 million rows.
I use OpenOffice Writer in lieu of MS-Works and MS-Office and occasionally use the spreadsheet or database functions, and as far as I'm concerned it's just as good as MS-Office, and free. I don't know if it will do what you're asking about here, but it's worth checking out.
I don't have those applications. Can somebody who already has them put the data into a spreadsheet?
Just gave it a shot, without having any experience with OO Calc. Got it working with small datasets, but for now I'm stuck on a data overflow with the whole dataset. I guess I'm using the wrong method.
Why not just use a measurement key of say 1 = 1000 players or something of that sort?
Many of them are without rating, I need to import the data and run a filter.
If you check the links, the data is also available as .xml, but unfortunately I can't open the file.
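For anyone who cannot open the .xml in a spreadsheet, here is a rough Python sketch that streams through it and keeps only rated players. I have not checked the real file layout, so the tag names `player` and `rating` are assumptions, as is the idea that an unrated player has an empty or zero rating; adjust them to whatever the file actually uses.

```python
import io
import xml.etree.ElementTree as ET


def iter_ratings(source, player_tag="player", rating_tag="rating"):
    """Yield the rating of every rated player found in an XML stream.

    player_tag and rating_tag are assumed names, not confirmed against
    the real FIDE file. Players with an empty, missing or zero rating
    are treated as unrated and skipped (also an assumption).
    """
    # iterparse reads the file piece by piece, so the whole document
    # never has to fit in memory at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != player_tag:
            continue
        text = (elem.findtext(rating_tag) or "").strip()
        if text.isdigit() and int(text) > 0:
            yield int(text)
        elem.clear()  # free the children of the player we just handled


# Tiny made-up sample, only to show the call.
sample_xml = (
    b"<playerslist>"
    b"<player><name>A</name><rating>2150</rating></player>"
    b"<player><name>B</name><rating>0</rating></player>"
    b"<player><name>C</name><rating>1600</rating></player>"
    b"</playerslist>"
)
print(list(iter_ratings(io.BytesIO(sample_xml))))
```

To use it on the downloaded list, pass the file name instead of the in-memory sample, e.g. `iter_ratings("players_list.xml")` (file name hypothetical).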
I cannot upload the file for you, but here is the distribution, taking into account only active FIDE-rated players (154,687 of them).
However, having looked a little into the data, I think this distribution is bound to evolve a lot in the coming years, as the FIDE rating bracket has been enlarged considerably, and there are lots of new players every year.
There are over 195 000 other players listed in the file, who are still unrated, but have already posted FIDE results, and may enter the charts anytime soon...
hicetnunc, thanks. Roughly what rating do you need in order to be in the top 10% bracket?
At the moment ~2150, but this is bound to fall in the coming years with the new entrants, so +2000 is a good bet
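If you want to work out that kind of cutoff yourself once you have the ratings in a list, a minimal Python sketch could look like this. The function name and the sample numbers are made up for illustration; they are not the real FIDE data.

```python
import math


def top_fraction_cutoff(ratings, fraction=0.10):
    """Return the lowest rating that still belongs to the top `fraction`.

    With 154,687 ratings and fraction=0.10 this would be the rating of
    the 15,468th best player.
    """
    if not ratings:
        raise ValueError("no ratings given")
    ordered = sorted(ratings, reverse=True)
    # Number of players in the top bracket, at least one.
    k = max(1, math.floor(len(ordered) * fraction))
    return ordered[k - 1]


# Made-up sample values, only to show the call.
sample = [1500, 1650, 1800, 1900, 2000, 2050, 2100, 2150, 2300, 2500]
print(top_fraction_cutoff(sample, 0.5))
```

Rerunning it on each new rating list would show whether the cutoff really drifts down as new players enter.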
If you check the links they also have files with all the rated players. No unrated players in them. So no need for filters or whatever you're saying. K.I.S.S.
Unix way: cut -c110-113 players_list.txt | sort | uniq -c > rating.csv
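For those without a Unix shell, roughly the same thing can be sketched in Python. It takes the rating from characters 110-113 of each line, as in the command above, and additionally groups the ratings into 100-point bins, which the shell pipeline does not do. Treating a blank or zero field as "unrated" is an assumption, and the sample lines are made up.

```python
from collections import Counter


def rating_histogram(lines, start=109, end=113, bin_width=100):
    """Count players per rating bin from fixed-width text lines.

    start/end are 0-based slice bounds covering characters 110-113,
    the same columns as `cut -c110-113`.
    """
    counts = Counter()
    for line in lines:
        field = line[start:end].strip()
        if not field.isdigit():
            continue  # header line, short line or blank rating
        rating = int(field)
        if rating == 0:
            continue  # assumption: 0 means unrated
        counts[(rating // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Made-up lines with the rating placed in columns 110-113.
sample_lines = [" " * 109 + r for r in ("2150", "2199", "1600", "   0")]
for lower_bound, count in rating_histogram(sample_lines).items():
    print(f"{lower_bound},{count}")
```

For the real file, pass an open file object instead of the sample list, e.g. `rating_histogram(open("players_list.txt", encoding="latin-1"))`; the encoding is a guess. The printed lines are plain `bin,count` pairs, so they can be pasted straight into a spreadsheet and charted without hitting any row limit.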
Then getting a little creative with pChart results in: http://jeroen.se/various/fideratingdis/rating.png
Intermediate csv file at http://jeroen.se/various/fideratingdis/rating.csv
Yet another picture :) Y axis is percentage