Visualizing FIDE chess data (maps, graphs, plots)

watcha

I try to avoid text files as long as XML is available. Still, converting the XML is a very time-consuming task, so thanks for providing an alternative.

I also added some more utilities to GitHub, notably one that converts FIDE country codes to country names. Besides 'ENG' it can also convert 'FID', which is FIDE Online Arena. Once the code is written, it is easy to modify it to add new countries. However, these two were the most annoying ones.
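For anyone who just wants the idea, here is a minimal Python sketch of such a lookup. It is not the code from the repository: the table is a hypothetical excerpt with only a few entries, and the function names `country_name` and `add_country` are made up for illustration.

```python
# Hypothetical excerpt of a code table; only a handful of entries are shown.
FIDE_COUNTRIES = {
    "ENG": "England",
    "FID": "Fide Online Arena",
    "GER": "Germany",
    "NED": "Netherlands",
    "SUI": "Switzerland",
}


def country_name(code, default=None):
    """Return the country name for a FIDE code, or `default` if unknown."""
    return FIDE_COUNTRIES.get(code.strip().upper(), default)


def add_country(code, name):
    """Register a new code so that later lookups can resolve it."""
    FIDE_COUNTRIES[code.strip().upper()] = name
```

Adding a country is a single `add_country("NOR", "Norway")` call, which is the "easy to modify" part.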

https://github.com/chessstats/fidecountries

The two functions that it provides:

I reproduced the young talent list, this time taking standard deviation into account, and it is very interesting. Thanks for this suggestion. I posted my new list here:

https://www.chess.com/forum/view/chess-openings/praggnanandhaa

watcha

Enough of parsers written in high-level languages that take ages to run. I am also afraid of hard-coding fixed-length field values and disturbed by the fact that the fixed-length txt has column names totally different from the XML.

Here is a Windows 64 bit executable that does the job at lightning speed and produces an R data table in text format out of the box:

https://github.com/chessstats/parsexml

( Also available for other platforms, however this requires some additional installation effort. Note that this is not a general XML parsing tool, it specifically targets the FIDE players list. )

InfiniteFlash

Tomorrow I will be releasing the coolest visual component of the app by far. I think it will be awesome haha. It's a gvisMotionChart and time series chart with FIDE data aggregated over every January for 17 years. It took me more than a few hours of hardcoding and nitty-gritty mining to fix all of the text files and EVENTUALLY get the dataset into the format the gvisMotionChart requires. It paid off though.

InfiniteFlash

I powered right through just to display those aggregate statistics; here's a screenshot of the motion chart. It's too cool.

watcha

R script for downloading all Fide standard rating lists from 2001 to 2016:

https://gist.github.com/anonymous/109af12d68b1db106a2a

Uses artificial intelligence to find out the names of the files that can be downloaded. Artificial because it tries every possible prefix/month/year combination, and intelligent because if such a file does not exist, it handles the exception in a catch block.
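The same brute-force idea, sketched in Python rather than R. This is not the gist itself: the base URL, the prefixes and the file name pattern below are placeholders, not the real FIDE download scheme, and the download function is passed in as a parameter so the loop can be tried without touching the network.

```python
import itertools
import urllib.error
import urllib.request

# Assumptions for illustration only: the base URL, the prefixes and the file
# name pattern are hypothetical, not the real FIDE download scheme.
BASE_URL = "https://example.org/download/"
PREFIXES = ["", "standard_"]
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
YEARS = range(2001, 2017)


def candidate_names(prefixes=PREFIXES, months=MONTHS, years=YEARS):
    """Yield every prefix/month/year file name that might exist."""
    for prefix, year, month in itertools.product(prefixes, years, months):
        yield f"{prefix}{month}{year % 100:02d}frl.zip"


def fetch(url):
    """Download one URL and return its bytes (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(names, fetch=fetch, base_url=BASE_URL):
    """Try every candidate; keep what downloads, skip what does not exist."""
    found = {}
    for name in names:
        try:
            found[name] = fetch(base_url + name)
        except (urllib.error.URLError, OSError):
            # The guessed file is not there: move on to the next guess.
            continue
    return found
```

Calling `download_all(candidate_names())` would try every combination once; most guesses fail, and the `except` branch is what makes that harmless.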

InfiniteFlash

Watcha, if you could create a script that intelligently reformats all 96 of the text files, you'd be my hero.

Are all of the historical data files that are in XML format well formatted? That may be my only choice if I want to use all of them.

watcha

Here it is. Script for converting text format Fide rating lists to data tables:

https://gist.github.com/anonymous/acd9a90740a471b965d6

( Expects text files in the hist_download directory. So the zips have to be unzipped first into this directory. 7z allows me to select all the zips and unzip all at once, this is why I did not use R for unzipping. )

Auto-detects column widths and column names. It is still converting the files I downloaded, but the data tables look more or less right.
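One simple way to auto-detect columns, sketched in Python (the gist may well do it differently): treat every word in the header line as a column name and let each column run from the start of its name to the start of the next one. This assumes header names contain no spaces and values are left-aligned under their names; a fused header such as 'GamesBorn' is necessarily read as a single column.

```python
import re


def detect_columns(header):
    """Return (name, start, end) per header word; end is None for the last."""
    matches = list(re.finditer(r"\S+", header))
    starts = [m.start() for m in matches]
    ends = starts[1:] + [None]
    return [(m.group(), s, e) for m, s, e in zip(matches, starts, ends)]


def parse_fixed_width(lines):
    """Turn a header line plus fixed-width data lines into a list of dicts."""
    header, *rows = [line.rstrip("\n") for line in lines]
    columns = detect_columns(header)
    return [
        # Slice each row at the detected offsets and trim the padding.
        {name: row[start:end].strip() for name, start, end in columns}
        for row in rows
        if row.strip()  # skip blank lines
    ]
```

Feeding it the lines of one unzipped text file gives one dict per player, keyed by whatever names that file's header happens to use.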

InfiniteFlash

 

I hope it works! Testing now.....

It looks as if it's merging certain columns together. Many of the text files have two column titles merged together in some shape or form.

Here's a screenshot of what I mean. I spent a few hours bypassing this by just repositioning them in the text files for every single January of every year (a painful decision). Unfortunately, FIDE provides the January text files in 4 different formats, so some of your data tables will come out great, but I worry that some will not. Am I wrong?

 

watcha

Look, I never believed that Kirsan Nikolayevich Ilyumzhinov was abducted by aliens.

But after FIDE gave the name 'GamesBorn' to a column which in reality holds two columns, 'Games' and 'Born', merged, I grudgingly admit that I was wrong. There is nothing to do here; no software could ever detect that these are two separate columns.

InfiniteFlash
watcha wrote:

Look, I never believed that Kirsan Nikolayevich Ilyumzhinov was abducted by aliens.

But after FIDE gave the name 'GamesBorn' to a column which in reality holds two columns, 'Games' and 'Born', merged, I grudgingly admit that I was wrong. There is nothing to do here; no software could ever detect that these are two separate columns.

Isn't there some machine learning technique that could teach a computer to look for column words and word pairs, and to parse by column width and length according to those words?

Such a program may not appear anytime soon I guess. It would be awesome if it did.

watcha

Here is a Windows 64 bit executable that converts the lists correctly:

https://github.com/chessstats/parsetxt

Fixes the merged columns, converts the column names so that they are the same across all files, removes leading and trailing spaces from values.
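For the curious, a rough Python sketch of those three fixes on a single parsed row. It is not the code of the executable: the `CANONICAL` name table is hypothetical, and the split assumes that 'Born' is always a 4-digit year and that 'Games' never has four digits.

```python
# Hypothetical mapping from header spellings to one canonical column name.
CANONICAL = {
    "id_number": "id",
    "id": "id",
    "name": "name",
    "games": "games",
    "born": "born",
    "b-day": "born",
}


def split_games_born(cell):
    """Split a fused 'GamesBorn' cell such as ' 12 1987' into (games, born)."""
    tokens = cell.split()
    born = None
    # Assumption: a trailing 4-digit token is the birth year.
    if tokens and len(tokens[-1]) == 4 and tokens[-1].isdigit():
        born = tokens.pop()
    games = tokens.pop() if tokens else None
    return games, born


def fix_row(row):
    """Split the merged column, unify column names and trim all values."""
    fixed = {}
    for name, value in row.items():
        if name == "GamesBorn":
            fixed["games"], fixed["born"] = split_games_born(value)
        else:
            key = name.strip().lower()
            fixed[CANONICAL.get(key, key)] = value.strip()
    return fixed
```

The split works on the values, not on the header: the header only tells you that the column is the fused one.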

There are some hopelessly corrupt files ( for example those without column names ) which it ignores.

Any feedback is welcome on files that it converts incorrectly.

Edit:

Here is the fix for the hopeless files:

https://github.com/chessstats/parsetxt2

Some common sense cleanup ( renaming files to 'yymm.txt', adding missing columns as NA, changing birthday to "yyyy", same order of columns in every file ):

https://gist.github.com/anonymous/ae66fd2867a1657ef9f6
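The cleanup steps are simple enough to sketch in a few lines of Python (the gist is the actual R script; this is only an illustration). The column list `COLUMN_ORDER` and the input file name pattern, a month abbreviation followed by a two-digit year, are assumptions.

```python
import re

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# Hypothetical target layout: every output file gets exactly these columns.
COLUMN_ORDER = ["id", "name", "country", "rating", "games", "born"]


def yymm_name(filename):
    """Map e.g. 'jan01frl.txt' to '0101.txt' (assumed input pattern)."""
    match = re.search(r"(" + "|".join(MONTHS) + r")(\d{2})", filename.lower())
    if match is None:
        raise ValueError(f"no month/year found in {filename!r}")
    month, year = match.groups()
    return f"{year}{MONTHS[month]:02d}.txt"


def birth_year(value):
    """Reduce a birthday such as '22.03.1987' to '1987'; 'NA' if absent."""
    match = re.search(r"(?:19|20)\d{2}", value or "")
    return match.group() if match else "NA"


def normalize(row):
    """Return the row with all columns present, in one fixed order."""
    out = {col: (row.get(col) or "NA") for col in COLUMN_ORDER}
    out["born"] = birth_year(row.get("born"))
    return out
```

Because every file ends up with the same columns in the same order, the yearly tables can later be stacked or compared without any per-file special cases.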

watcha

Collect the ratings of Anand, Kramnik, Carlsen, Grischuk, Leko and Nakamura during 2001-2016 from the converted files:

https://gist.github.com/anonymous/931408d7e7636bf1c90e
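In outline, the collection step looks like the Python sketch below (the gist is the R version). It assumes the converted files are tab-separated, named 'yymm.txt', and have a header row with `name` and `rating` columns; those names are guesses, so adjust them to the real tables.

```python
import csv
from pathlib import Path


def collect_ratings(directory, surnames, name_col="name", rating_col="rating"):
    """Return {surname: {yymm: rating}} gathered from every yymm.txt file."""
    history = {surname: {} for surname in surnames}
    # 'yymm' names sort chronologically as plain strings for 2001-2016.
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                for surname in surnames:
                    # Names are assumed to be stored as 'Surname, Given'.
                    if row[name_col].startswith(surname + ","):
                        history[surname][path.stem] = int(row[rating_col])
    return history
```

Matching on surname is the weak point: two players sharing a surname would overwrite each other, so matching on the FIDE ID would be the safer choice.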

Result:

Nothing is moving here. I'm at a stone age level in animations.

Compile rating lists:

https://gist.github.com/anonymous/90fe0b5302b791ab2dca

Jan, 2001 rating list:

InfiniteFlash
watcha wrote:

Here is a Windows 64 bit executable that converts the lists correctly:

https://github.com/chessstats/parsetxt

Fixes the merged columns, converts the column names so that they are the same across all files, removes leading and trailing spaces from values.

There are some hopelessly corrupt files ( for example those without column names ) which it ignores.

Any feedback is welcome on files that it converts incorrectly.

Edit:

Here is the fix for the hopeless files:

https://github.com/chessstats/parsetxt2

Some common sense cleanup ( renaming files to 'yymm.txt', adding missing columns as NA, changing birthday to "yyyy", same order of columns in every file ):

https://gist.github.com/anonymous/ae66fd2867a1657ef9f6

Uh.....I'm kind of lost when using the "go programming language". I downloaded it, but what do you mean by

2) create a workspace directory for Go
3) set the PATH environment variable to the full path of the workspace directory

I am a total newbie with this stuff. Do you mean create a folder where I have the parsetxt2 zipped file? Totally lost here.

watcha
InfiniteFlash wrote:
2) create a workspace directory for Go
3) set the PATH environment variable to the full path of the workspace directory

I am a total newbie with this stuff. Do you mean create a folder where I have the parsetxt2 zipped file? Totally lost here.

If you have Windows 64 bit, just download the zip and unzip it; in the unzipped directory you will find the exe ready to run.

If you have a platform other than Windows 64 bit ( Linux, Mac OS, etc. ) then you have to install the Go language.

I sent you a link in a private message to a very detailed discussion of the issue. Please read it carefully.

InfiniteFlash

Thanks watcha, will read the "thread" tomorrow!

If you'd like, I can suggest a few R packages that will display the nicest animations and graphs, which I think you'd like (as you mention at the top of this page). It's the least I can do since you parsed those files!

InfiniteFlash

For time series graphs (as you indicate in post #61): the dygraphs package is head and shoulders the best thing to use in R. I'm currently working on manipulating one to output a clean and tidy rating chart. This link will walk you through it.

https://rstudio.github.io/dygraphs/

If you are just interested in creating awesome visuals, the googleVis package is godly. It's super easy to work with too.

For just silly and stupid cool graphs, networkD3/threejs/shinyGlobe are what you want.

I still haven't forgotten to post the code to github when I am done. I hold my word on uploading the code, only when I am done though. It wouldn't feel right to post an incomplete app.

InfiniteFlash

For example, here's a rough prototype of Teimour Radjabov's Standard rating over time using the dygraphs package mentioned above.

InfiniteFlash

A slightly more advanced histogram, overlaid with a density curve. All that's left is to add a 2nd y-axis label to indicate the number of players. (The y-axis label is incomplete!)

 

InfiniteFlash

I spent way too much time trying to figure out how to generate this. It was extremely annoying. At least all that's left is to create a legend box so that the lines don't get cut off.