Visualizing FIDE chess data (maps, graphs, plots)

InfiniteFlash

Oh, I see what you mean, you meant female participation as a function of age with respect to the population. Not just females by themselves as I have.

Okay, I will see if I can reproduce that. 

InfiniteFlash

Here are my rate charts for every single age by Sex. I'll make sure to include these graphs in the app.

Female/Male participation rate

 

Male participation rate

 

 

Female participation rate

watcha

I'll only be satisfied with this app if it reproduces all my results in the exact same form. There are selfish reasons at play here. Currently I have to download the XML file from Fide, unzip it, copy it to my program's directory, then run the program for five minutes or longer to generate all 9929 files, with a total size of 281 MB. By the time this is finished, smoke is coming out of the computer. If the shiny server does all of this instead of me, then I'm much better off.

Here is what I currently have:

I have so called collected keys. These are birthday, country, flag, title, rr, st. The first four are that of the Fide records, rr and st are generated ( rr being the rating cluster, st is just a dummy key for technical purposes: it stands for status, all players have the status 'reg', which means registered, this dummy key is necessary so that when I want stats for all players, I look at the key 'st' ). For every key there will be a subdirectory in the keycounts directory which lists all the values for this key. So there will be a keycounts\birthday\1980.txt file, which lists all the players who were born in 1980. The first step is to collect these keys from the XML file and create the keycounts directory.

The next step is to create stats for all the above files. From every file the number of rated ( R ), the number of males ( M ) , the number of females ( F ), the average rating ( AVGR ) etc. will be calculated. This will be done for all filters, which are: a, i, m, ma, mi, x ( active, inactive, middle age, middle age active, middle age inactive, all ). So in the stats directory there will be a file stats\birthday\a\1980.txt, which will tell the stats for all active players who were born in 1980:

When this is done, I create a summary table for each category which contains the stats for a certain key by listing them for all possible values. Separate tables will be created for the same table sorted by different keys. The file name indicates the sorting key. So for example the file keystats\birthday\x\bybirthday.txt will contain the table for the key birthday, for all players, sorted by birthday ( this happens to be the file that I used for the participation chart ) :

You can see that what were columns in the individual stat files now have become rows.
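A minimal sketch of the per-key stats step in R, not the actual program described above. The toy data frame and its column names ( birthday, sex, rating ) are assumptions; only the stat names R, M, F and AVGR come from the post.

```r
# Toy stand-in for the parsed player list; column names are assumptions.
players <- data.frame(
  birthday = c(1980, 1980, 1980, 1991, 1991),
  sex      = c("M", "F", "M", "F", "M"),
  rating   = c(2100, 1950, NA, 2300, 2200),
  stringsAsFactors = FALSE
)

# One row of stats per value of the key:
# R = number of rated players, M = males, F = females, AVGR = average rating.
key_stats <- function(df, key) {
  groups <- split(df, df[[key]])
  rows <- lapply(groups, function(g) {
    data.frame(
      R    = sum(!is.na(g$rating)),
      M    = sum(g$sex == "M"),
      F    = sum(g$sex == "F"),
      AVGR = mean(g$rating, na.rm = TRUE)
    )
  })
  out <- do.call(rbind, rows)
  out <- cbind(value = names(groups), out)
  rownames(out) <- NULL
  out
}

by_birthday <- key_stats(players, "birthday")
print(by_birthday)
```

The same function can be called with any other key column, and a filter such as 'a' ( active ) would simply be a row subset applied before calling it.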

In addition I generate rating lists for all collected keys and filters, a young talent list ( which is a rating list sorted by the rating surplus compared to the expected rating of players by age ) and a number of titled players list by country. I have posted these in other threads.

watcha

When computer scientist Scott Aaronson was asked which programming language he uses when he needs to write a program, he said he uses a very high level language: graduate student.

So when I'm harassing InfiniteFlash with my feature requests, what I'm essentially doing is trying to program an app in a very high level programming language.

InfiniteFlash

I am slowly getting those summary statistics, adding a column at a time. I only had time today to try to tackle this, so I'm not really that far yet, but it shouldn't be that bad getting all the statistics you had. Do you have a legend or key detailing what each of the variables means?

From my brief time getting summary statistics like the ones you've mentioned above, the most interesting graphs I can produce are 3d mappings of some frequency variable with respect to two other variables.

Here is one pretty example of age vs rating vs frequency partitioned by Gender. This graph is not possible to map out in 2d because you'd get a nearly entirely filled plot (unless you code for density by using color ranges) of Rating vs Age.

 

 

 

 

watcha

These are the keys that I have:

So I can tell for any filter, any key and any value of that key any stat of the above.

For example I can tell for filter 'a' ( active ), for key 'birthday', for value '1980', the highest ranked woman's name in that category ( that is the highest ranked woman's name among all active players born in 1980 ).

Or I can generate a chart of the standard deviation of rating as a function of age for women ( because age can be calculated from the key birthday, which I collect, and I have a stat field STDDEVF for every possible value of birthday ).
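As a rough illustration of that last chart in R: the sketch below computes the standard deviation of women's ratings per age with tapply. The toy data, the column names and the list_year value are assumptions, not values from the Fide file.

```r
# Toy data; column names are assumptions.
players <- data.frame(
  birthday = c(1980, 1980, 1980, 1990, 1990, 1990),
  sex      = c("F", "F", "M", "F", "F", "F"),
  rating   = c(2000, 2100, 2400, 1800, 1900, 2000),
  stringsAsFactors = FALSE
)

list_year <- 2015  # assumed year of the rating list

# Keep women only and derive age from the birthday key.
women <- players[players$sex == "F", ]
women$age <- list_year - women$birthday

# Standard deviation of rating for every age (the STDDEVF idea).
stddevf <- tapply(women$rating, women$age, sd)
print(stddevf)
```

Plotting is then a one-liner along the lines of plot(as.numeric(names(stddevf)), stddevf, type = "b").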

 

By the way I only opened this thread because I tried to convert the Fide XML file to a data frame using R and doing so took so long that I got bored and opened the chess.com forum. All the time I was writing this post it was parsing this file. By now it has finished. I have to tell you that my program parses the file and creates tons of statistics out of it faster than R can read it into a data frame:

watcha

Inquiry into the essence of meaning - watcha's doctoral thesis on the R-language

 

I'm just kidding. Please lower your expectations, the situation is not that serious.

However it is true that I made the first baby steps towards creating an interactive app that can handle the fide players list. Here are my findings so far:

 

1) You can download a file from the internet using R, namely to download the fide players list you can do this:

download.file("http://ratings.fide.com/download/players_list_xml.zip","players.zip")

2) The above file is a zip, but R also has support for unzipping a zip file, in our case:

unzip("players.zip")

3) Now we have the XML file, let's suppose it is called players.xml ( its name is more frightening than this, but I assume we renamed it players.xml for simplicity ). Reading this XML to an R data frame then saving this frame as a table in text format goes like this:

library("XML")

library("methods")

players <- xmlToDataFrame("players.xml")

write.table(players,"players.txt")

( Having the data in a text table makes the size of the data much smaller, also it can be loaded into memory much quicker. )
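A small sketch of the save-and-reload round trip from step 3, using a toy data frame instead of the real players list ( the names and columns are made up for illustration ). It only shows that a table written with write.table comes back intact with read.table.

```r
# For the real file, unzip() returns the extracted path(s), so the XML does not
# have to be renamed by hand:
# xml_path <- unzip("players.zip")

# Toy stand-in for the parsed data frame; the columns are assumptions.
players <- data.frame(
  name    = c("Player, One", "Player, Two"),
  country = c("HUN", "RUS"),
  rating  = c(2450, 2600),
  stringsAsFactors = FALSE
)

# Write the table as text, then load it back into a data frame.
path <- tempfile(fileext = ".txt")
write.table(players, path)
restored <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
print(restored)
```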

4) For interactivity let's introduce a combo box. R calls it selectInput. I call it combo box. We can insert it in the fluidPage of ui.R:

( This lets us select one of three countries, or all countries. )

5) We somehow have to react to the changes in the combo. The observeEvent function is our friend. Every time the user selects a country, we read players.txt into a data frame, filter it by the selected country, then return the filtered list for rendering in the table. This is done in server.R:
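Steps 4 and 5 can be sketched roughly as follows. This is not the code of the linked app: the input id, the output id, the three example countries, the country column name and the helper names filter_by_country and make_app are all assumptions. The filter is plain R so it can be tried without shiny; shiny is only needed once make_app() is actually called.

```r
# Plain R filter: "ALL" keeps every row, anything else keeps one country.
filter_by_country <- function(players, country) {
  if (country == "ALL") {
    players
  } else {
    players[players$country == country, , drop = FALSE]
  }
}

# Hypothetical app builder; requires the shiny package when called.
make_app <- function(table_path = "players.txt") {
  ui <- shiny::fluidPage(
    # The combo box: three countries, or all of them.
    shiny::selectInput("country", "Country",
      choices = c("All" = "ALL", "Hungary" = "HUN",
                  "Russia" = "RUS", "USA" = "USA")),
    shiny::tableOutput("players")
  )
  server <- function(input, output, session) {
    # React to every change of the combo box.
    shiny::observeEvent(input$country, {
      players  <- read.table(table_path, header = TRUE, stringsAsFactors = FALSE)
      selected <- filter_by_country(players, input$country)
      output$players <- shiny::renderTable(selected)
    })
  }
  shiny::shinyApp(ui, server)
}
```

Running make_app() in an R session with shiny installed and a players.txt in the working directory should start the app locally.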

The result:

https://chessstats.shinyapps.io/fideplayers/

Forgive me for using the August 2012 data, the reason being that this XML file is much smaller than the current one, so the parsing can be done in reasonable time. Currently all this is only an exercise in the R language.

InfiniteFlash

 Oh, I didn't know that this was possible. Very cool! Will have to do this for downloading historical data.


 

Also, in the choices = "country name", I suggest creating a dataframe that has unique values of the Federation/Country column. 

This output is nicer than just hardcoding countries and you don't have to be annoyed by renaming them. The unique function and the names() functions are extremely nice for shiny inputs.

Below is code to avoid renaming those observations.

library(plyr)

library(countrycode)

#FIDE is the data set name

#generate list of countries

countries <- unique(FIDE[c("Fed")])

#order the list by alphabetical order

countries <- plyr::arrange(countries, Fed)

 

After all of this manipulation, instead of having:

choices = c("RUS" = "RUS", "ALL" = "ALL")

you can just have:

 

#Convert 3 letter codes to country names if you'd like

countries$Country <-countrycode(countries$Fed, "ioc", "country.name")


#output the first or the second column of the dataset as a vector, depending on whether you assigned a new column

choices = countries[,1]   # or: choices = countries[,2] for the country names


in your selectInput statement in the ui file, which displays only that column of the dataset (a column of Fed codes or of Country names). You should get something like this in your sidebarPanel input area.
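The same idea in base R only, as a hedged sketch: the toy FIDE data frame and the small fed_names lookup are assumptions standing in for the real data and for the countrycode conversion.

```r
# Toy stand-in for the FIDE data set; only the Fed column matters here.
FIDE <- data.frame(Fed = c("RUS", "HUN", "RUS", "USA", "HUN"),
                   stringsAsFactors = FALSE)

# Unique federation codes in alphabetical order.
feds <- sort(unique(FIDE$Fed))

# Plain choices: "ALL" plus every code found in the data.
choices <- c("ALL", feds)

# Optional labels: selectInput shows the names and returns the values.
# This lookup is hypothetical; countrycode() would normally supply it.
fed_names <- c(HUN = "Hungary", RUS = "Russia", USA = "United States")
labelled_choices <- setNames(feds, fed_names[feds])
print(labelled_choices)
```

Either vector can be passed as the choices argument of selectInput.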

 


 



I wish to include a datatable, but I have issues with outputting such a large filtered dataset (130000 is just too much for shiny on my computer).

InfiniteFlash

Thanks for the list! I am in the process of getting all of the stats compiled into one table first. Once I have this, I should be able to output various stats based off this to a table in shiny as it is not so many observations. It'll be under 200 observations.

watcha

This thread is all about visualization, so let's visualize something.

This something is going to be the selected country. What we want, is that the background image of the application always changes to the flag of the selected country.

How can this be achieved?

First we have to realize that a shiny app is all about generating an HTML page. All the fancy functions in ui.R just return HTML. To add a background image to an HTML element, you add a CSS class to it whose style specifies the background-image property. To do this dynamically you need Javascript. The shinyjs library provides utility functions for using Javascript without having to write a single line of Javascript code ( similarly to the functions in fluidPage, which generate HTML without having to write a single HTML tag ). To use shinyjs, you load it with library(shinyjs), then call useShinyjs() inside the fluidPage. Now you are ready to use high level Javascript functionality.

First we create a style sheet with shinyjs::inlineCSS that has a class for every country, specifying its flag as the background-image property. We also need the id of an element that encapsulates the whole application, so that adding a CSS class to it changes the style of the whole application. At this point we have to get closer to earth and generate a div HTML element semi-manually ( still not writing raw HTML code ). There is a trick for this in shiny: we can write a div tag as tags$div(id="main", child elements... ). The child elements of this 'main' div are what used to be the elements listed in fluidPage. We just encapsulated them in a div tag whose id we know.

When the user selects a country, we add the class belonging to that country to this div element with shinyjs::addClass. It is a bit more complicated than this, because first we have to remove the old class with shinyjs::removeClass. For this we have to keep a record of what the old class was in a global variable. The trick is that inside a function you assign to a global variable with <<- instead of <-. It took me half an hour of banging my head on the keyboard to find this bug. If we use country codes as class names, then we can simply use the combo's selected value as a class name.

Of course we have to store the images of the country flags in our directory. However there is a dirty little trick here: if you think that the root directory to which the url of the image is relative is the shiny workspace directory, you are wrong. If you write the image name as a url, then the files have to be in a directory called "www", which resides within your workspace directory.
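Two pieces of this can be tried in plain R without shiny. The sketch below builds the CSS rules ( one class per country code, assuming flag files named like HUN.png in www, which is a guess ) and shows the <<- pitfall with a global old_class variable. The helper names are hypothetical and this is not the code of the linked app.

```r
# One CSS class per country code; the resulting text is what would be
# handed to shinyjs::inlineCSS. The file naming scheme is an assumption.
flag_css <- function(codes) {
  paste(sprintf(".%s { background-image: url('%s.png'); }", codes, codes),
        collapse = "\n")
}
css <- flag_css(c("HUN", "RUS"))
cat(css, "\n")

# Remembering the previously applied class in a global variable.
old_class <- ""

remember_wrong <- function(new_class) {
  old_class <- new_class   # creates a local copy, the global is untouched
  invisible(NULL)
}

remember_right <- function(new_class) {
  old_class <<- new_class  # assigns to the variable outside the function
  invisible(NULL)
}

remember_wrong("HUN")
after_wrong <- old_class

remember_right("HUN")
after_right <- old_class
```

In the app, the handler would call shinyjs::removeClass("main", old_class) and shinyjs::addClass("main", input_value) before remembering the new class.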

Putting all these bricks together we get:

https://chessstats.shinyapps.io/fideplayers/

watcha

Thanks.

The reason behind not posting the literal code in the post was that finally I took the pains to register a new GitHub account for the purposes of experimenting with the R language to present chess statistics. The literal code for the app can be found here:

https://github.com/chessstats/fideplayers

InfiniteFlash

Here's a screenshot of what I have. I spent a while trying to manipulate the data.

I hope this is correct. I don't think I will do percentiles, but I will do titled player statistics when I have time.

 

watcha

If you have the average ratings per age group, then you are very close to being able to create one of my favourite stats: the young talent list. Calculate the rating surplus for every player as the difference between their actual rating and the average rating of players of their age and gender. When every player is assigned a rating surplus, order the players by rating surplus and present this list as the young talent list. This is a very interesting list, because even 10 year old players have a chance to get into the top 10. In some sense you can see into the future by looking at this list: ten years from now the 2700chess.com list may look something like the young talent list does now.

You have to get something similar to this list:

https://www.chess.com/forum/view/general/top-50-players-to-watch-out-for
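A minimal sketch of the rating surplus calculation in R, on made-up players ( names, ages and ratings are invented for illustration ). Here the expected rating is simply the mean rating of each age and sex group.

```r
# Invented toy data; column names are assumptions.
players <- data.frame(
  name   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  age    = c(10, 10, 10, 10, 30, 30, 30, 30),
  sex    = c("M", "M", "F", "F", "M", "M", "F", "F"),
  rating = c(1200, 1800, 1100, 1300, 2400, 2600, 2200, 2300),
  stringsAsFactors = FALSE
)

# Expected rating: average rating of players of the same age and sex.
expected <- ave(players$rating, players$age, players$sex, FUN = mean)

# Rating surplus: actual rating minus expected rating.
players$surplus <- players$rating - expected

# Young talent list: players ordered by surplus, largest first.
talent <- players[order(-players$surplus), ]
print(talent)
```

In this toy data the 10 year old B tops the list with a surplus of 300, ahead of every adult, which is the effect described above.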

InfiniteFlash

#1 on the list is a lesser-known name. Interesting stat; I have just added it. Thanks for the suggestion. I will also include a standard deviation variable.

 

 

John Michael Burke came out of nowhere on this list. He's a chess.com member that goes by JMB something, I forget.

 

Here's the standard deviation table of how far each of the top rated guys is from their group mean.

 

Anything above 3 pretty much means you are the cream of the crop for your age/Gender group with respect to the population.

The first place player is a person I've never heard of, must be a super-prodigy. 4.3 is an incredible amount. Ridiculously strong.
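The "standard deviations away from the group mean" figure can be sketched in R like this. The data is invented and the grouping by age and sex is an assumption about how the table above was built.

```r
# Invented toy data; column names are assumptions.
players <- data.frame(
  name   = c("P1", "P2", "P3", "P4", "P5"),
  age    = c(12, 12, 12, 12, 12),
  sex    = c("M", "M", "M", "F", "F"),
  rating = c(1400, 1500, 1600, 1300, 1500),
  stringsAsFactors = FALSE
)

# Mean and standard deviation of rating within each age/sex group.
grp_mean <- ave(players$rating, players$age, players$sex, FUN = mean)
grp_sd   <- ave(players$rating, players$age, players$sex, FUN = sd)

# How many standard deviations each player is above their group mean.
players$z <- (players$rating - grp_mean) / grp_sd
print(players[order(-players$z), ])
```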

watcha

I tried to post a table in another thread listing countries by the number of their super GMs. It looked a bit awkward because country codes were not translated to country names.

Finally I have found a near perfect solution which can convert almost all country codes ( unfortunately it cannot convert 'ENG' which is an important country ). As I won't upload this code to GitHub, this little cookbook is just as a reminder for myself how to add converted country codes as a column to a data frame, select the desired columns from that dataframe and then rename the column headers to human readable column names:

InfiniteFlash

I posted about the countrycode conversions earlier, haha yep looks good. There are a few countries other than the UK that don't get translated that well either. There's no way around it other than hardcoding the renaming process unfortunately, as you mention.

watcha

I think there is no better solution than to first manually replace ENG with GBR and then apply the conversion. However we are in R, so it was a bit counterintuitive why the naive attempt that would work in any other language generates an unexpected result. It has to do with levels. You can't just replace an element in a column that is a factor. It all sounds Chinese to me, but finally I managed to copy paste a solution into my code:
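One common way to do this in R, shown on a toy factor ( this is a sketch of the general technique, not necessarily the solution that was pasted ): rename the level instead of assigning to the elements.

```r
# Toy factor column of federation codes.
fed <- factor(c("ENG", "HUN", "ENG", "RUS"))

# Naive attempt: "GBR" is not an existing level, so R warns and stores NA.
naive <- fed
suppressWarnings(naive[naive == "ENG"] <- "GBR")
print(naive)

# Working approach: rename the level itself.
fixed <- fed
levels(fixed)[levels(fixed) == "ENG"] <- "GBR"
print(fixed)
```

Converting the column with as.character() before replacing is another option that sidesteps levels entirely.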

some more toolkit items:

http://www.magesblog.com/2011/09/accessing-and-plotting-world-bank-data.html

http://data.worldbank.org/indicator/all

to divide a column by a column named by variable 'what':

ext$SGM<-ext$SGM/ext[,eval(quote(what))]

to order rows by values in column named by variable 'bywhat':

ext<- ext[order(-ext[,eval(quote(bywhat))]),]

to divide a column and round it to some fractional digits:

ext$pop<-round(ext$pop/1e6,2)
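Since eval(quote(what)) just evaluates to what, the three snippets above can also be written with [[ ]] indexing. A small sketch on invented numbers, with ext, SGM, pop, what and bywhat named as in the snippets:

```r
# Invented toy table; the values are for illustration only.
ext <- data.frame(
  country = c("A", "B", "C"),
  SGM     = c(10, 4, 6),
  pop     = c(5e6, 1e6, 12e6),
  stringsAsFactors = FALSE
)

what   <- "pop"   # column to divide by
bywhat <- "SGM"   # column to sort by

# Divide SGM by the column whose name is stored in 'what'.
ext$SGM <- ext$SGM / ext[[what]]

# Order rows by the column named in 'bywhat', largest first.
ext <- ext[order(-ext[[bywhat]]), ]

# Express population in millions, rounded to two digits.
ext$pop <- round(ext$pop / 1e6, 2)
print(ext)
```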

hash lookup using qdapTools:

extgdp=lookup(ext$country,data.frame(wbdata$country,wbdata$gdp))

use raw html:

http://shiny.rstudio.com/articles/html-ui.html

send message to js:

shinyServer(function(input, output, session)   # <- add the session argument!

session$sendCustomMessage(type = "myType",message = list(input$select))

receive message:

<script>$(document).ready(function() {

  Shiny.addCustomMessageHandler("myType", 

    function(message){document.getElementById("titleline").innerHTML=message[0];});

})</script>

https://ryouready.wordpress.com/2013/11/20/sending-data-from-client-to-server-and-back-using-shiny/

complete cases + linear regression:

ext<-ext[complete.cases(ext),]  

ext$fitted<-fitted(fit <- lm(GM ~ gdp, data=ext))
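The two lines above, spelled out on invented data ( GM is set to exactly 2 * gdp + 1 here so the result is easy to check, and one gdp value is missing on purpose ):

```r
# Invented toy table; one row has a missing gdp value.
ext <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  gdp     = c(1, 2, 3, NA, 5),
  GM      = c(3, 5, 7, 9, 11),
  stringsAsFactors = FALSE
)

# Drop every row that contains an NA.
ext <- ext[complete.cases(ext), ]

# Fit GM as a linear function of gdp and store the fitted values.
fit <- lm(GM ~ gdp, data = ext)
ext$fitted <- fitted(fit)
print(coef(fit))
```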

rTools for 3.2:

https://cran.r-project.org/bin/windows/Rtools/

create R package:

http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/

InfiniteFlash

Be wary of hosting anything on shinyapps by the way.

You get about 25 hours of computational time. The more features your app has, the faster that time gets used up. I only managed to host mine for 6 hours before all of my time was spent.

watcha

I created a package for downloading the Fide players list in XML format and converting it to an R data table in text format:

https://github.com/chessstats/players

This is a full blown R package, so you can install it in the usual way using devtools::install_github("chessstats/players"). For installation/user guide visit the GitHub repository in the link and look at the ReadMe.

InfiniteFlash

Nice job creating that! It should be extremely useful for those who don't want to go through the hassle of formatting the nonsense. Is the code specific to that particular FIDE XML file?

Not to be a Debbie Downer here, but today I was quite annoyed as well when I found that there is a way of customizing how a text file is imported in R.

http://stackoverflow.com/questions/14383710/read-fixed-width-text-file

A really nice universal way of importing files, similar to SAS's input statement within a data step. An easy three or four lines of code that just import the dataset, even in its terrible format.
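A short sketch of the fixed-width approach with read.fwf from base R. The column layout below ( 4 characters of id, 16 of name, 3 of federation, 4 of rating ) is invented for the example and is not the layout of the FIDE text file.

```r
# Build two fixed-width lines with a made-up layout: id(4) name(16) fed(3) rating(4).
lines <- sprintf("%4d%-16s%-3s%4d",
                 c(1001L, 1002L),
                 c("Smith, John", "Doe, Jane"),
                 c("HUN", "RUS"),
                 c(2450L, 2310L))
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Read the file back by giving the width and name of every column.
fwf <- read.fwf(path,
                widths = c(4, 16, 3, 4),
                col.names = c("id", "name", "fed", "rating"),
                strip.white = TRUE,
                stringsAsFactors = FALSE)
print(fwf)
```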