Data analysis: Difference between Male/Female ratings

watcha

Is this application open source?

I mean, can I view the .R files?

( For my part, every statistic that I have ever published on chess.com is open source; I linked the sources of all the programs that I used to generate them. )

InfiniteFlash
[COMMENT DELETED]
watcha

Here is the raw data from which I created the 'Female participation in the function of age' chart:

https://chessstats.shinyapps.io/birthdaystats/

This data is auto-generated and was intended for internal use, so the column names are somewhat cryptic. PARFR, for example, stands for 'female participation among rated'.

Take this as my Hello, World! shiny app.
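For readers who want to see what a column like PARFR could mean in practice, here is a minimal sketch. It is in Python, not the Scala/R used for the real tables, and it rests on assumptions: that the input is a tab-separated file containing only rated players, that it has columns named birthday and sex, and that women are coded as F. The function name is made up for this illustration, and the actual definition behind PARFR may differ.

```python
import csv
import io
from collections import Counter


def female_share_by_birth_year(tsv_file):
    """Return {birth_year: female_count / total_count} for the rows in tsv_file."""
    total = Counter()
    female = Counter()
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        year = row["birthday"]   # assumed column name
        total[year] += 1
        if row["sex"] == "F":    # assumed column name and coding
            female[year] += 1
    # Every year in `total` has at least one row, so no division by zero.
    return {year: female[year] / total[year] for year in sorted(total)}


# Tiny made-up sample, not real FIDE data.
sample = "birthday\tsex\n1990\tF\n1990\tM\n1990\tM\n1991\tF\n"
shares = female_share_by_birth_year(io.StringIO(sample))
```

With the made-up sample, one of the three players born in 1990 is female and the only player born in 1991 is female, so shares maps those years to one third and one respectively.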

InfiniteFlash
[COMMENT DELETED]
watcha

Now, as a complete beginner, as I was two hours ago, this is how you go about creating a shiny app:

First, you realize that R is a language for statistical computing. RStudio, the company behind Shiny, provides a set of libraries, an IDE ( integrated development environment ) and a hosting service ( shinyapps.io, which lets you publish your apps ), all sitting on top of this language.

So your first step is to download the R language and install it.

The next step is installing RStudio itself. When you first log in, you have to verify your account through the R console ( you are guided through this process by the web page ).

Create a directory and, in the RStudio IDE, set it as your working directory. I naively created a directory called 'R', which works as long as you don't deploy an app. It turns out that a deployed app has to have a name at least 4 characters long, and the default app name is the name of the working directory. But don't worry, there is a solution:

deployApp(appName="birthdaystats")

To do something useful you need a dataset. Fortunately, my data is stored in tab-separated text files, which RStudio understands. You can import such a file as a dataset. Be careful to save this dataset to your working directory, otherwise it won't be uploaded and what works locally won't work on the server.

Last, you need some code. There is a source file, ui.R, which is responsible for creating the user interface, and another, server.R, which renders the data and can respond to user requests. Here is the minimal code I needed to create my table:

ui.R

server.R

The imported data looks like this:

InfiniteFlash
[COMMENT DELETED]
watcha

When I looked at the FIDE data, the first decision I made was: no text files, thanks. They are useless garbage with no proper formatting.

The XML format has the advantage that it complies with a standard, and a document at least has to be well-formed to qualify as XML at all. Yes, it takes some effort to learn XML parsing, but in Scala the syntax is very accommodating: you can write XML literals as part of the code ( you can say something like val xml = <tag>text</tag> and this line of code compiles, then you can save this newly created XML object to a file ).

First I parse the XML file into a tab-separated text file ( doing some preliminary processing along the way, like collecting rating clusters, throwing away unrated players etc. ). All subsequent processing is then done on tab-separated text files ( they are also saved as HTML, but just for readability ). The format I use is almost CSV, except that the separators are tabs rather than commas.
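The XML-to-tab-separated step described above can be sketched roughly like this. This is Python, not my Scala code, and it is only an illustration: the tag names ( player, fideid, name, sex, rating, birthday ) and the rule that a missing, non-numeric or zero rating means 'unrated' are assumptions about the input, and xml_to_tsv is a name invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed tag names; the real export may use different ones.
FIELDS = ["fideid", "name", "sex", "rating", "birthday"]


def xml_to_tsv(xml_source, tsv_out):
    """Stream <player> elements and write the rated ones as tab-separated rows.

    Returns the number of players written.
    """
    writer = csv.writer(tsv_out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    written = 0
    # iterparse yields each element as its closing tag is read,
    # so the whole file is never parsed in one go.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "player":
            continue
        row = [(elem.findtext(field) or "").strip() for field in FIELDS]
        rating = row[FIELDS.index("rating")]
        # Assumption: missing, non-numeric or zero rating means unrated.
        if rating.isdigit() and int(rating) > 0:
            writer.writerow(row)
            written += 1
        elem.clear()  # drop this player's child elements once handled
    return written


# Tiny made-up sample, not real FIDE data.
sample = b"""<playerslist>
  <player><fideid>1</fideid><name>A, B</name><sex>F</sex><rating>2100</rating><birthday>1990</birthday></player>
  <player><fideid>2</fideid><name>C, D</name><sex>M</sex><rating>0</rating><birthday>1985</birthday></player>
</playerslist>"""
out = io.StringIO()
count = xml_to_tsv(io.BytesIO(sample), out)
```

For a real file you would pass the path of the XML file ( or a file opened in binary mode ) as xml_source and an output file opened with newline="" as tsv_out. Because the separator is a tab, names that contain commas need no quoting.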

As far as secret societies are concerned, I like doing things in public. I'm not a fan of private messages, and not accepting friend requests is a matter of discipline, not a problem with the person. What is thrown at me in public is thrown at me; I'm content with this as long as useful discussion takes place among the random noise.

watcha

Publishing the data: I would gladly do that, but the total size of the generated files is currently 281 MB. There is no way I can upload this to GitHub ( not even zipped ). There are technical limitations here: the size of the locally created .git directory would get out of hand.

If you download Java 8, Scala and my fideplayers/1.0 project, save the file downloaded from FIDE as players_list_xml.xml in the project's root directory, then ( under Windows ) click on sbtrun.bat and type the command 'startup' in the console that opens, all the files will be generated.

InfiniteFlash
[COMMENT DELETED]
watcha
InfiniteFlash wrote:

Discussions of topics and results of analysis should be shared here, yes.

Discussions of specific code however are another matter.

Yes, but this can also be done in public. On an open source site, which I won't name but of which we are both members, teams have discussion forums. You can create a public team centered around a topic, it will have a discussion forum, and you can be as specific there as you like. I don't know if there is an equivalent of this at chess.com. But I don't enter into explicitly private discussions.

InfiniteFlash
[COMMENT DELETED]
watcha

I finally managed to get my hands on some historical data on female participation.

Data is only available since 2013. Earlier records do not contain gender information.

Here is the complete data:

https://gist.github.com/anonymous/f9041b819bb0a859cf9f

Overall female participation increased over the 2013-2016 period. On the other hand, participation among middle-aged players, and among active middle-aged players, declined.

( code: https://gist.github.com/anonymous/e78dc117e4c9bb8a0491 )

InfiniteFlash
[COMMENT DELETED]
watcha

Some of the earlier records appear to have been filtered by rating. There are lists with no rating below 2000. It is risky to try to guess the gender statistics from those lists because they are not comparable to full lists. It seems to me that at the end of 2012 there was a shift to more reliable recording. This was when the XML format was introduced, and since then there has been explicit gender information.

InfiniteFlash
[COMMENT DELETED]