Handicapping the FIDE Grand Prix

Oct 18, 2014, 1:49 PM

I'm interested in starting a small project to assess who will win the Candidates Tournament.  Let me know if you're interested.  I'm going to set up a GitHub repository and start working on this after the second tournament is complete.

Steps for a statistical algorithm:

1. Gather data.  http://en.wikipedia.org/wiki/FIDE_Grand_Prix has good data on the previous two Grand Prix events, although those rules were different.  It also has information on the Women's Grand Prix, which might provide valuable data.

2. Assign prior probabilities.  There are formulae out there for estimating a player's chances of winning a tournament from Elo ratings, which should make a good prior.  Help from anyone with experience with those would be welcome.

3. Learning algorithm: How do we assign a probability for each person given the data?  I'm thinking about using Naive Bayes or logistic regression, but there are plenty of ML packages out there.

4. Implementation: I'll likely code this up in Java, since that's the language I'm most familiar with, and try to make the framework generic enough to use with other tournaments (and even other sports with head-to-head matchups).

5. Special Issues: The format of the tournaments allows for imbalances in the number of games each player gets with white and with black.  Since each pair plays only once per tournament, the allocation of colors for those games is important.  For example, if Gelfand plays Caruana with white in a tournament, that may be a significant enough advantage to sway the chances that he wins the whole Grand Prix.  Since there are only 3 tournaments per player, every pairing is likely to be imbalanced.  Also, who sits out which tournament is very likely to make a difference.  Caruana and Gelfand are both sitting out the Persian tournament, which may hurt their chances.
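
As a starting point for steps 2 and 5, here is a minimal Java sketch of the standard Elo expected-score formula, with the color advantage modeled as a rating bonus for white.  The class name, the method names, and the WHITE_BONUS value are my own assumptions for illustration; the bonus in particular is a guess that would need to be fitted from real game data.

```java
// Sketch only: the standard Elo expected-score formula plus an assumed
// rating bonus for the player with the white pieces.
final class EloExpectation {

    // Hypothetical value: how many rating points the white pieces are worth.
    // This is an assumption to be fitted from data, not an established constant.
    static final double WHITE_BONUS = 35.0;

    /**
     * Expected score (win = 1, draw = 0.5, loss = 0) of a player rated
     * `rating` against an opponent rated `opponent`, ignoring color.
     */
    static double expectedScore(double rating, double opponent) {
        return 1.0 / (1.0 + Math.pow(10.0, (opponent - rating) / 400.0));
    }

    /** Expected score for the white player, with the bonus added to white's rating. */
    static double expectedScoreAsWhite(double white, double black) {
        return expectedScore(white + WHITE_BONUS, black);
    }

    public static void main(String[] args) {
        // Illustrative ratings, not the actual ratings of any player.
        double a = 2750, b = 2800;
        System.out.printf("A vs B, colors ignored: %.3f%n", expectedScore(a, b));
        System.out.printf("A with white vs B:      %.3f%n", expectedScoreAsWhite(a, b));
        System.out.printf("A with black vs B:      %.3f%n", 1.0 - expectedScoreAsWhite(b, a));
    }
}
```

Comparing the last two printed numbers gives a rough sense of how much a single color assignment could shift one pairing under this assumed bonus.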

There are likely many more factors that could be measured, so any ideas on those would be helpful as well.
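
One simple way to turn per-game Elo expectations into the tournament-level priors from step 2 is Monte Carlo simulation: play out a single round robin many times and count how often each player finishes first.  The sketch below is only an assumption about how this might look, not a settled design.  The class name, the DRAW_RATE constant, and the equal splitting of ties for first are all hypothetical choices, and colors are ignored here to keep it short.

```java
import java.util.Random;

// Sketch only: Monte Carlo estimate of each player's chance of winning a
// single round robin, using Elo expected scores and an assumed draw rate.
final class RoundRobinSimulator {

    // Hypothetical draw rate; an assumption that should be estimated from real games.
    static final double DRAW_RATE = 0.5;

    /** Standard Elo expected score, ignoring color. */
    static double expectedScore(double rating, double opponent) {
        return 1.0 / (1.0 + Math.pow(10.0, (opponent - rating) / 400.0));
    }

    /** Estimated probability of finishing first; ties for first are split equally. */
    static double[] winProbabilities(double[] ratings, int trials, long seed) {
        Random rng = new Random(seed);
        int n = ratings.length;
        double[] firstPlaces = new double[n];
        for (int t = 0; t < trials; t++) {
            double[] points = new double[n];
            // Each pair meets exactly once.
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    double e = expectedScore(ratings[i], ratings[j]);
                    // Shrink the draw rate if needed so no probability goes negative.
                    double draw = Math.min(DRAW_RATE, 2.0 * Math.min(e, 1.0 - e));
                    double winI = e - draw / 2.0; // keeps i's expected score equal to e
                    double u = rng.nextDouble();
                    if (u < winI) {
                        points[i] += 1.0;
                    } else if (u < winI + draw) {
                        points[i] += 0.5;
                        points[j] += 0.5;
                    } else {
                        points[j] += 1.0;
                    }
                }
            }
            // Scores are multiples of 0.5, so exact comparison is safe here.
            double best = 0.0;
            for (double p : points) best = Math.max(best, p);
            int tied = 0;
            for (double p : points) if (p == best) tied++;
            for (int i = 0; i < n; i++) {
                if (points[i] == best) firstPlaces[i] += 1.0 / tied;
            }
        }
        for (int i = 0; i < n; i++) firstPlaces[i] /= trials;
        return firstPlaces;
    }

    public static void main(String[] args) {
        double[] ratings = {2840, 2790, 2760, 2750, 2730, 2700}; // illustrative only
        double[] probs = winProbabilities(ratings, 100_000, 42L);
        for (int i = 0; i < ratings.length; i++) {
            System.out.printf("%.0f: %.3f%n", ratings[i], probs[i]);
        }
    }
}
```

Extending this to the full Grand Prix would mean simulating every tournament with its actual field and color assignments and then applying the Grand Prix points rules, none of which is modeled here.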