Some individuals have made excessive requests recently, which resulted in staff throttling the download rate.
For a large data-mining adventure, I would suggest approaching staff in advance. They may be able to give advice on speeds and the best times for access.
Hello,
I'm wondering if it's possible to obtain a large collection of games for AI research. I'm training Leela Chess Zero (Lc0) networks based on human games to experiment with various learning methods.
For the past few months I have been data-mining the game archives of Lichess, which (at the time of writing) has around 1.6 billion games. As Chess.com has games archived since about 2007, I'm very much interested to know whether any compressed downloads are available for this kind of research.
Lichess publishes one game archive per month, and for the recent months in 2020 a single download contains more than 70 million games. This makes it easy to data-mine larger data sets.
For me, the key is to get a large portion of games across various rating ranges and time controls (preferably no bullet games).
I have been reading the API documentation lately. Apparently, serial access to the API should be unrestricted / uncapped, but I have seen 429 (Too Many Requests) responses every now and then. Game archives on Chess.com are organized per player and per month.
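For reference, this is roughly how serial access with handling for 429 responses could look. It is only a sketch: the retry count, the delays and the `fetch_json` name are my own assumptions, and whether the server sends a `Retry-After` header is not something I have confirmed. Requests are made strictly one at a time, and a 429 triggers a wait before the same URL is tried again.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen, max_retries=5,
               base_delay=1.0, sleep=time.sleep):
    """Fetch one URL at a time, waiting and retrying on HTTP 429."""
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Anything other than 429, or running out of retries, is fatal.
            if err.code != 429 or attempt == max_retries:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)  # honour the server's hint if given
            else:
                delay = base_delay * 2 ** attempt  # otherwise back off exponentially
            sleep(delay)
```

A call would look like `fetch_json("https://api.chess.com/pub/player/<username>/games/archives")`, assuming that is the right URL pattern for a player's list of monthly archives. The `opener` and `sleep` parameters exist only so the function can be exercised without touching the network.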
To my knowledge there is no fine-grained way to obtain players in a certain Elo range, so this would require some 'probing' to find the right subset of players (for example, by iterating over the player listings for each country).
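The 'probing' I have in mind would look something like the sketch below. The endpoint paths (`/country/{code}/players` and `/player/{username}/stats`) and the `chess_rapid` / `last` / `rating` fields reflect my reading of the API documentation and should be treated as assumptions; `players_in_rating_range` and its parameters are hypothetical names. `fetch` is any callable that maps a URL to parsed JSON.

```python
API_ROOT = "https://api.chess.com/pub"  # assumed base URL of the public API


def players_in_rating_range(country_code, min_rating, max_rating, fetch,
                            category="chess_rapid", limit=None):
    """Yield (username, rating) for players of one country within a rating range."""
    listing = fetch(f"{API_ROOT}/country/{country_code}/players")
    checked = 0
    for username in listing.get("players", []):
        if limit is not None and checked >= limit:
            break  # cap the number of stats lookups per country
        checked += 1
        stats = fetch(f"{API_ROOT}/player/{username}/stats")
        # Players without a rating in this category are skipped.
        rating = stats.get(category, {}).get("last", {}).get("rating")
        if rating is not None and min_rating <= rating <= max_rating:
            yield username, rating
```

This costs one request per player just to learn a rating, which is exactly why it feels like probing rather than a proper query.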
I hope to get some pointers on how to achieve the above goals.