Downloading more recent data, last month's for example (78,000,000 games and over 100 GB), is desirable, though I had to purchase more iCloud storage just to fit it on my computer.
Cool! I downloaded a large text file in PGN format (about 25 GB) from lichess containing all games from April 2017. Unfortunately, I can only take a relatively small sample of games because the runtime is so slow. Do you have any idea how I could get started with PySpark or some other method of handling all this data?
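One possible starting point before reaching for Spark, sketched here in plain Python: stream the PGN file line by line and keep only the header tags, so memory use stays flat no matter how large the file is. This assumes every game begins with an `[Event ...]` tag, which is typical of PGN dumps but worth checking against the actual file. The name `iter_game_headers` and the sample games are made up for illustration.

```python
import io
import re
from collections import Counter

# Matches a PGN tag pair such as: [Result "1-0"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def iter_game_headers(lines):
    """Yield one dict of header tags per game, reading lazily line by line."""
    headers = {}
    for line in lines:
        match = TAG_RE.match(line)
        if match:
            key, value = match.groups()
            # Assumption: an Event tag marks the start of a new game.
            if key == "Event" and headers:
                yield headers
                headers = {}
            headers[key] = value
        # Non-tag lines (move text, blank lines) are skipped.
    if headers:
        yield headers


# Tiny in-memory sample standing in for the real file (hypothetical data).
sample = io.StringIO(
    '[Event "Rated Blitz game"]\n'
    '[Result "1-0"]\n'
    '\n'
    '1. e4 e5 2. Nf3 1-0\n'
    '\n'
    '[Event "Rated Bullet game"]\n'
    '[Result "0-1"]\n'
    '\n'
    '1. d4 d5 0-1\n'
)

# Count game results without ever holding the whole file in memory.
result_counts = Counter(h["Result"] for h in iter_game_headers(sample))
print(result_counts)
```

For the real dump, the same generator should work on an open file handle (for example `with open(path, encoding="utf-8") as f:`) in place of the `io.StringIO` sample. The per-game dicts could also be written out in a tabular format, which tends to be easier to load into PySpark later than raw PGN.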
Very cool! I'm the Head of People Analytics for a big pharma company. I'd be glad to answer any questions you have about doing business analytics and data science in the real world.
Hey! I'm a data science student currently working with lichess game data. As I perform analysis and figure out PySpark so I can work with more games, I will post stuff here :)