PGN Scraper

QuietCorner

Hi all,

Probably been done many times before, but I wrote and published a PGN scraper after getting irritated that it still wasn't available in the UI.

-QC

SJCVChess

How very Pythonic of you. How does it handle rate limiting (for people with more than 38 pages of games)? Errors and exceptions? Or do you just restart it if it breaks? But if rate limiting isn't taken into account, where do you restart from so it doesn't break again? I don't see anything in the config or elsewhere that records where to restart.

QuietCorner

Thanks for the tip! I wasn't aware that the API was rate limited. I'll have to look the rates up. When I hit the limits, I'll build in appropriate delays.
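One common way to build in those delays is exponential backoff on rate-limit responses. The sketch below assumes the script fetches with Python's standard urllib and that the API signals rate limiting with HTTP 429 (Too Many Requests). `get_with_backoff` and its parameters are hypothetical names, not part of the published scraper.

```python
import time
import urllib.error
import urllib.request


def get_with_backoff(url, opener=urllib.request.urlopen, sleep=time.sleep,
                     max_retries=5, base_delay=1.0):
    """Fetch url, retrying with exponential backoff on HTTP 429.

    `opener` and `sleep` are injectable so the retry logic can be
    exercised without touching the network.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Only rate-limit responses are retried; any other HTTP error,
            # or a 429 on the final attempt, is re-raised to the caller.
            if err.code != 429 or attempt == max_retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

In the script itself the defaults would be used, e.g. `pgn = get_with_backoff(url)`; the injectable arguments only matter for testing.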

The code was written for my needs, and I'm glad to say it meets them. I haven't encountered any errors, but then again I wasn't writing with a view to making this widely usable for many use cases.

And "pythonic"? Sure, I mean, it was written in Python. Would you prefer a different language?

SJCVChess

Pythonic:

>>> import this

You did a very good job of being VERY Pythonic. But it doesn't address potential rate-limiting issues or resuming an interrupted stream.

Might I suggest two things:

  1. A file to track where to resume pulling games from the archive?
  2. Parameterization so that this can be run as an hourly or daily cron job?
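Both suggestions can be sketched together: a small JSON state file recording which monthly archives are finished, plus command-line parameters so a cron entry can pass the username. This is a hypothetical sketch; `load_done`, `mark_done`, `pending`, `parse_args`, and the `progress.json` default are illustrative names, and it assumes games are pulled one monthly archive at a time from a chronologically ordered list.

```python
import argparse
import json
from pathlib import Path


def load_done(state_file):
    """Return the set of archive URLs already downloaded (empty on first run)."""
    path = Path(state_file)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def mark_done(state_file, done, archive_url):
    """Add archive_url to `done` and persist the whole set."""
    done.add(archive_url)
    path = Path(state_file)
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    # Write-then-rename, so an interrupted run never leaves a half-written file.
    tmp.replace(path)


def pending(archive_urls, done):
    """Archives still to fetch, in the order given.

    The newest archive is always included: if it is the current month,
    it may have gained games since it was last downloaded.
    """
    todo = [url for url in archive_urls[:-1] if url not in done]
    if archive_urls:
        todo.append(archive_urls[-1])
    return todo


def parse_args(argv=None):
    """Command-line parameters, so a cron entry can choose the user and files."""
    parser = argparse.ArgumentParser(
        description="Download PGNs, resuming where the previous run stopped."
    )
    parser.add_argument("username")
    parser.add_argument("--state-file", default="progress.json")
    parser.add_argument("--out-dir", default="pgn")
    return parser.parse_args(argv)
```

A cron job could then run something like `python scraper.py example_user --state-file progress.json` (the script name is assumed). Re-fetching the newest archive every run keeps the current month complete at the cost of one extra request.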


QuietCorner

Hi again! Thanks for your suggestions.

The chess.com API is not rate limited for serial calls, and this script is single-threaded, so I don't see a need for it to keep track of progress.
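To illustrate what "serial" means here: each request is issued only after the previous response has been fully read, so at most one request is ever in flight. A minimal sketch, assuming standard-library urllib and a list of monthly PGN URLs; `fetch_text` and `download_serially` are hypothetical names.

```python
import urllib.request


def fetch_text(url):
    """Blocking GET; returns the response body decoded as UTF-8."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def download_serially(pgn_urls, out, fetch=fetch_text):
    """Write each month's PGN to the file-like `out`, one request at a time.

    The loop never starts a request until the previous response has been
    completely read, so requests are never made in parallel.
    """
    for url in pgn_urls:
        text = fetch(url)
        # Separate batches with a blank line, as PGN separates games.
        out.write(text.rstrip("\n") + "\n\n")
```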

As for parameterization, I don't need it. The script can be run via crontab, and the username parameters can be changed in the config.py file.

You gave me an idea for something the script could benefit from. Its baked-in assumption is that all of a specific user's games should be downloaded. It might be useful to have the script download only games newer than a specified date, either passed as a parameter or tracked between runs.
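Since the monthly archive URLs encode the year and month, a date cutoff can be applied before downloading anything. A sketch assuming archive URLs of the form `.../games/YYYY/MM` (the format the chess.com archives endpoint returns); `archives_since` is a hypothetical helper.

```python
from datetime import date


def archives_since(archive_urls, since: date):
    """Keep only monthly archives that can contain games on or after `since`.

    The year and month are parsed from the last two path segments. Filtering
    is month-granular: the month containing `since` is kept whole.
    """
    keep = []
    for url in archive_urls:
        year, month = (int(part) for part in url.rstrip("/").split("/")[-2:])
        if (year, month) >= (since.year, since.month):
            keep.append(url)
    return keep
```

The cutoff could come from a parameter, e.g. `date.fromisoformat("2024-01-01")`, or from the date of the previous run stored in a file. Because filtering is per month, games from earlier in the cutoff month are still included; finer filtering would need each game's own timestamp.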

Thanks!

skelos

I don't think I understand the problem or the solution. What's wrong with downloading via the archive endpoint, which gives you both games per month and games underway?
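For reference, a sketch of the endpoints involved, assuming the chess.com published-data API layout: `/pub/player/{username}/games/archives` lists monthly archive URLs, appending `/pgn` to a monthly archive URL returns that month's games as PGN, and `/pub/player/{username}/games` returns the player's ongoing daily games. The helper names are hypothetical.

```python
import json

API = "https://api.chess.com/pub/player"


def archives_url(username):
    """Endpoint listing a player's monthly archive URLs."""
    # Lowercased as a precaution; harmless if the API ignores case.
    return f"{API}/{username.lower()}/games/archives"


def current_games_url(username):
    """Endpoint for the player's ongoing (daily) games."""
    return f"{API}/{username.lower()}/games"


def pgn_urls(archives_json):
    """Turn the archives response body into one multi-game PGN URL per month."""
    return [url.rstrip("/") + "/pgn" for url in json.loads(archives_json)["archives"]]
```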