Best practices in getting info with requests

io_kloud

I'm developing a web app that I plan to deploy on Heroku. I use Python's requests library to get info from the endpoints. Example:

import requests

headers = {
    'User-Agent': 'head2head'
}
username = 'gmwso'
url = f'https://api.chess.com/pub/player/{username}/games/archives'
r = requests.get(url, headers=headers).json()

Is a header with just the User-Agent enough to get the info, or should I remove it, or maybe add more details such as an email address?

Sample app image.

dimpoul

Out of curiosity, how do you get the scores? Do you parse through the players' archived games, or do you use some private chess.com APIs?

io_kloud

I parse through the players' archived games using this endpoint.

https://api.chess.com/pub/player/{username}/games/archives from the api

https://www.chess.com/news/view/published-data-api

I do not save the PGN; I just read the game result and the players for every game.

dimpoul

Thanks. Do these requests take a long time for players who have played tons of games like Hikaru? Or are you doing some sort of concurrent requests to parse the games faster?

io_kloud

Players with more games take longer to process. As an optimization, given two usernames, I first check which of the two has fewer games (using the stats endpoint), then use that username to process the game matches. For example, between gmwso and hikaru, gmwso has fewer games, so I process the archives of gmwso. I am not using concurrency so far.
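The "pick the player with fewer games" step might look roughly like the sketch below. It is not io_kloud's actual code. It assumes the stats endpoint returns a JSON object whose game-type sections (for example chess_blitz) each carry a "record" with "win", "loss" and "draw" counts; the function names total_games and player_with_fewer_games are made up for illustration.

```python
from typing import Dict


def total_games(stats: dict) -> int:
    """Sum win/loss/draw counts over every section that has a 'record'.

    Assumed layout: {"chess_blitz": {"record": {"win": .., "loss": .., "draw": ..}}, ...}
    Sections without a record (or plain numbers) are skipped.
    """
    total = 0
    for section in stats.values():
        if not isinstance(section, dict):
            continue  # not a game-type section
        record = section.get("record")
        if isinstance(record, dict):
            total += sum(record.get(key, 0) for key in ("win", "loss", "draw"))
    return total


def player_with_fewer_games(stats_by_user: Dict[str, dict]) -> str:
    """Return the username whose stats show the smallest number of games."""
    return min(stats_by_user, key=lambda user: total_games(stats_by_user[user]))
```

The archives of the returned username are then the only ones that need to be downloaded, since every head-to-head game appears in both players' archives.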

 

Time breakdown.

check number of games: 1.0s (to know which of the two has fewer games)
get match stats: 53.4s
generate tables and charts: 0.7s (process the output from get match stats)
Search is done in 55.1s!

username: gmwso

 

The "get match stats" step requests the archive info of gmwso and reads the PGN info into a Python dictionary.

stephen_33
io_kloud wrote:

import requests
headers = {
    'User-Agent': 'head2head'
}
username = 'gmwso'
url = f'https://api.chess.com/pub/player/{username}/games/archives'
r = requests.get(url, headers=headers).json()

Is a header with just the User-Agent enough to get the info, or should I remove it, or maybe add more details such as an email address?

When I started using the site's API a few years ago, I remember being advised by a member of staff (probably @bcurtis) to include my email address as well as some info on the program requesting the data, something like:

{'User-Agent': 'Chess.py (Python 3.7) (username: stephen_33; contact: ******@gmail.com)'}

I've just realised that needs updating because I'm using version 3.9 now. I think 3.10 has already been released?

But as a matter of interest, do you not check the status code of the returned data before applying the json parser? It's not unusual for endpoint requests to fail for one reason or another and I always check that the status is 200 before proceeding.
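A minimal sketch of that status check, for anyone following along. The helper name fetch_json, the timeout value and the contact address in the User-Agent are placeholders, not anything chess.com prescribes. The session argument is assumed to be a requests.Session() or anything else with a requests-style get() method.

```python
from typing import Any, Optional

# Hypothetical User-Agent in the style suggested above; put a real contact address here.
HEADERS = {"User-Agent": "head2head (username: io_kloud; contact: you@example.com)"}


def fetch_json(session: Any, url: str, timeout: float = 10.0) -> Optional[dict]:
    """Return the parsed JSON body, or None when the status code is not 200."""
    response = session.get(url, headers=HEADERS, timeout=timeout)
    if response.status_code != 200:
        # A failed request may not carry a JSON body, so do not try to parse it.
        return None
    return response.json()


# Usage (makes a real network call, so it is left commented out):
# import requests
# data = fetch_json(requests.Session(),
#                   "https://api.chess.com/pub/player/gmwso/games/archives")
```

Returning None keeps the sketch short; real code might prefer to log the status code or raise an exception so that failures are not silently ignored.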

io_kloud

Thanks for the info and the hint on status.

Martin_Stahl

If you don't include an address and your code is using too many resources, they will block your requests. With contact information, they can let you know about the problem.

dimpoul

@Martin_Stahl When you say resources, are you referring to the public web APIs (https://www.chess.com/news/view/published-data-api) or the private ones (e.g. https://www.chess.com/callback/member/stats/{{username}})?

From my understanding, we are allowed to use the public APIs as heavily as we want, but we can only use the "private" ones at our own risk. Is this correct?

@io_kloud You might have to look into concurrency. I have a similar app that does the exact same thing and more using the game archives and it takes way too much time when players have over 100K games. For example, run the score between dimpoul and spennythompson on your app and see how long it takes.

Thanks!

WhiteDrake

I don't think that the public API may be used "as heavily as we want". Whenever the admins of chess.com think that a given script is putting too much pressure on the servers, they'll block it. I think that chess.com is somewhat defended against dumb DoS attacks via the Cloudflare CDN, but if you try to do some heavy lifting, like downloading all games ever played on Cc in parallel, I'd bet the script would get blocked eventually.

io_kloud

This is what I got so far.

username: dimpoul

numworkers: 1, archives: 74, time: 1141s

numworkers: 4, archives: 74, time: 152s

 

I will just create a numthreads option for concurrency in the app, with a default value of 1.
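One way such a worker option could be wired up is with a thread pool from the standard library. This is only a sketch under assumptions: fetch_archives and its num_workers parameter are illustrative names, and fetch stands for whatever function downloads and parses a single monthly archive URL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_archives(urls: Iterable[str],
                   fetch: Callable[[str], T],
                   num_workers: int = 1) -> List[T]:
    """Call fetch(url) for every URL using up to num_workers threads.

    Results come back in the same order as the input URLs.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order even when requests finish out of order.
        return list(pool.map(fetch, urls))
```

Threads suit this job because the work is dominated by waiting on HTTP responses. With num_workers=1 the archives are fetched one at a time, which matches the default described above.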

dimpoul

That looks much better @io_kloud, great job! I have implemented something very similar in Golang that gives scores between two players. I also use concurrent HTTP requests to mitigate the time issue. I notice that when the threads (workers) exceed 10 or so, chess.com starts giving non-200 responses to my requests. 5-8 workers should be fine and give the response in a reasonable time. Thanks!
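If non-200 responses still show up now and then, one common remedy is to retry with a growing pause between attempts. The sketch below is not taken from either app in this thread, and the attempt count and delays are arbitrary assumptions. The get argument is any callable that takes a URL and returns an object with a status_code attribute, such as a small wrapper around requests.get.

```python
import time
from typing import Any, Callable, Optional


def get_with_retry(get: Callable[[str], Any],
                   url: str,
                   max_attempts: int = 3,
                   base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[Any]:
    """Return the first response with status 200, or None if every attempt fails.

    The pause doubles after each failed attempt: base_delay, 2 * base_delay, ...
    """
    for attempt in range(max_attempts):
        response = get(url)
        if response.status_code == 200:
            return response
        if attempt < max_attempts - 1:
            # Back off before trying again to ease pressure on the server.
            sleep(base_delay * 2 ** attempt)
    return None
```

Keeping the number of workers small, as suggested above, is still the first thing to try; the retry only covers the occasional failure.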