stats (and possibly profile) endpoint enhancement requests for more efficient use of api.chess.com

skelos

Hi,

One of the things I use the stats endpoint for is checking/finding members to invite.

This turns out to be really slow.

a) Start with the profile endpoint for last_online time, join date and (sometimes) location

b) Grab the stats

c) Get group memberships for the total number of clubs

d) Go to their game archives to get the total number of current daily games.

 

Now, note in (c) and (d) I'm only wanting a count, not the actual data.
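
Steps (a) to (d) can be sketched roughly as follows. This is only an illustration: `fetch` is a hypothetical helper that returns parsed JSON for a URL, and the URL paths and the field names `joined`, `clubs` and `games` are assumptions about the public API, not something confirmed in this thread.

```python
from typing import Any, Callable, Dict

# Assumed base path of the public player endpoints.
BASE = "https://api.chess.com/pub/player"


def invite_summary(username: str,
                   fetch: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Gather the data from steps (a) to (d) for one member.

    `fetch` is a hypothetical helper: URL in, parsed JSON out.
    """
    profile = fetch(f"{BASE}/{username}")        # (a) profile
    stats = fetch(f"{BASE}/{username}/stats")    # (b) stats
    clubs = fetch(f"{BASE}/{username}/clubs")    # (c) full list, only the count is wanted
    games = fetch(f"{BASE}/{username}/games")    # (d) full list, only the count is wanted
    return {
        "last_online": profile.get("last_online"),
        "joined": profile.get("joined"),          # assumed field name
        "location": profile.get("location"),      # sometimes absent
        "stats": stats,
        "club_count": len(clubs.get("clubs", [])),
        "daily_game_count": len(games.get("games", [])),
    }
```

Four requests per member, two of which exist only to call `len()` on the result; count fields in the stats or profile response would remove those two.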

 

I think the total number of daily games would fit quite well into the stats endpoint. There are W/L/D records; any reason not to add the number of in-progress games to each of chess_daily and chess960_daily?

 

The number of clubs someone is a member of might fit better into the profile than the stats endpoint, but if it were available in either I could at least halve the number of endpoints I need to use for this task.

 

Thoughts anyone?

 

I'm working with what I have and trying to cache more data to speed things up, but oh is it slow to collect some of that data to cache. I probably should have been more selective about what members I even looked up, but the single-threaded rate of queries per minute (it's better measured in queries per minute, not queries per second) is pretty damn slow.

If there is intentional throttling (other than the number-of-connections thing) would someone please check that a zero hasn't been dropped? If I'm only getting 10% of the speed I'm supposed to be permitted I'd not be at all surprised. It's that slow, and pretty much across all the endpoints.

I am using persistent connections (with https (SSL) an absolute must) and not using HTTP 2.0 (only 1.1), but compression of headers will not help; it's not a bandwidth problem, or at least not a bandwidth problem anywhere near me, and the website has some chunky pages and is pretty zippy.

 

Thanks for reading,

Giles

stephen_33

I don't think it's just you, Giles, because I'm beginning to notice a marked slowing down in the speed of downloading endpoint data as well. Although that may have more to do with what I'm trying to do at the moment.

I've been developing a script to check the qualifying criteria of the clubs & their nominated admins that take part in TMCL & Knockout Leagues. One thing in particular left me frustrated because we specify a minimum club size of 25 members in TMCL. You'd think that was simple to obtain because club size is clearly stated on the web page. But not in any of the club endpoints as far as I can see.

So the only solution I've found is to download the complete members' list of each club taking part. Picking one of those clubs at random - France-Deutschland Group - shows 2666 members, so you can imagine the time it takes to download these sizeable data-sets.

All because there's no 'club size' field in the club profile endpoint.
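
For illustration, the workaround amounts to something like the sketch below. It assumes the club members response is a JSON object whose values are lists of members (either plain usernames or objects with a `username` field); that shape is an assumption here, so the function just counts unique names across whatever lists it finds.

```python
from typing import Any, Dict

MIN_CLUB_SIZE = 25  # TMCL minimum mentioned above


def club_size(members_response: Dict[str, Any]) -> int:
    """Count unique usernames across every list in a members response."""
    seen = set()
    for group in members_response.values():
        if not isinstance(group, list):
            continue  # ignore any non-list fields
        for entry in group:
            # Assumed shapes: {"username": "..."} or a bare string.
            name = entry.get("username") if isinstance(entry, dict) else entry
            if isinstance(name, str) and name:
                seen.add(name.lower())
    return len(seen)


def meets_minimum(members_response: Dict[str, Any]) -> bool:
    """True when the club has at least MIN_CLUB_SIZE members."""
    return club_size(members_response) >= MIN_CLUB_SIZE
```

The whole list has to be downloaded and parsed just to produce one integer.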

skelos

I've not needed that, but I agree. In SQL if you've got access to the data you apply count() instead of asking for the data; with this RESTful (did I get the weird capitalisation right?) interface we don't get to ask for counts.

So ... some more count fields would help for both teams and members.

Speed-wise it's not bandwidth that's slowing me (as far as I can tell); it's simply latency in api.chess.com building the JSON to return. I did testing with and without a VPN, and also from a virtual server in the USA with excellent connectivity. As it was at a web hosting site, the site as a whole sends out much more data than it receives, so California to the USA East Coast should have been as good as it gets. There was no difference between it and from Australia using a VPN endpoint in another (non-USA) country.

 

But there are two things here:

1. Some counts (which should be easy and backward compatible) may reduce the number of endpoints accessed (my example) and data build/transfer (your example)

2. That the whole edifice is awfully slow.

 

#2 is not a showstopper, at all. Perhaps I should delete "awfully" but "slow" was my second or third thought after "Hey, this is great!" and it hasn't changed. Plenty of bugs fixed (thanks guys!) and neither I nor anyone else has been pushing performance, but I'm going to have to be as picky as I can to keep my script running times down.

I wanted to "catch up" some basic data for members whose names I've seen and then update it again if I see them again and the data I have is more than two weeks old ... I didn't expect my script to run for many days. As it's nearly done I don't want to stop now, although I expect I won't need a bunch of the data. But we're not talking volume here: having 1/4-1/2 of the data already and just now hitting 4GB (uncompressed) ... days? Ugh.
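
The "refresh if older than two weeks" rule is simple enough to sketch. The names below (`needs_refresh`, `fetched_at`) are hypothetical and the timestamps are assumed to be Unix seconds; this is an illustration of the idea, not the actual script.

```python
import time
from typing import Optional

TWO_WEEKS = 14 * 24 * 60 * 60  # seconds


def needs_refresh(fetched_at: Optional[float],
                  now: Optional[float] = None,
                  max_age: float = TWO_WEEKS) -> bool:
    """Decide whether cached data for a member should be downloaded again.

    `fetched_at` is when the cached copy was stored (Unix seconds),
    or None if the member has never been looked up.
    """
    if fetched_at is None:
        return True
    if now is None:
        now = time.time()
    return now - fetched_at > max_age
```

Only members seen again with stale data trigger a request, which is what should keep the daily increment small.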

I am hoping that once I get the base (and I suspect too large) amount of data downloaded, a daily increment will not be a big deal for either me or for api.chess.com, especially (for me) as I'm running this stuff on a little Raspberry Pi now (which isn't the bottleneck; it's barely ticking over) but is great as a low-power always-on machine. Highly recommended for anyone who has some Linux experience and needs a tiny server. Now, if it had USB3 ...

skelos

I might add that where I want a count but need to read data (how many clubs does @erik belong to?) I keep the list of clubs. That's fast to reduce to a number when needed; if perchance I want the actual clubs, well I've spent the time downloading it so while I have any data on that user I'll keep whatever I've downloaded.

stephen_33

That seems sensible & I try to organise my programs in such a way as to minimise endpoint requests.

The trouble with the criteria I'm checking is that the club "created" date is supplied by the club profile endpoint but number of members can only be derived from the club members endpoint.

For admins taking part, things are a bit tidier. I need date of joining site provided by the player endpoint, whether they're a current admin of the club they're representing (we still find that nominated admins don't always belong to the clubs they're supposed to represent), whether they're members of the League club itself (TMCL or Knockout) & finally I check their "last_online" time (we still get admins who haven't been online for weeks). Once I've downloaded the set of league club members, all of that is provided by the single player endpoint.

But one thing's certain - it's so very useful (indispensable?) now, if the API facility didn't exist on this site we'd have to invent it!

skelos
stephen_33 wrote:

...

But one thing's certain - it's so very useful (indispensable?) now, if the API facility didn't exist on this site we'd have to invent it! 

Yes!

Enhancement requests and complaints about speed ... I'm a fan, really I am. So many useful little scripts to answer questions. "Should we enter this rating limited competition?" Out comes a quick check of what members we have in what rating bands. So what if it takes a couple of minutes instead of a few seconds to run? (Answer in that case was "yes": we'd need a 25% signup rate which for that club is fine.)
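
A rating-band check of that kind might look like the sketch below. It assumes the members' ratings have already been pulled out of their stats into a plain list of integers; the band width of 200 is an arbitrary choice for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def rating_bands(ratings: Iterable[int], width: int = 200) -> Dict[str, int]:
    """Count how many ratings fall into each band of `width` points."""
    # Map each rating to the lower bound of its band, e.g. 1199 -> 1000.
    counts = Counter((r // width) * width for r in ratings)
    return {f"{lo}-{lo + width - 1}": counts[lo] for lo in sorted(counts)}


def required_signup_rate(players_needed: int, eligible: int) -> float:
    """Fraction of eligible members who would have to sign up."""
    return players_needed / eligible
```

For example, `rating_bands([950, 1010, 1199, 1200, 1450])` gives `{"800-999": 1, "1000-1199": 2, "1200-1399": 1, "1400-1599": 1}`.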

 

The old, old story in user support is the developer saying to a support person: "My code's fine.  You never report any bugs in my code."

Support dude: "We don't get questions on your code. That means nobody's using your code."