Connection caching: A Good Thing

skelos

Being aware that SSL negotiation can take some time, I dug around and found out how to request connection caching when using Perl and its LWP module.

For Perl, it turns out the trick is to add:

use LWP::ConnCache;

and once you have your user agent:

$ua->conn_cache(LWP::ConnCache->new());

A reference is:

http://search.cpan.org/dist/libwww-perl/lwptut.pod
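Putting the pieces together, a minimal sketch (the archives URL follows chess.com's published API pattern; the username and agent string are placeholders):

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;
use LWP::ConnCache;

my $ua = LWP::UserAgent->new;
$ua->agent('your-chesscom-username');    # identify yourself to the server
$ua->conn_cache(LWP::ConnCache->new);    # reuse connections across requests

# Placeholder player name; the saving grows with one request per monthly archive.
my $resp = $ua->get('https://api.chess.com/pub/player/someplayer/games/archives');
print $resp->is_success ? $resp->decoded_content : $resp->status_line, "\n";
```

With the cache in place, subsequent requests to the same host skip the TCP and SSL handshakes, which is where the wall-clock saving comes from.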


In very casual testing, those two lines cut my download of a member's game archives from 59s to 24s (each figure the average of three runs), for a user with twelve months of archives, with gaps: ten previous months, the current month, plus the current (unfinished) daily games.

The CPU time at my end doesn't matter; I don't know whether it matters to chess.com at their end. Quite possibly not.

I'm pleased with the reduction in wall-clock time, which will only increase if I'm digging through teams with ~1000 players and then need to look up the player profiles, or working through a particular nationality and need some statistics for each player.

bcurtis

Take a look at HTTP/2, which should allow you to send multiple requests over the same pipeline, even in parallel. The new rate limiting code on our side should permit more of these types of parallel requests — just be on the lookout for a code 429 response, then wait a moment and try that URL again.
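A sketch of the wait-and-retry loop described above (the retry count and back-off delays are arbitrary choices for illustration, not chess.com guidance):

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Retry a URL when the server answers 429 (Too Many Requests).
sub get_with_retry {
    my ($ua, $url, $tries) = @_;
    $tries //= 3;
    my $resp;
    for my $attempt (1 .. $tries) {
        $resp = $ua->get($url);
        last if $resp->code != 429;   # success, or an error retrying won't fix
        sleep 2 * $attempt;           # back off a little longer each time
    }
    return $resp;
}
```

The caller still checks `$resp->is_success`, since the last attempt may also have failed.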

On our end, we put a lot of effort into making this a lightweight process to run. We're serving over 20 million requests per day (much of this is our internal use), and the servers hardly notice. So don't worry about that! If you figure out a way to make our servers cry for help, you'll get a bunch of 429s and we know where to find you to figure out what happened.

skelos

Thanks, @bcurtis. I do make sure to set my user agent to something sensible (my chess.com username seems sensible) so that if something goes wrong I'll be both easy to block and easy to contact.
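For anyone following along, setting the agent string is one line once the user agent exists (the string here is only an example):

```perl
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
# An identifiable agent string: easy to block, easy to contact.
$ua->agent('skelos (chess.com user)');
```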

As yet I've nothing I care to run parallel requests for, but I appreciate having the option!

skelos

Parallel requests mean a little more application-side complexity. For something interactive, I can see it being very helpful. For basically batch operations, simpler is better until it's too slow. IMHO, at least. :)

(My opinion being heavily influenced by the times programming hasn't been my main job, but something done "on the side". Naturally, critical bug reports or changes in the environment such as a data source format "update" would occur when I was busy with my primary work. "As simple as possible, but no simpler" were words to live by.)