Python

stephen_33

I managed to obtain some useful data on time-outs before the start of round 4 in TMCL. Not as polished as I'd like but I can clean up & prettify the HTML later.

I've spent most of the past week, on & off, working through your version & reading up on the Python docs website. Heavy going at times for an amateur! When I first looked, I didn't understand much of your coding at all & I've noticed before that people who code for a living have a very different style & methods to those of us who just 'tinker' about.

In the end I found it easier to start from scratch & code in my own way & the style with which I'm comfortable. It then turned out to be much easier than I expected.

The one part of your code I might have found most useful is the actual 'requests' block but I couldn't get it to work on my version. I spent a few hours trying to download the Requests module, getting nowhere. I remember reading somewhere that the download was suitable for Python 3.4 to 3.6, which might be the problem because mine is 3.3.

In the end I used a section of code I found on Stack Overflow but what troubles me is that I don't fully understand what it's doing, beyond requesting/fetching the data from the server. It's this:-

.......

import json
from urllib.request import urlopen

.......

with urlopen(url) as response:
    for line in response:
        line = line.decode('utf-8') # Decoding the binary data to text.

 

url is of the form: 'https://api.chess.com/pub/match/' + match_id

Then I use the json parser to convert the data held in 'line' to a form that can be stored directly in a python dictionary & that part I understand. What puzzles me is what the variable 'response' is doing. It looks like a string or list but when I try len(response), all I get is an error message.
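The two steps described here (fetch the data, then parse the JSON) might be put together along these lines. This is only a sketch: the function name fetch_match, the opener parameter and the sample data are made up for illustration, and it swaps the line-by-line loop for a single read() of the whole body, which gives the same result when the JSON arrives on one line.

```python
import io
import json
from urllib.request import urlopen

BASE_URL = 'https://api.chess.com/pub/match/'  # URL form quoted above


def fetch_match(match_id, opener=urlopen):
    """Fetch one match and return it as a Python dictionary.

    `opener` is a hypothetical hook: it defaults to urlopen, but any
    callable returning a file-like object can stand in for testing.
    """
    url = BASE_URL + str(match_id)
    with opener(url) as response:
        # read() returns the whole body as bytes, however many lines it has.
        text = response.read().decode('utf-8')
    return json.loads(text)


# Offline demonstration with a fake response instead of a real request.
def fake_opener(url):
    return io.BytesIO(b'{"name": "Example match", "boards": 3}')


match = fetch_match('12345', opener=fake_opener)
print(match['name'])
```

Calling fetch_match('12345') without the opener argument would make a real request; the match ID here is invented.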

Any ideas?

What I did find most useful from your script was the try/except technique for capturing errors although I haven't used it in my own script, yet anyway. Up till now exceptions haven't been a problem for me because everything I've attempted has been very defined & strictly within my own device, so not much scope for things to go wrong, beyond my own silly coding mistakes of course.

skelos

Not being a Python guru I scratched my head then searched the web.

The tricky part I think is that "response" is a generator, which you can iterate over as your code does. But for a given generator the length might not be known a priori, so to implement len() might mean iterating over the generator and storing each result in memory in case you then do want to iterate over it. That could consume a lot of memory in some circumstances.

For a modest sized HTTP GET response it wouldn't, and there are hints here and there about libraries/modules that can do such counting for you, but bare-bones if you have a generator, you don't get len() as you would for say, an array because it might be too expensive to implement.
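A minimal sketch of that behaviour, using a made-up generator function (numbered_lines is hypothetical, not taken from either script). Strictly speaking, the object urlopen returns is an http.client.HTTPResponse, a file-like object rather than a true generator, but it is iterated in the same way and likewise has no len().

```python
def numbered_lines(count):
    """Hypothetical generator: yields one line of text at a time."""
    for n in range(count):
        yield 'line %d' % n


gen = numbered_lines(3)

try:
    len(gen)
    has_len = True
except TypeError:
    # Generators do not define __len__, so len() raises TypeError.
    has_len = False

# Materialising the generator into a list stores every item in memory,
# after which len() works.
lines = list(gen)
line_count = len(lines)

# The generator is now exhausted: iterating again yields nothing.
leftover = list(gen)
print(has_len, line_count, leftover)
```

The last step shows why len() is not simply built in: counting the items uses them up, so they have to be stored somewhere if they are wanted afterwards.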

This page on the Python wiki looks to cover the idea:

https://wiki.python.org/moin/Generators

 

That's my best understanding currently; errors and omissions excepted.

skelos

Re the Requests module:

http://docs.python-requests.org/en/master/

https://github.com/requests/requests

...

Requests officially supports Python 2.6–2.7 & 3.4–3.6, and runs great on PyPy

...

 

Unfortunately, if you have 3.3 and can't move easily, the Requests module is not for you. Worse, I chose it not quite at random: after reading about two or three HTTP client libraries, it seemed well liked, so I tried it, it worked (and is marginally nicer than the perl modules I use for HTTP and JSON), and I used it.

 

I've reviewed my code, and I'm pretty sure it ought to handle the exception it raises itself when a request fails, but it doesn't. What to do if a request fails isn't obvious: print an error message and exit, I guess, which might be neater than just letting the exception go unhandled. I'd already given you the code by then though, so I left it!

Otherwise the code is as idiomatic as I could make it. The headers=headers parameter to request() is perhaps not very nice, although it does seem to be the way things are done. The left hand side is a named parameter I think and the right hand side is the global dict I set at the top of the file.

It's global for no especially good reason. One rationalisation would be that it is more prominent there, so someone taking the code as a basis for something else might notice it and edit it to let chess.com know who they are; but I probably copied it over from my perl scripts, where I set a few things globally by habit when they are truly global to a script.

If you have questions post or PM me ... the regular expression compilation and use looks decidedly strange to me, but that's Python regular expressions for you. (Of course the compilation should only be done once outside the loop if we're going to be picky, for an unmeasurable performance improvement.)

What's obvious or not too confusing to someone who's done a lot of programming (and even more reading of code) in a bunch of languages (I couldn't give you an exact count) can, I understand, be perplexing to someone who, as you say, "tinkers".

I have always admired Larry Wall, the original author of perl, for insisting that knowing a subset of that language is fine. I'm paraphrasing, but he has said that if the code works and is done before your boss fires you, it's good.

Perl (perhaps less so now) was once the duct tape of the web, and all sorts of people who were not full time (or even officially part time) programmers used it to do "useful stuff" and still get their regular work done.

The feedback about your post with timeouts was very positive; job well done I say!

skelos

P.S. Your comment about exceptions is interesting. I don't like them; indeed they're my least favourite feature of Python!

stephen_33
skelos wrote:

Not being a Python guru I scratched my head then searched the web.

The tricky part I think is that "response" is a generator, which you can iterate over as your code does. But for a given generator the length might not be known a priori, so to implement len() might mean iterating over the generator and storing each result in memory in case you then do want to iterate over it. That could consume a lot of memory in some circumstances.

.......

I think you're probably right & in Python that's called an iterable - is that a common term?

There're said to be a few iterables in Python although the only one I'm familiar with is range() but having tried len(range(4)) on the command line, it correctly returns the value 4! No error message for that. Response seems to be an iterable of a different kind.

What still puzzles me is the line...

for line in response:

That's exactly as you write the code to iterate over either a string or list, which is why I expected to be able to test its length. I need to do more reading on that & other aspects of Python & I'll take a look at the source you quote.

skelos

Generators are iterables but not all iterables are generators.
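One way to see the distinction is to ask Python directly. The following is a sketch only; the generator function squares is made up for the purpose, and the three checks are just one reasonable way to probe each object.

```python
import types
from collections.abc import Iterable


def squares(limit):
    """Hypothetical generator function, used only for illustration."""
    for n in range(limit):
        yield n * n


candidates = {
    'list': [1, 2, 3],
    'str': 'abc',
    'range': range(4),
    'generator': squares(4),
}

report = {}
for name, obj in candidates.items():
    report[name] = (
        isinstance(obj, Iterable),             # can it be used in a for loop?
        isinstance(obj, types.GeneratorType),  # is it specifically a generator?
        hasattr(obj, '__len__'),               # does len() work on it?
    )

for name in sorted(report):
    print(name, report[name])
```

All four can be looped over, but only the generator lacks a length, which matches len(range(4)) returning 4 while len() on a generator fails.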

"Iterable" is somewhat standard: "How do I iterate over a list in (C, C++, Python, Perl)?" is a sensible question for each language. Whether the answer will include the term "iterable" is a bit trickier. Also one I'm not properly qualified to answer.

stephen_33

My Python version is 3.3 so I'm in a bit of a quandary over the Requests module. From what I've read about it already it looks a useful module but I'll have to upgrade to a more modern version of Python to incorporate it. I think that involves uninstalling my current version & downloading the latest, something I wouldn't relish doing.

Since I've found a method that works on my version, I may just soldier on for now, although I'll be a lot happier when I understand how it works & what it's doing.

As for requests-exception-handling, like you I've had no problems so far & the whole thing seems very robust. Of course if that changes, I'll need to consider incorporating it into my script as well.

"The headers=headers parameter to request() is perhaps not very nice, although it does seem to be the way things are done" - I wondered about that as well but isn't the value 'header' being passed as an argument? At least that's the way I've seen it described. For example:-

def my_function(name):
    print(name)

then I call it with ..

my_function("Stephen")

the literal (or variable) sent is the argument but the variable 'name' in the function itself, in which it's stored, is the parameter. So 'headers=headers' puzzles me too because as an argument, I'd have thought the simpler 'headers' would have served just as well because it's a global variable.

How important is it to identify who you are when making api requests, because I don't see any way I can with the method I'm now using - am I likely to start receiving terse messages from members of staff?

"If you have questions post or PM me ... the regular expression compilation and use looks decidedly strange to me, but that's Python regular expressions for you. (Of course the compilation should only be done once outside the loop if we're going to be picky, for an unmeasurable performance improvement" - I quite like the idea of exchanging ideas & information here because it allows other users to 'look in' & comment if they wish.

But I'm not clear what you mean by the above (in bold)? Could you expand on what you mean there please - which loop do you have in mind because my routines usually involve several nested loops. And I generally dump all output to a text file because I'm usually creating content that needs to be HTML compatible, which is done as my script is executing.

I like the sound of Larry Wall - my kind of coder! My first concern is to produce a given result & I probably don't follow best practice much of the time. Because there're different ways of achieving a given result, I also haven't delved that deeply into the Python documentation.

That's probably why quite a few parts of your script were double-Dutch to me. For example the write to the sys.stderr file(?) was completely new to me & I have no idea what that's doing.

The assert keyword was also unfamiliar & I'm not entirely clear what that does despite spending some time researching it. Keywords try/except were new to me as well & do look useful for handling exceptions - I will be using them from now on. I still have fond memories of the old BASIC 'on error gosub' error-handling command from 30 years ago.

But I'm pleased as well by most of the feedback in the TMCL time-outs topic & I'll definitely maintain that through future rounds & seasons. Hope I didn't sound tetchy there but I must have spent a good 20/30 hours of concentrated reading & coding on this project only for some members to (seemingly) disregard my effort altogether & that left me feeling a little peeved.

stephen_33
skelos wrote:

Generators are iterables but not all iterables are generators.

"Iterable" is somewhat standard: "How do I iterate over a list in (C, C++, Python, Perl)?" is a sensible question for each language. Whether the answer will include the term "iterable" is a bit trickier. Also one I'm not properly qualified to answer.

That makes sense - thanks. Do you know if we have any Python experts in the group, that I might message about this?

skelos

Stephen wrote:

"The headers=headers parameter to request() is perhaps not very nice, although it does seem to be the way things are done" - I wondered about that as well but isn't the value 'header' being passed as an argument

Parameter and argument are used interchangeably in my experience. If there is some formal, subtle difference I don't know it!

The trick with headers=headers is that Python supports named parameters (or arguments). The code is saying, the parameter "headers" should be the global dict named "headers". (Or possibly some sort of reference to it; doesn't matter here and off the top of my head how Python implements argument passing I don't know.)

For named parameters, here's one reference (yay, more reading!):

https://www.pythoncentral.io/fun-with-python-function-parameters/
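A small sketch of the same idea with a made-up function (describe_request and the header value are hypothetical, not part of Requests or of the original script):

```python
# Stand-in for the global dict; the value here is made up.
headers = {'User-Agent': 'example-script (contact: someone@example.com)'}


def describe_request(url, headers=None, timeout=10):
    """Hypothetical function: 'url', 'headers' and 'timeout' are its parameters."""
    return (url, headers, timeout)


# Positional: arguments are matched to parameters by position.
by_position = describe_request('https://example.com', headers)

# Keyword: the name on the left is the parameter, the value on the right
# is the argument (here, the global variable that happens to share the name).
by_keyword = describe_request('https://example.com', headers=headers)

# Keywords let a call skip over or reorder optional parameters.
reordered = describe_request(timeout=5, url='https://example.com')

print(by_position == by_keyword)
print(reordered)
```

In this toy function either call style works. In Requests, as far as I can tell, headers is not the second positional parameter, so a bare 'headers' would be taken for something else and the keyword form is needed.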

 

Stephen wrote:

"Of course the compilation should only be done once outside the loop if we're going to be picky, for an unmeasurable performance improvement"

This:

    for match_id in args:
        p = re.compile("^\d+$")
        m = p.match(match_id)

would be slightly better as:

    p = re.compile("^\d+$")
    for match_id in args:
        m = p.match(match_id)

since p (the compiled regular expression) is the same for all arguments and doesn't need to be recomputed. Savings in CPU cycles: negligible. In more complex situations it might let a human reader see that the regular expression is the same for the whole loop.
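As a complete, runnable sketch of the second form (the helper split_match_ids and the sample arguments are invented for illustration; the pattern is written as a raw string so that \d reaches the regular expression engine untouched):

```python
import re

# Compiled once, before the loop; a raw string keeps \d intact.
DIGITS_ONLY = re.compile(r"^\d+$")


def split_match_ids(args):
    """Hypothetical helper: separate numeric match IDs from anything else."""
    valid, rejected = [], []
    for match_id in args:
        if DIGITS_ONLY.match(match_id):
            valid.append(match_id)
        else:
            rejected.append(match_id)
    return valid, rejected


valid, rejected = split_match_ids(['12345', '98x76', '', '42'])
print(valid, rejected)
```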

 

Regarding your HTTPS library, see if there is a way to set the User-Agent string, but no, I don't think staff are going to come chasing you. The advantage of identifying yourself is that if there is a problem, then as well as blocking the script they know who to contact to explain the problem at their end.

Amusingly, when I used perl's default it was blocked (it isn't anymore for api.chess.com) because chess.com use a denial of service protection company and apparently many 'bots are written in perl.
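With urllib from the standard library, one way to do this is to wrap the URL in a Request object carrying the header. A sketch only: the User-Agent text and the match ID below are made up, and build_request is a hypothetical helper name.

```python
from urllib.request import Request

# Made-up identification string; substitute your own details.
USER_AGENT = 'tmcl-timeouts-script (chess.com username: your_name_here)'


def build_request(url):
    """Wrap a URL in a Request object that carries a User-Agent header."""
    return Request(url, headers={'User-Agent': USER_AGENT})


request = build_request('https://api.chess.com/pub/match/12345')
print(request.get_header('User-agent'))  # Request stores the key as 'User-agent'

# urlopen accepts a Request wherever it accepts a URL string, e.g.:
#     with urlopen(request) as response:
#         ...
```

Nothing is sent over the network until the Request is passed to urlopen, so the header can be checked beforehand as shown.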

 

sys.stderr is a Unix/Linux/OS X/MacOS thing: standard input, standard output, and standard error. sys.stdout is more-or-less print() but on Unix I can redirect standard output to one place and standard error to another, which can help pick out errors. Or consider it culture and simply "the way things are done" in my world. (My Windows programming was never console scripts or programs, and I did as little Windows programming as I could. Windows 2.x was not a thing of beauty.)
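A short sketch of the two streams in use. The helper name report and its parameters are hypothetical; the only real pieces are print(..., file=...), sys.stdout and sys.stderr.

```python
import io
import sys


def report(message, error=False, out=sys.stdout, err=sys.stderr):
    """Hypothetical helper: normal output to one stream, errors to another."""
    stream = err if error else out
    print(message, file=stream)


# Normal use: both end up on the console unless redirected by the shell.
report('fetched match 12345')
report('could not fetch match 99999', error=True)

# Capturing the two streams separately shows they really are distinct.
captured_out, captured_err = io.StringIO(), io.StringIO()
report('result line', out=captured_out, err=captured_err)
report('problem line', error=True, out=captured_out, err=captured_err)
```

Run from a shell as something like "python script.py > results.html 2> errors.txt" (file names invented), the HTML would go to one file and the error messages to the other. As far as I know the same redirection also works in the Windows command prompt.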

 

assert will conditionally throw an exception, and with the argument False always will if that line of code is executed. I really needed another try ... except block around all that code to deal with an incorrect option. Or perhaps just print a usage message and exit rather than throw an exception. I use code fragments from stackoverflow too. (Or the Python documentation; could have been either.)
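A minimal sketch of that pattern, with the missing try ... except added. The function check_option and the option names are made up; this is not the original script's option handling.

```python
def check_option(option):
    """Hypothetical option handler in the style described above."""
    if option in ('-h', '--help'):
        return 'help'
    if option in ('-v', '--verbose'):
        return 'verbose'
    # Reaching this line means the option was not recognised.
    assert False, 'unhandled option: ' + option


try:
    check_option('--bogus')
    outcome = 'no error'
except AssertionError as exc:
    # The message given after the comma is available on the exception.
    outcome = str(exc)

print(outcome)
```

One caveat: assert statements are skipped entirely when Python is run with the -O switch, so for checking user input an explicit error message and exit is the safer choice.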

 

To master Python you will need to learn about exceptions. It's obviously not too important when you're the only user: if something goes wrong, an exception telling you on what line is good information to have.

My ISP is not as good as I would prefer, so I see network outages usually a couple of times a day. (And a lot more before I replaced my ADSL modem, so maybe the secondhand replacement isn't so good, but ADSL is about to go away "any day now" in favour of something newer and less stable. Not that I'm a pessimist!)
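For a script using urlopen, handling that kind of outage might look roughly like the sketch below. The names fetch_text, opener and broken_opener are hypothetical, the match ID is made up, and the outage is simulated rather than real.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_text(url, opener=urlopen):
    """Return the response body as text, or None if the request failed.

    `opener` is a hypothetical hook so the failure path can be
    demonstrated without a real network outage.
    """
    try:
        with opener(url) as response:
            return response.read().decode('utf-8')
    except URLError as exc:
        # Covers failed connections; HTTPError (e.g. 404) is a subclass.
        print('request failed:', exc.reason)
        return None


def broken_opener(url):
    """Simulates an outage by raising the error urlopen would raise."""
    raise URLError('network unreachable')


result = fetch_text('https://api.chess.com/pub/match/12345', opener=broken_opener)
print(result)
```

Returning None lets the calling code decide whether to skip that match, retry later or stop; which of those is right depends on the script.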

 

I thought your posts were well received, too. Wait two more rounds and then be three days late with the numbers and listen to the complaints then! That will be flattery, and peeving again.