Small snag - help appreciated - Chess Forums

Jun 3, 2022

#1

I'm using a routine written in Python to extract information (team names, game types, time control etc.) from (scraped) match challenges in a league I help with and it's been producing very useful results.

But I've just hit a small problem: the name of one team in a match that's being challenged appears as "ONE WORLD League PLAYERS&#039; LOUNGE" in the challenge, but my input file of teams has the plain English "ONE WORLD League PLAYERS' LOUNGE" (these are loaded into a dictionary). That gives me a KeyError every time I look the team name up.

This seems to be a simple case of one encoding not matching another, and for now I've cured the problem by replacing all "&#039;" substrings with "'", but is there a simpler way of doing it?

Some club names include quite exotic characters and I'd like to find a generic solution if possible.

acity609

Jun 3, 2022

#2

Are you using BS4 (Beautiful Soup 4)?

benslice

Jun 3, 2022

#3

In [1]: import html
In [2]: name = "ONE WORLD League PLAYERS&#039; LOUNGE"
In [3]: html.unescape(name)
Out[3]: "ONE WORLD League PLAYERS' LOUNGE"
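
To tie this back to the KeyError in the original question, here is a minimal sketch of decoding a scraped name before the dictionary lookup. The `teams` dictionary, its contents and the `lookup_team` helper are made up for illustration; only `html.unescape` comes from the standard library. It should also cope with named and hexadecimal references, which may help with the more exotic club names.

```python
import html

# Hypothetical team dictionary, keyed by the plain-text names from the input file.
teams = {
    "ONE WORLD League PLAYERS' LOUNGE": {"id": 1},
    "Caf\u00e9 \u00c9checs": {"id": 2},  # made-up club name with accented letters
}

def lookup_team(scraped_name, teams):
    """Decode HTML character references, then look the name up.

    Raises KeyError if the decoded name is still not in the dictionary.
    """
    clean_name = html.unescape(scraped_name).strip()
    return teams[clean_name]

# Decimal, hexadecimal and named references all decode to the same plain text.
first = lookup_team("ONE WORLD League PLAYERS&#039; LOUNGE", teams)
second = lookup_team("ONE WORLD League PLAYERS&#x27; LOUNGE", teams)
third = lookup_team("Caf&eacute; &Eacute;checs", teams)
```

Decoding once, at the point where the name is scraped, keeps the rest of the routine working with plain text only.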

stephen_33

Jun 3, 2022

#4

acity609 wrote:

Are you using BS4 (Beautiful Soup 4)?

No, I'm doing my own parsing of the HTML document, but I'll look into that when I have some spare time.
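
For hand-rolled parsing, the standard library's `html.parser` might be a middle ground worth trying first: with `convert_charrefs=True` (the default in Python 3.5 and later) it decodes character references in text before handing it to `handle_data`. The sketch below is only illustrative; the `TextCollector` class and the HTML fragment are invented, not taken from the real challenge pages.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects the text content of a fragment, ignoring the tags."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &#039; automatically.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# Hypothetical fragment standing in for part of a scraped challenge page.
fragment = '<a class="team">ONE WORLD League PLAYERS&#039; LOUNGE</a>'

parser = TextCollector()
parser.feed(fragment)
parser.close()  # flush any text still buffered by the parser

team_name = "".join(parser.chunks)
```

Here `team_name` comes out with a plain apostrophe, so it can be used as a dictionary key without a separate replace step.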

stephen_33

Jun 3, 2022

#5

benslice wrote:

In [1]: import html
In [2]: name = "ONE WORLD League PLAYERS&#039; LOUNGE"
In [3]: html.unescape(name)
Out[3]: "ONE WORLD League PLAYERS' LOUNGE"

Thanks, that's really helpful and solves my problem 😊

>>> html.unescape("ONE WORLD League PLAYERS&#039; LOUNGE")
"ONE WORLD League PLAYERS' LOUNGE"

Apart from modules such as time, Requests and json, I haven't needed to delve too deeply into the Python library, but it's probably time for me to do some background reading.