Strange Match Endpoint Result - Help Appreciated

skelos

Andrea and @bcurtis: No particular criticism implied. Software always has bugs, and new software more bugs. Nothing I hit yesterday stopped me completing what I wanted to do, for example, so api.chess.com is definitely functional and has been available very reliably.

Every different user shakes something out.

stephen_33

I agree, particularly about the functionality of the c.c API - I've found it very robust & not had a single failure in any request I've made (other than for reasons of sloppy coding on my part!).

But I'm starting to form an idea for improving the way club/team names are obtained, which would solve two problems I have: the obvious one that's cropped up here, the mismatch between names on the web page & those in the endpoint data, but also the problem of clubs changing their names during a tournament without informing us.

One of our teams in TMCL made a minor change in the spelling of their name - This is Norway changed to This Is Norway - spot the difference? It gave me a headache for a while.

So I've been wondering what can I use as a proxy for a club name that reliably identifies that club & its content such as team matches? The only thing I've come up with is to use one of the club's matches because that solves the problem of World's Best Chess Players & also name changes because the endpoint reflects the new name.

It's a pity though that there isn't a unique club reference/club_id in the way the site has unique player IDs. For what I'm doing I need something that's immutable.

So I might stop using team names in my input file & switch to any suitable match id instead, in the absence of anything better. Then I make a request for that match endpoint & derive the current club name from that. Do you think that would work o/k?
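A minimal sketch of that idea, assuming the match endpoint's JSON carries a "teams" object whose entries each have a "name" and an "@id" field (field names assumed from the published API description; check them against a real response). The function name and the sample data are hypothetical, and fetching the JSON is left out.

```python
def current_team_names(match: dict) -> dict:
    """Map each team's API URL ("@id") to the name currently shown in the match data.

    `match` is the already-parsed JSON of one match endpoint response.
    The "teams", "@id" and "name" keys are assumptions about that payload.
    """
    teams = match.get("teams", {})
    return {
        # Fall back to the dictionary key (e.g. "team1") if "@id" is absent.
        team.get("@id", key): team.get("name", "")
        for key, team in teams.items()
    }


# Made-up payload, shaped like the assumed match response.
sample_match = {
    "teams": {
        "team1": {"@id": "https://api.chess.com/pub/club/example-club-a",
                  "name": "This Is Norway"},
        "team2": {"@id": "https://api.chess.com/pub/club/example-club-b",
                  "name": "France-Deutschland Group"},
    }
}
names = current_team_names(sample_match)
```

Because a match has two teams, the input file would still need to say which side is the club of interest.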

skelos

+1 on wanting immutable club IDs.

I am also sad but optimistic about the caution in the documentation that player_id may not always be provided. As names may be (and have been) reused we really do need player_id.

stephen_33

I wonder if there's an issue of providing site member data to those of us with API access but not to those without; that's to say members who only view web content? Although anyone can view another member's player endpoint information in an ordinary browser, if they know what they're looking for.

But while a member may change their username only once, admins can change their club's name as often as it pleases them & that can become a headache with certain clubs.

skelos

I suspect the "only one change" is policy and not technical. Staff can and have changed names of accounts.

I noticed the club name change when France/Deutschland Group became France-Deutschland Group. I am not sure what technical issue led to the change, but @DJ_Haubi's message if I recall correctly suggested it was to avoid some problem.

In that case, the name changed but the web (and @id) URLs didn't.

stephen_33

The set of rules that the site uses to derive club URLs is simple & predictable but also slightly bizarre. I looked into this when I was writing my script for the Knockout tournaments & wanted to derive my own URLs in order to provide links to all clubs in the draws I post.

For about 95% of all clubs it's simplicity itself - convert all letters to lower case & replace spaces with hyphens. Then append that string to the URL stem of 'https://www.chess.com/clubs/forum/view/' & you're there.

Then you have the less common characters such as '!', "'" & "." which are omitted from the URL. After that '&' which is replaced by 'amp' & a few other rarer characters that are substituted.

But the really intriguing cases I found are those where it seems to be impossible to predict the outcome, by any set of rules you apply. For example:-

Захід is rendered as 'content102'
Шах Србија is 'content209'

I assume those have to be set with manual intervention perhaps or if the site algorithm can't convert the name into a URL format within the rules it allocates the next 'content' reference?

But after those come even stranger cases & you might recognise this one:-

Asger's Great Viking Warriors becomes 'asger-s-great-viking-warriors'

Normally the single quote ("'") is omitted, but in a very few cases such as your group it's replaced by a hyphen instead. I haven't a clue why. It's as if the algorithm has changed at some point.

It's to cater for those exceptions that I have a separate input file containing club names & their URL equivalents. If my script can't locate the club name in that list then it defaults to deriving the URL by the rules I've described.
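The approach described above can be sketched roughly as follows. This is only an illustration of the rules as stated in this post (lower case, spaces to hyphens, '!', "'" & '.' dropped, '&' to 'amp'), not the site's actual algorithm; the function names and the in-memory overrides dictionary (standing in for the separate input file) are hypothetical. It returns only the last part of the URL, to be appended to whichever stem applies.

```python
OMITTED = "!'."                 # characters dropped from the URL
SUBSTITUTIONS = {"&": "amp"}    # characters replaced by a word


def derive_slug(name: str) -> str:
    """Apply the simple rules: lower case, spaces to hyphens, drop or substitute."""
    out = []
    for ch in name.lower():
        if ch == " ":
            out.append("-")
        elif ch in OMITTED:
            continue
        else:
            out.append(SUBSTITUTIONS.get(ch, ch))
    return "".join(out)


def club_slug(name: str, overrides: dict) -> str:
    """Use the hand-maintained exception list first, then fall back to the rules."""
    if name in overrides:
        return overrides[name]
    return derive_slug(name)


# Exceptions quoted in this thread, keyed by club name.
overrides = {
    "Захід": "content102",
    "Шах Србија": "content209",
    "Asger's Great Viking Warriors": "asger-s-great-viking-warriors",
}
```

Keeping the exceptions in data rather than in code means a newly discovered oddity only needs a new line in the input file.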

skelos

Thanks for the information. I've not investigated as deeply as you have, but offer two examples that are also "interesting" (for programmer, corner-case-focussed values of interesting):

  1. NAMASTÉ is https://www.chess.com/club/namasteacute and a search for clubs won't find them if the final É is included. That's a website bug.
  2. ΙRANIAN (first character is Iota, Unicode U+0399 I believe) is at https://www.chess.com/club/iotaranian, and good luck finding it via a club search too.
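Both examples look as though the non-ASCII character was replaced by its HTML entity name ('eacute', 'iota'), much as '&' becoming 'amp' does. That is only a guess, not something the site has confirmed. The sketch below tests the guess using Python's standard `html.entities.codepoint2name` table; the function name is hypothetical.

```python
from html.entities import codepoint2name


def entity_style_slug(name: str) -> str:
    """Guess at a URL part: ASCII letters/digits lowered, spaces to hyphens,
    other characters replaced by their HTML 4 entity name when one exists."""
    parts = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            parts.append(ch.lower())
        elif ch.isspace():
            parts.append("-")
        elif ord(ch) in codepoint2name:
            # e.g. U+00C9 maps to "Eacute", U+0399 maps to "Iota"
            parts.append(codepoint2name[ord(ch)].lower())
        # Characters with no entity name are dropped in this sketch.
    return "".join(parts)
```

Under this guess a purely Cyrillic name such as Захід produces an empty string, which would at least be consistent with the site falling back to a 'content' reference for such clubs.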

Then one day one of us can figure out what @MGleason needs to change in PGN Spy to have it be able to display Unicode in match names and read (and write) files with Unicode characters. I believe MS Windows can do all that, at least mostly, but it's obviously not simply "out of the box" functionality.

Then let's not get into the PGN specification needing updating to specify UTF-8 rather than whatever Latin1 alphabet it does specify, which chess.com gleefully ignore (and I think rightly so) by putting UTF-8 data into the headers.

bcurtis

The website displays the same data that the API returns. If you look at the HTML source of the web page in question, every instance of the club name has the double spaces before and after the hyphen, just like the API response. The problem you are running into is that your browser changes the data to display it. Some browsers may convert the text to Braille, or automatically translate it — we simply cannot promise that the name as displayed by your browser is what we send in the API response. We can only promise that the data we send will be the same in these two cases, and they are.

As a side note, we tried to be consistent with the naming of the data elements, and "names" are user-entered data and unreliable as identifiers (including username!). No two clubs may have the same name at the same time, but the names can be changed and if "My Best Club" gets renamed to "The Best Club" then a new club can come along and be named "My Best Club." The reliable identifier you want to use is the club ID, given in the profile endpoint. You should only use "names" for display purposes.
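A minimal sketch of keying on the ID rather than the name, assuming each club profile response has been parsed into a dictionary with "club_id" and "name" fields. The function name and the in-memory registry are hypothetical; the club names are the ones used in the example above.

```python
from typing import Optional


def record_club(registry: dict, profile: dict) -> Optional[str]:
    """Store the club's current name under its club_id.

    Returns the previous name if the club was already known under a
    different name (i.e. it has been renamed), otherwise None.
    """
    club_id = profile["club_id"]
    current_name = profile["name"]
    previous_name = registry.get(club_id)
    registry[club_id] = current_name   # the name is kept for display only
    if previous_name is not None and previous_name != current_name:
        return previous_name
    return None


registry = {}
first = record_club(registry, {"club_id": 57796, "name": "My Best Club"})
renamed = record_club(registry, {"club_id": 57796, "name": "The Best Club"})
```

Here `first` is None and `renamed` is "My Best Club", so a script can report the rename while still treating both responses as the same club.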

skelos

Thanks @bcurtis. I'd looked, but either I didn't look well enough or you sneaked it in:

https://www.chess.com/news/view/published-data-api#pubapi-endpoint-club-profile

  "club_id": 57796, // the non-changing Chess.com ID of this club

bcurtis

I made that post before seeing that you continued the conversation on a second page. It looks like you saw the problem in your approach that I was pointing at.

> player_id may not always be provided.

The player_id will always be provided when we have it; however, we are in a state of changing our ID systems. An ID you gather today should always be available; later in the new system, a player will have a new ID. All players will have a new-format ID, but only players who started their account in the old format will have the old (current) format of ID. In that case, new players will be missing the old-format ID.

When IDs change, there will be announcements here and a deprecation process that should be straightforward to follow.
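For client code, that suggests treating player_id as optional. A small sketch, assuming a parsed player record with a "username" field and a possibly absent "player_id" field; the function name and the tagged-tuple convention are hypothetical.

```python
def player_key(player: dict) -> tuple:
    """Return a lookup key that prefers the stable ID over the username.

    The first element says which kind of key it is, so ID-based and
    username-based keys can never be confused with each other.
    """
    player_id = player.get("player_id")
    if player_id is not None:
        return ("id", player_id)
    # Fallback only: usernames can change and be reused, so this key is
    # not a reliable long-term identifier.
    return ("username", player["username"].lower())
```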

stephen_33

bcurtis, I understand that, but strictly speaking the removal of 'redundant' spaces (i.e., more than one together) isn't a feature of my particular browser as such; it's part of the general HTML specification. At least that used to be true & I don't think it's changed in HTML5. All browsers remove unnecessary spaces.
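If a script has to compare a name copied from a rendered page with the name in the endpoint data, one workaround is to collapse whitespace on both sides first. This is a sketch that only approximates what a browser does when rendering; the function names and the example club name are hypothetical.

```python
def collapse_whitespace(text: str) -> str:
    """Trim the ends and reduce every run of whitespace to a single space."""
    return " ".join(text.split())


def same_display_name(api_name: str, page_name: str) -> bool:
    """Compare two names while ignoring differences in spacing only."""
    return collapse_whitespace(api_name) == collapse_whitespace(page_name)


# Endpoint data with double spaces around the hyphen vs. the rendered text.
matches = same_display_name("Team A  -  Team B", "Team A - Team B")
```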

But what do you think of my suggestion to add a caution to the API descriptions page, that names shown on a web page may not tally precisely with the same endpoint data?

But there's a club id? That's great news & exactly what I need - I hadn't spotted that before.