I don't have an answer, but I'm always wary: Unicode has at least two "normal" forms (NFC and NFD, plus the compatibility forms NFKC and NFKD), and UTF-8 simply encodes whichever one it is given. I gave up on Google's "Go" (golang) project, to which I was a contributor, because I couldn't rely on comparing strings for equivalence in an application. (They hand-waved about library support, and may have done something since. As the "system programming" language also didn't aim to support all OS system calls, I returned to Perl/Python and C.)
I guess being yelled at/told off/disagreed with by Rob Pike is some sort of career achievement, but I'd rather he'd come to see my side of things and made his new language useful.
That's a bit off topic. The point here is: are all Unicode characters and strings always normalised the same way?
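To make the normalisation worry concrete, here is a minimal Python sketch. It is only an illustration: the letter "й" is my own example and is not taken from the team name in question. It shows one visible character stored as two different code point sequences that compare equal only after normalising both sides.

```python
import unicodedata

# "й" as a single precomposed code point (U+0439)...
composed = "\u0439"
# ...and as "и" (U+0438) followed by a combining breve (U+0306).
decomposed = "\u0438\u0306"

# The raw strings differ even though they render identically.
print(composed == decomposed)

# After normalising to the same form, they compare equal.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed
print(nfc_equal, nfd_equal)
```

So the answer to the question is no: nothing guarantees that two sources deliver text in the same form unless you normalise both yourself.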
I'm currently involved in collating the results of the first RR in the TMCL 2018 tournament. The match endpoint API requests I've been making have gone without a hitch & I'm impressed at just how robust that system is. At present I'm not even bothering to trap errors relating to my API server requests & there hasn't been a single failure (out of several hundred), so kudos to the developers!
But I've hit a very small snag regarding the team name for the matches of just one of the groups taking part & it's this one:-
World's Best Chess Players - Лучшие Шахматисты в мире
I'm finding that the string I'm using, copied directly from the group's home page, doesn't correspond to the match endpoint name which is as follows:-
World's Best Chess Players - \u041b\u0443\u0447\u0448\u0438\u0435 \u0428\u0430\u0445\u043c\u0430\u0442\u0438\u0441\u0442\u044b \u0432 \u043c\u0438\u0440\u0435
I recognise the (Russian Cyrillic?) Unicode character codes used to represent the Cyrillic part of the name, but I don't understand why this should be giving me a problem. We have a host of teams with names that contain non-Latin characters & they're all fine.
Another puzzle is this - when I do a comparison between the two strings using the command line of my Python (V3) interpreter like this...
"World's Best Chess Players - Лучшие Шахматисты в мире" == "World's Best Chess Players - \u041b\u0443\u0447\u0448\u0438\u0435 \u0428\u0430\u0445\u043c\u0430\u0442\u0438\u0441\u0442\u044b \u0432 \u043c\u0438\u0440\u0435"
..the result is True.
It's odd because my Python script is left treating the two names as if they're not the same but it appears to recognise them as the same on the command line, so what's going on?
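One way to narrow this down is to locate the first position where the two strings differ inside the script itself, rather than retyping them at the prompt. The sketch below is a guess at a diagnostic, not a diagnosis: `canonical` and `first_difference` are hypothetical helper names, and the curly apostrophe in the sample data is invented purely to show what an invisible mismatch looks like.

```python
import unicodedata


def canonical(name: str) -> str:
    """Hypothetical helper: NFC-normalise and trim surrounding whitespace."""
    return unicodedata.normalize("NFC", name).strip()


def first_difference(a: str, b: str):
    """Return (index, ascii(a_part), ascii(b_part)) at the first mismatch, or None."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ascii(ca), ascii(cb)
    if len(a) != len(b):
        # One string is a prefix of the other; report the leftover tail.
        return min(len(a), len(b)), ascii(a[len(b):]), ascii(b[len(a):])
    return None


# Invented sample data: a typographic apostrophe (U+2019) versus a plain one.
page_name = "World\u2019s Best Chess Players"
api_name = "World's Best Chess Players"

print(first_difference(canonical(page_name), canonical(api_name)))
```

If this reports `None` for the real strings, the difference lies elsewhere (for example in how the endpoint response is decoded) and not in the characters themselves.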
* This team name involves the same process as the one above but causes no problems:
Захід
From a typical match endpoint for that group: "name":"\u0417\u0430\u0445\u0456\u0434"
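For reference, a small sketch of how that escaped form relates to the readable one, assuming the endpoint body is JSON and is parsed with the standard `json` module (the `raw` string below is just the fragment quoted above wrapped in braces). It also hints at why the interpreter test returns True: Python decodes `\u` escapes in a string literal itself, so the comparison at the prompt never sees the undecoded text.

```python
import json

# The endpoint text as it arrives: literal backslash-u sequences, not Cyrillic.
raw = r'{"name":"\u0417\u0430\u0445\u0456\u0434"}'

# Parsing the JSON turns the escapes into real characters.
decoded = json.loads(raw)["name"]
print(decoded)  # Захід

# The parsed value equals a Python literal written with the same escapes.
matches_literal = decoded == "\u0417\u0430\u0445\u0456\u0434"

# The unparsed text does not contain the readable name at all.
found_in_raw = decoded in raw
print(matches_literal, found_in_raw)
```

If a script compares names against the unparsed response text, or escapes one side a second time, a match at the prompt and a mismatch in the script could coexist. That is only one possibility to check.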