Serious fault with API. Left for over two days unattended.

Tricky_Dicky

Seems to be intermittent. Some now work, others still return "code 0"

stephen_33
ImperfectAge wrote:

Well, don't forget the 500 error when trying to query the list of players from large countries like USA

I think that may be more of a computing-resources problem than a human-resources one and may require investment in new hardware/software that the site isn't prepared to make?

So I wouldn't expect it to be fixed any time soon, by which I mean in this decade.

stephen_33
Tricky_Dicky wrote:

Seems to be intermittent. Some now work, others still return "code 0"

You're right, these work fine...

https://api.chess.com/pub/club/levice/matches

https://api.chess.com/pub/club/icy-blue-knights/matches

https://api.chess.com/pub/club/tough-and-social/matches

https://api.chess.com/pub/club/deep-space-9/matches

https://api.chess.com/pub/club/team-donetsk/matches

https://api.chess.com/pub/club/hikaru-and-buddies/matches

but this doesn't...

https://api.chess.com/pub/club/mykolaiv-chess-club/matches


It seems to be mostly fixed?
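For anyone wanting to re-run a check like the one above across several clubs, here is a minimal sketch in Python. It uses the club slugs from the links in this thread; the function names, the `User-Agent` string, and the choice to report "no response" as 0 are assumptions made for illustration, not anything the API defines.

```python
import urllib.error
import urllib.request

# URL pattern taken from the links posted in this thread.
BASE = "https://api.chess.com/pub/club/{slug}/matches"


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or 0 if no response arrived.

    Mapping "no response" to 0 is this sketch's own convention.
    """
    req = urllib.request.Request(
        url, headers={"User-Agent": "club-matches-check (example)"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0


def check_clubs(slugs, status_of=http_status):
    """Map each club slug to the status code of its matches endpoint."""
    return {slug: status_of(BASE.format(slug=slug)) for slug in slugs}


def failing(results):
    """Return the slugs whose endpoint did not answer with 200."""
    return sorted(slug for slug, code in results.items() if code != 200)
```

Calling `failing(check_clubs(["levice", "mykolaiv-chess-club"]))` would list only the clubs still returning errors, which is also a handy thing to paste into a bug report. The `status_of` parameter exists so the logic can be tried without touching the network.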

bcurtis

These endpoints are cached for just under a day. If the data change, they will be updated sooner in many cases. The `/pub/club/mykolaiv-chess-club/matches` URL should expire in 6 minutes, so I'd like to let it expire and verify that it gets re-created properly.

If you discover other URLs that return an error after about 6 hours from now, please share them here. That indicates something else is going on.

This API was intended to help cover us for a few years while we built a better one. Last year, interest in chess grew too quickly, and now this API is stressed with some large responses and we do not yet have the resources to replace it. We do expect things that currently work should continue to work, and will treat anything otherwise as a bug.

Yes, bugs get tickets. Many of our forum moderators are outside of that ticketing process. I talked with Martin, and he told me that he was informed a ticket was made but was not told the ticket number and could not look it up. He shared what he knew. We will try to get our support and bug-triage teams to share ticket numbers when they can.

Tricky_Dicky

Thanks Ben. The link https://api.chess.com/pub/club/mykolaiv-chess-club/matches now works.

As today goes on, we should expect all links to refresh, I assume.

ImperfectAge
stephen_33 wrote:
ImperfectAge wrote:

Well, don't forget the 500 error when trying to query the list of players from large countries like USA

I think that may be more of a computing-resources problem than a human-resources one and may require investment in new hardware/software that the site isn't prepared to make?

So I wouldn't expect it to be fixed any time soon, by which I mean in this decade.

It would be sensible to provide a way to segment the data, that is to request a range or 'page' of the results so it's not going to be required for any server to hold the entire list in memory.
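To make the paging idea concrete, here is a rough sketch in Python of how a server could hand out one 'page' at a time from a streamed result. Everything in it is hypothetical: the Public API does not, as far as this thread shows, offer `page` or `per_page` parameters, and the field names in the returned dictionary are invented for illustration.

```python
from itertools import islice


def page_of(items, page, per_page=100):
    """Return one page from an iterable without materialising all of it.

    `items` can be a generator (for example a database cursor), so only
    the rows up to the end of the requested page are ever read.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    start = (page - 1) * per_page
    # Read one extra row so we can tell whether another page exists.
    rows = list(islice(items, start, start + per_page + 1))
    return {
        "page": page,
        "per_page": per_page,
        "players": rows[:per_page],
        "has_more": len(rows) > per_page,
    }
```

A client would then keep requesting pages while `has_more` is true, so neither side ever needs the full list of a large country's players in memory at once.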

stephen_33
bcurtis wrote:

...

Yes, bugs get tickets. Many of our forum moderators are outside of that ticketing process. I talked with Martin, and he told me that he was informed a ticket was made but was not told the ticket number and could not look it up. He shared what he knew. We will try to get our support and bug-triage teams to share ticket numbers when they can.

That would be immensely helpful (sharing ticket numbers with members) if only to keep track of a particular bug report. I think one or two of mine have simply got lost over the last year or so.

stephen_33
bcurtis wrote:

...

This API was intended to help cover us for a few years while we built a better one. Last year, interest in chess grew too quickly, and now this API is stressed with some large responses and we do not yet have the resources to replace it. We do expect things that currently work should continue to work, and will treat anything otherwise as a bug.

That's very candid - thank you. It also explains some of the curious HTTP error codes I and other users have received over the months, if the servers were being overwhelmed with endpoint requests.

One suggestion I can make is to split up the set of all club matches if that's possible? Most of the time I need only those matches in 'registration' or 'in_progress' and they usually constitute a very small proportion of all matches - would that be likely to reduce demand?
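Until something like that exists, the split can only be done on the client after the full response has been downloaded. A minimal sketch in Python follows. It assumes the matches response is a JSON object whose keys group matches by status, and that the active groups are named `registered` and `in_progress`; those key names are an assumption here, so adjust them to whatever the response actually contains.

```python
# Assumed key names for the groups of matches that are not yet finished.
ACTIVE_KEYS = ("registered", "in_progress")


def active_matches(payload, keys=ACTIVE_KEYS):
    """Keep only the groups of matches that are still open or running."""
    return {key: payload.get(key, []) for key in keys}


def active_share(payload, keys=ACTIVE_KEYS):
    """Fraction of all matches in the payload that fall in the active groups."""
    total = sum(len(group) for group in payload.values())
    active = sum(len(payload.get(key, [])) for key in keys)
    return active / total if total else 0.0
```

`active_share` gives a quick way to measure, for a given club, how small the active portion really is, which is the figure that would show whether a separate endpoint would reduce response sizes.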

Tricky_Dicky

@stephen_33, the suggestion to split the team matches endpoint into two, Completed and registration/in progress, is an excellent idea.

bcurtis

I tracked down why it took so long to respond here. We have some internal processes we are going to improve, but one thing you can do when reporting bugs is indicate the severity. The bug report that became a ticket merely looked like one person was seeing a problem with one team match, and because of that it was not prioritized properly. This was the full report:

> https://api.chess.com/pub/club/norfolk-knights/matches
> returns
> "An internal error has occurred. Please contact Chess.com Developer's Forum for further help https://www.chess.com/club/chess-com-developer-community ."

This is all factual, but the person reading the report didn't know this was part of a community league system. A better report that might have gotten a response within hours would be:

> All team match endpoints in the Public API are returning errors. For example: https://api.chess.com/pub/club/norfolk-knights/matches
> These endpoints are a critical part of community league software from several developers, and serve hundreds (maybe thousands) of players per day. Please see the forum for more details: https://www.chess.com/clubs/forum/view/serious-fault-with-api-left-for-over-two-days-unattended

Like I said, we hope to get better at detecting these sorts of important things that can be solved quickly. My suggestion about the bug reports just helps us out.

Thanks!

Tricky_Dicky

I think that would have been my original bug report and email to support.

Thanks for the feedback Ben. I will try and be more specific in future.

The balance is obviously not crying wolf too often when it might be just an irritation instead of a serious DB problem. Having said that I did reference this forum and I suspect it wasn't looked at for quite a while.

stephen_33

Yes thanks, that's helpful.

But this episode seems to have highlighted a problem in that while the developers appreciate the importance of the API, some staff think it's of minor importance. Is there perhaps a job to be done of educating all staff involved in handling bug reports concerning it?

andreamorandini

First of all, I would like to apologize to all of you for this unexpectedly long wait for a fix.
On July 28th we enabled a patch to fix the inverse sorting order of finished matches. Unfortunately, it broke the responses for all teams with a live finished team match.

These kinds of things happen. We know we can fail, so we also have some monitors in place which ring a bell when users start to get error responses. But, as you may guess, the alarms didn't trigger. And worse than having no alarms is relying on a broken one.

Today we rolled out a fixed patch to solve the reverse finish order problem, and it will take effect from tomorrow.

We are still conducting an audit to understand why the alerts didn't work as expected, and we will strengthen those monitors to make this scenario less likely in future.
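As an illustration of the kind of monitor described above (not a description of how Chess.com's own alerting works), here is a minimal sketch in Python of a sliding-window error-rate check. The class name, window size, threshold, and the choice to count 5xx codes and 0 as errors are all assumptions.

```python
from collections import deque


class ErrorRateMonitor:
    """Track recent response codes and flag a sustained error rate."""

    def __init__(self, window=100, threshold=0.2, min_samples=20):
        self.statuses = deque(maxlen=window)  # oldest entries drop off
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, status_code):
        """Store the status code of one response."""
        self.statuses.append(status_code)

    def error_rate(self):
        """Share of recent responses that were server errors or no response."""
        if not self.statuses:
            return 0.0
        errors = sum(1 for code in self.statuses if code >= 500 or code == 0)
        return errors / len(self.statuses)

    def should_alert(self):
        """True once enough samples exist and the error rate is too high."""
        return (
            len(self.statuses) >= self.min_samples
            and self.error_rate() >= self.threshold
        )
```

The `min_samples` guard keeps a single failed request on a quiet endpoint from ringing the bell, while the bounded window lets the alert clear on its own once responses recover.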

stephen_33

Good to know the problem with the most recently finished 500 matches for a club is being addressed and fixed - something some of us have very much wanted.

Hopefully alarms will start ringing next time something like this happens.