PLEASE FIX SERVERS

x-0915631354

I've had it with these lost connections, time-outs and clock lags... It's not my connection, I never have this kind of problem on other websites.

GeorgeWyhv14

Yes. They should fix their servers. I lost due to poor connection, and clock lag. Otherwise, chess.com did a great job. I would give them 4 Stars.

Martin_Stahl
M0RE64 wrote:

I've had it with these lost connections, time-outs and clock lags... It's not my connection, I never have this kind of problem on other websites.

 

 

There are problems that can occur client side and ones that can occur due to issues between you and the site. For the former, check out the following:

https://support.chess.com/article/213-how-do-i-fix-my-disconnect-lag-issues

x-0915631354

As mentioned before, the issue is not client side. It's a pity, because the interface on the website and in the mobile apps is really state of the art. If only those connection issues were finally taken more seriously...

Martin_Stahl
M0RE64 wrote:

As mentioned before, the issue is not client side. It's a pity, because the interface on the website and in the mobile apps is really state of the art. If only those connection issues were finally taken more seriously...

 

If the problem is not client side, it is very often somewhere between you and the site. There's only so much that can be done about issues like that. From reading some articles about the site elsewhere, I believe the site is working towards a way to help with that issue, but I have no idea how far it actually is from realization.

Martin_Stahl
Turkish_Emperor wrote:

I'd also like to add to my above post that when I watched the video recordings of my games, my opponent's clock would often INCREASE in time! Meanwhile mine would decrease in substantial jumps, like 7 or 8 seconds.

 

https://support.chess.com/article/423-why-did-the-clock-times-suddenly-change-the-clocks-seem-broken

 

That is not uncommon when experiencing lag and/or short disconnects. On a mobile device, if you have both wifi and data, I would suggest switching one off while playing. The device itself will try to use the best connection and could switch during a session, maybe multiple times, causing packet drops and requiring data to be resent, both of which cause lag.
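
To make the clock jumps concrete, here is a toy model in Python. It assumes the server keeps the authoritative clocks and the client only counts down locally between updates; that is an assumption for illustration, not a description of chess.com's actual code, and the names `Clocks` and `apply_late_update` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clocks:
    mine: float      # seconds shown on my clock
    opponent: float  # seconds shown on my opponent's clock


def apply_late_update(shown: Clocks, delay: float) -> Clocks:
    """Correct the displayed clocks when the opponent's move arrives `delay` seconds late.

    While the move was in flight, the client kept charging the opponent.
    Assumption: the server started my clock as soon as it received the move,
    so the lost time moves from the opponent's display to mine.
    """
    return Clocks(mine=shown.mine - delay, opponent=shown.opponent + delay)


# A move that arrives 7 seconds late: the opponent's clock goes up by 7,
# and mine drops by 7 in a single jump.
print(apply_late_update(Clocks(mine=120.0, opponent=95.0), delay=7.0))
```

Under this model, a 7 second stall produces exactly the pattern described in the quote: the opponent gains 7 seconds on screen and the local player loses 7 at once.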

cyberloo
I am fed up with the same issue of servers losing connection and me losing games because of it. The game timer should freeze for both opponents when it senses a poor connection, but it doesn’t. It's frustrating to lose when you’re winning.
MindTheTrap

Same issue here when I play classical games (30+0).
This disconnection issue started a few weeks ago; I never had any problem before.
As I am a software engineer, I now run a PowerShell script that tests the connection to chess.com and google.com at the same time. When a game disconnection occurs, the chess.com site is not reachable although the google.com site is. This is clearly a server-side issue.
Please fix it; this is a non-functional requirement for a paid service.

MindTheTrap

That's really sad, because I find the whole experience rather good, but this is a real pain in the back.
This kind of problem is not impossible to fix. We don't see lichess and other chess sites having this issue, and we are seeing web services that let people edit, draw and interact on complex documents (drawings, schemas, Word docs, etc.) in a collaborative and distributed way without any hiccup. This is clearly fixable. And please, support team, do not suggest improving our Internet/wifi connection. I can make myself available if I can help to troubleshoot.
     

Martin_Stahl
MindTheTrap wrote:

Same issue here when I play classical games (30+0).
This disconnection issue started a few weeks ago; I never had any problem before.
As I am a software engineer, I now run a PowerShell script that tests the connection to chess.com and google.com at the same time. When a game disconnection occurs, the chess.com site is not reachable although the google.com site is. This is clearly a server-side issue.
Please fix it; this is a non-functional requirement for a paid service.

 

The sites are not in the same data centers and your traffic does not travel the same path. In that particular case, unless the site was having an outage, the problem was likely somewhere along that path and not necessarily with the site servers.

 

In the case of an issue between you and the site, there's not a whole lot that can be done. I believe the site is working on a more distributed play experience, so games might be geographically closer, and therefore faster. However, I have no idea how far along that might be. I read an article online about it, that seemed to indicate that was a direction they are going. However, even then, if there is a connection issue between you and where that is hosted, you can see the same thing.

 

MindTheTrap

I am sorry, but that explanation does not hold at all.
If there are 2 or more data centers, that is for managing high availability, and this kind of scenario should be covered.

Again, there are bad ways of implementing things and there are good ways.
Looks like they've chosen the bad way.

With a pure stateless RESTful HTTP API, you would not care what route the HTTP packets take. The IT/software industry now knows how to compensate for possible "routing" issues.
If a websocket is used, the connection stays open for both parties until one of them drops it. I cannot imagine there is a piece of code running in the browser that would stop the connection while playing, so the cut has to come from the server side.
I won't design their solution here, but it's way too simplistic to say it's because of internet routing. There are ways to handle interruptions and design an eventually consistent system (CAP theorem).


 

MindTheTrap

Some more elements to consider:
- Packet loss is a possibility, but not at this rate. That's huge! They have to change providers if this is really the case. But I doubt it, because infrastructure in general is far more stable than software. In a world of Netflix, Amazon Prime and so on, the worldwide infrastructure is quite reliable. Even if IP packets do take different routes, TCP is there to manage the connection between clients and servers correctly.
I think the issue is actually "bigger" than our browser disconnection.
As I mentioned, I now run a simple PowerShell script (see below) that tests the connection to chess.com and google.com every time I play. When I get disconnected while playing, this independent script also fails to reach chess.com, but not google.com.
So it's not the browser's fault, nor my connection, because I can still reach Google.
I suspect it's a DNS or load balancer issue, or anything "software" on the server side.

 

##### basic script

# Ping both hosts once every 2 seconds until stopped with Ctrl+C
while ($true)
{
    Test-Connection www.chess.com -Count 1
    Test-Connection www.google.com -Count 1
    Start-Sleep -Seconds 2
}
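
The script above sends ICMP pings, which do not go through the same port as a game in the browser. As a rough variant, the same idea can be sketched in Python with a TCP connect and a timestamp on every line, so that failures can be matched against the moment a game disconnects. Port 443, the 2 second timeout, and the names `probe`, `format_line` and `run` are assumptions for illustration only.

```python
import socket
import time
from datetime import datetime

HOSTS = ["www.chess.com", "www.google.com"]  # the two hosts from the script above
PORT = 443      # assumption: HTTPS port
TIMEOUT = 2.0   # assumption: seconds before an attempt counts as failed


def probe(host, port=PORT, timeout=TIMEOUT, connect=socket.create_connection):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:  # covers DNS failures, refused connections and timeouts
        return None
    conn.close()
    return (time.monotonic() - start) * 1000.0


def format_line(stamp, results):
    """Render one log line: the timestamp followed by host=result pairs."""
    parts = [stamp]
    for host, ms in results.items():
        parts.append(f"{host}=FAIL" if ms is None else f"{host}={ms:.0f}ms")
    return " ".join(parts)


def run(rounds, interval=2.0):
    """Probe every host `rounds` times, sleeping `interval` seconds in between."""
    for _ in range(rounds):
        results = {host: probe(host) for host in HOSTS}
        print(format_line(datetime.now().isoformat(timespec="seconds"), results))
        time.sleep(interval)
```

Calling `run(1800)` would cover roughly an hour of play. Like the ping loop, this only shows whether each host was reachable from this machine at a given second; it cannot show where along the path a failure happened.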

 

 

MindTheTrap

Martin_Stahl: are you from the first-level support team?
I'd be pleased to have a chat with one of your solution architects or any technical person who took part in the design of the solution. I am a software engineer and solution architect with 20 years of experience.
 

 

Martin_Stahl
MindTheTrap wrote:

I am sorry, but that explanation does not hold at all.
If there are 2 or more data centers, that is for managing high availability, and this kind of scenario should be covered.

Again, there are bad ways of implementing things and there are good ways.
Looks like they've chosen the bad way.

With a pure stateless RESTful HTTP API, you would not care what route the HTTP packets take. The IT/software industry now knows how to compensate for possible "routing" issues.
If a websocket is used, the connection stays open for both parties until one of them drops it. I cannot imagine there is a piece of code running in the browser that would stop the connection while playing, so the cut has to come from the server side.
I won't design their solution here, but it's way too simplistic to say it's because of internet routing. There are ways to handle interruptions and design an eventually consistent system (CAP theorem).

 

 

The live server process only runs out of the primary data center, or at least it only runs in a single location and is a singular process at this time.

 

The site started in 2007 and live was added not terribly long after. When designing at the beginning, the need and standards were different.

 

The site is continually upgrading, increasing availability and reliability, and as mentioned, I'm pretty sure they are working towards a distributed architecture for live and likely have been for a while.

 

Legacy systems and processes take time to upgrade, test, and implement. 

 

Regarding disconnects for the live server, if there is a disruption between your client and the site, it can take time for that to resolve. The site also only allows a maximum amount of time for reconnection. There's also a chance there are some bugs in the code that fail to clearly recognize some types of disconnects and so fail to re-initiate the connection.
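
As a general illustration of reconnecting within a maximum time window, here is a minimal Python sketch of retrying with exponential backoff under a time budget. It is a common client-side pattern, not the site's actual reconnection logic; the 30 second budget, the delay values, and the names `backoff_schedule` and `reconnect` are all assumptions.

```python
import time


def backoff_schedule(base=0.5, factor=2.0, cap=8.0, budget=30.0):
    """Return the waits between attempts; their sum never exceeds `budget` seconds."""
    delays = []
    delay, total = base, 0.0
    while total + delay <= budget:
        delays.append(delay)
        total += delay
        delay = min(delay * factor, cap)  # grow the wait, but never beyond `cap`
    return delays


def reconnect(connect, delays, sleep=time.sleep):
    """Call `connect()` until it succeeds or the schedule is used up.

    `connect` is a hypothetical callable that raises OSError on failure.
    Returns True on success, False once every attempt has failed.
    """
    for delay in delays:
        try:
            connect()
            return True
        except OSError:
            sleep(delay)  # wait before the next attempt
    return False


print(backoff_schedule())
```

With these assumed values a client makes six attempts before giving up. If the path between the player and the site stays broken for longer than the window, the game is lost no matter how well the retry loop is written.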

Martin_Stahl
MindTheTrap wrote:

Martin_Stahl: are you from the first-level support team?
I'd be pleased to have a chat with one of your solution architects or any technical person who took part in the design of the solution. I am a software engineer and solution architect with 20 years of experience.
 

 

 

No, I'm a volunteer moderator. The design of live is older, though certainly updated a lot, and as far as I'm aware, the site is actively working on the architecture and the team knows what it is doing.

 

If they have specific needs that aren't currently met by the team's existing capabilities, they're likely hiring for them and/or learning the needed skill sets. I know the site has grown a lot over the past couple of years.

Martin_Stahl
MindTheTrap wrote:

Some more elements to consider:
- Packet loss is a possibility, but not at this rate. That's huge! They have to change providers if this is really the case. But I doubt it, because infrastructure in general is far more stable than software. In a world of Netflix, Amazon Prime and so on, the worldwide infrastructure is quite reliable. Even if IP packets do take different routes, TCP is there to manage the connection between clients and servers correctly.
I think the issue is actually "bigger" than our browser disconnection.
As I mentioned, I now run a simple PowerShell script (see below) that tests the connection to chess.com and google.com every time I play. When I get disconnected while playing, this independent script also fails to reach chess.com, but not google.com.
So it's not the browser's fault, nor my connection, because I can still reach Google.
I suspect it's a DNS or load balancer issue, or anything "software" on the server side.

 

##### basic script

# Ping both hosts once every 2 seconds until stopped with Ctrl+C
while ($true)
{
    Test-Connection www.chess.com -Count 1
    Test-Connection www.google.com -Count 1
    Start-Sleep -Seconds 2
}

 

Again, if there's a problem between you and the site while the site is working successfully for millions of other users, you can see packet loss without there being a server or data center issue.

 

The site does occasionally have major issues, and those will manifest as 502 errors and/or a live server crash. The other day there was a brief but major outage. Those things can happen, even to Netflix, AWS, Google, Microsoft, etc., and those companies are much larger, have much larger budgets/revenue, and much larger teams.

MindTheTrap

Well, well, there's no point in arguing further.

For a paid and long-established service, that is simply not acceptable. This is the kind of bug you might accept from a young platform, but not from a site that is 15 years old!

I am pretty sure you are big enough to manage this kind of issue, given the number of users and the cost per user.
This is an obvious non-functional requirement, beyond party analysis, chat and so on.
This is basic stuff.

Your non-technical explanation might work for people who are not IT literate, but not for me.
That is so bad.

Martin_Stahl
MindTheTrap wrote:

...

Your non-technical explanation might work for people who are not IT literate, but not for me.
That is so bad.

 

While my description may not get into the technical weeds, it isn't technically incorrect. I'm in IT as well, and understand how things work. Also, as mentioned, larger companies, some running longer and with vastly superior resources, suffer issues and outages. That's part of running larger, complex products that interact with people and processes.

 

I won't say the site doesn't have bugs or that the implementation is as robust as it can be. However, I rarely see any connectivity issues. I have seen that the site has been growing rapidly over the last couple of years, has been working hard to increase reliability, and has been investing in both resources and technology to make things better.

 

I expect things will continue to grow and get better, though there will likely continue to be growing pains.

MindTheTrap

Big companies only have big outages, not tiny sporadic connection issues.

A comparison is not an argument.

I am not the only one to experience this issue, and I have simply shown that it is a regular problem on the "chess.com" side. It's not the fault of the Internet or of any intermediate operator.
Googling for this issue, I see that it's not the first time and that it has been around for a long time. It could also be sporadic production releases that break production; that would not be the first time in history.

 

Martin_Stahl
MindTheTrap wrote:

Big companies only have big outages, not tiny sporadic connection issues.

A comparison is not an argument.

...

 


No, big companies can have partial outages that only impact a subset of users and bigger outages as well. The comparison is to show that companies of all sizes have issues and it isn't out of the ordinary. 

 

As mentioned, I rarely have connection issues and when I do, it's often my device that is the root cause (I play mostly on mobile web). I'm not saying there isn't room for improvement, there always will be. And from everything I've seen, the site is working on improving things in all aspects.