I've never used the arrows and don't even know what they are for. Are they used in FFA? (That is all I play.)
Here is a suggestion to help you get more diagnostic info about the issue. How about adding a button/link to the interface that activates when it is a player's turn and deactivates after the player has moved (and the move has been registered by the server AND the client)? Then, if the player moves but the move doesn't register in the client, the player can click the button/link, which runs code that does two things:
1. sends some log information to your diagnostic server (state of the game, network info, load issues, etc.)
2. refreshes the player's client (without forcing them to do an entire reload, which can take 5-10 seconds).
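To make the two steps concrete, here is a rough sketch of what the button's click handler could look like. All names here (`ClientSnapshot`, `onResyncClick`, `sendDiagnostics`, `requestGameState`) are made up for illustration and are not taken from the actual client; the network calls are passed in as plain functions so the sketch doesn't assume any particular socket library.

```typescript
// Hypothetical shapes: none of these names come from the real client code.
interface ClientSnapshot {
  gameId: string;
  lastAppliedMove: number;  // index of the last move the client rendered
  socketConnected: boolean;
  lastMessageAt: number;    // ms timestamp of the last socket message received
}

interface DiagnosticReport {
  gameId: string;
  lastAppliedMove: number;
  socketConnected: boolean;
  msSinceLastMessage: number;
  reportedAt: number;
}

interface ResyncDeps {
  now: () => number;
  sendDiagnostics: (report: DiagnosticReport) => void; // assumed logging endpoint
  requestGameState: (gameId: string) => void;          // assumed "resend this game" call
}

function buildReport(s: ClientSnapshot, now: number): DiagnosticReport {
  return {
    gameId: s.gameId,
    lastAppliedMove: s.lastAppliedMove,
    socketConnected: s.socketConnected,
    msSinceLastMessage: now - s.lastMessageAt,
    reportedAt: now,
  };
}

// Click handler for a "my move didn't register" button.
function onResyncClick(s: ClientSnapshot, deps: ResyncDeps): DiagnosticReport {
  const report = buildReport(s, deps.now());
  deps.sendDiagnostics(report);    // step 1: log the state for the developers
  deps.requestGameState(s.gameId); // step 2: refetch this game only, no full reload
  return report;
}
```

The interesting field is probably `msSinceLastMessage`: if it is large at the moment the player clicks, that would hint that nothing has been arriving over the socket at all.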
I have reviewed the changes from ~two weeks ago a few times already, but there's nothing that would be obviously connected to the problem at hand.
Btw, to draw arrows, right-click and drag. This is just to find out if the client is still behaving as it should. If not, that is indeed an indication of a client-side bug.
As I've stated a few times, there are two possibilities: either the move is lost somewhere on the way from the server to the client, or the client doesn't process it. I suspect the former.
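One way to tell these two cases apart would be to number the moves and record, on the client, both when a move arrives and when it has been applied to the board. The sketch below assumes the server attaches an increasing sequence number to each move, which is an assumption for illustration and not how the app necessarily works today; `MoveTracker` and its methods are hypothetical names.

```typescript
type Verdict = "ok" | "lost_in_transit" | "received_not_applied";

class MoveTracker {
  private received: { [seq: number]: true } = {};
  private applied: { [seq: number]: true } = {};

  // Call first thing in the socket message handler, before any processing.
  markReceived(seq: number): void {
    this.received[seq] = true;
  }

  // Call after the board has actually been updated with the move.
  markApplied(seq: number): void {
    this.applied[seq] = true;
  }

  // Classify a move that the server logs say was sent.
  diagnose(seq: number): Verdict {
    if (this.received[seq] !== true) return "lost_in_transit";
    if (this.applied[seq] !== true) return "received_not_applied";
    return "ok";
  }
}
```

If a reported missing move comes back as `lost_in_transit`, that points at the socket or network layer; `received_not_applied` would point at the client code instead.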
Of course I also check the logs for anything suspicious that might relate to this problem, but haven't found anything.
What I have occasionally seen too is that when I click a game, it gets stuck (the spinner in the board center keeps spinning and nothing happens), and I suspect this is the exact same problem: the game is sent but never reaches the client.
So I suspect it's a socket- or network-related issue, and those are layers that are handled by third-party libraries.
But the clearest indicator of a network-socket-related issue is that this only happens from time to time. A client-side bug would most likely show up the same way on every move. It's possible that there is a certain constellation of circumstances that causes this behavior, but I have yet to recognize any pattern.
Taking into consideration other observations, like games occasionally hanging (they get lost somehow), I suspect we are dealing with unreliable network sockets.
Perhaps an increased amount of traffic could contribute to it getting worse, and the only thing in the last release that added network traffic is the joining and leaving of custom games. Hence the attempt to turn them off and see if it makes a difference. We tried that, and I couldn't make out any difference. Moreover (and I have written all of this above at least once already), I noticed a huge difference from one day to another while absolutely nothing changed (i.e. we did not release new software, and nothing on the server was changed).
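For the hanging games, a simple watchdog on the client could at least make the lost messages visible: note when a request goes out, clear it when the answer arrives, and flag anything that stays unanswered too long. This is only a sketch; `RequestWatchdog`, the request ids, and the timeout value are all hypothetical, and the time is passed in explicitly so it doesn't depend on any timer or socket API.

```typescript
class RequestWatchdog {
  private pending: { [requestId: string]: number } = {};
  private timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Record when a request (e.g. "load game X") was sent.
  sent(requestId: string, now: number): void {
    this.pending[requestId] = now;
  }

  // Clear the entry once the matching response arrives.
  answered(requestId: string): void {
    delete this.pending[requestId];
  }

  // Requests still unanswered after timeoutMs: candidates for a retry and a log entry.
  overdue(now: number): string[] {
    return Object.keys(this.pending).filter((id) => {
      const sentAt = this.pending[id];
      return sentAt !== undefined && now - sentAt >= this.timeoutMs;
    });
  }
}
```

Calling `overdue` periodically and reporting the result would show how often a game or move really never arrives, rather than relying on players noticing a stuck spinner.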
All this makes me think it's caused by something outside the 4pc app.
The server admin told us that the current hosting is "unstable". Socket.io (the third-party library we use for connections) also does not have a very good reputation.
So I am looking to change both of these. A new server/hosting setup is in the works, and I am also looking into alternatives for socket.io.