
A Week in the Life of a Chess.com Code Monkey

  • jay
  • | Dec 14, 2007 at 3:01 PM
  • | Posted in: jay's Blog
  • | 2017 reads
  • | 29 comments

I have a few reasons for writing this blog:

 

  • I thought the other Chess.com members would enjoy reading what goes on behind the scenes to make a website like this function.
  • It's therapeutic to get all of the stuff that is floating in my brain down onto paper so I can "flush the buffer".
  • People have been wondering what I've been so stressed out about this week, and instead of trying to explain it to 10 different people over IM, I can just send them this URL.
  • Finally, it'll hopefully act as a good technical reference for other developers around the world who might encounter similar issues. (So warning, parts of this blog may be technical in nature and boring to non-techies.)

As many of you know, we launched Chess Mentor a week ago, Chess.com's first pay-for-use product. It was a long 2-3 month project that was very ambitious from the get-go. We had a very old legacy Visual Basic application with tons of lessons stored in an Access database, and the plan was to rewrite the whole application using php/ajax/javascript/html and a MySQL backend. Just converting the data to MySQL was a long, arduous task of script writing, review, testing, more script writing, data cleanup, schema changes, etc. Building the application was extremely difficult because it needed to behave like a seamless desktop application, without any page refreshes, while also being easy to use, and it was very database-heavy. This called for LOTS of ajax and also LOTS of javascript to make things like the timer and rating bar and all that jazz work.

 

After we launched the project Thursday of last week (December 6, 2007), it has received nothing but praise and compliments from those who have tried it. I have shown the project to fellow developers and they have given me nothing but "wow" and "awesome!" as they've been very impressed with the technical feat that's been accomplished, something that non-developers can't really appreciate. This definitely is the kind of stuff that makes me excited, keeps me going, and builds up my morale, even when I'm not actually seeing a paycheck for the work I'm putting in.

 

I was riding high on cloud #9 watching as our first subscriptions started rolling in, justifying that we had really created a high-value usable product that people were willing to pay money for.

 

Then.....it all started crashing down...a report here, a report there, all from Chess Mentor users reporting that from time to time their lessons would just freeze, or they'd get an ajax error, or IE would crash entirely and they'd have to reboot or restart IE to fix it. Nothing kills your morale faster than bug reports coming in about an intermittent, not-so-easy-to-replicate bug that only affects SOME users. I immediately told Erik this could potentially be one of those bugs that takes days or even weeks to figure out and fix (if fixing is possible).

 

So on Monday began the laborious task of trying to recreate the bug myself. After a few lessons of Chess Mentor on our production boxes, sure enough, I clicked the "Try Again" button and the spinning ajax indicator popped up, but then......nothing...........just waiting and waiting forever. Just as our users had reported. It's a good and bad feeling: good in that I now know we have a problem and have seen it with my own eyes, but with a depressing "What am I going to do about it?" feeling at the same time.

 

Chess.com was built using a PHP framework, so much of what goes on deep, deep down under the Chess.com application logic is somewhat of a black box. You can try at your own peril to open up this black box and peek inside and try to fix things, but you can quickly find yourself down a rabbit hole with few or no escapes. So, I shot off an email to the founder of the framework to see if he had any ideas about where I could begin to add more logging to track down the problem. He gave me a few suggestions and off I went.

 

I added logging galore to the Chess Mentor product to track every button clicked and every move made by every user of the product. The hope was that the next time this problem occurred, I could look at the log files and determine the problem, or I could reproduce it myself and look at my own log.

 

After the logging was in place, I went off to try and reproduce the freezing, and after 10-20 minutes, I got it to freeze again. I checked out the log files, and there were absolutely 0 clues there. Everything on the server looked just fine, no errors in the php or weird behavior. Hmmmmmmmmmmmmmmmmm, now what?

 

Well, any time you use 3rd party software or a framework such as we do, one option is to make sure you upgrade to the latest version, especially if it seems the problems you are experiencing are in the framework. Bad news is that this is not an easy task, and sometimes when you go to the latest version of any software, you introduce NEW bugs, just what I didn't want to deal with right now, but, I was out of ideas, so I downloaded the latest version (we were about 11 versions behind), and started the super tedious task of upgrading.

 

The reason it is so difficult is because when you have an application the size of Chess.com, there comes a time here and there where you need to make changes to the core of the framework, bug fixes, or whatever. So upgrading becomes that much more difficult because now you have to merge your "hacks" with the latest files.

 

Upgrading to the latest version took me several hours, but wasn't quite as bad as I thought it would be. However, after the upgrade, I immediately noticed that the login box on the homepage no longer worked. UGHHHHH!!  Some weird JAVASCRIPT error. I have no problems tracking down and fixing php errors, but javascript errors...that's a whole nother beast. They can be such a pain.

 

I posted in the forums for the framework we use and asked for very experienced developers to help take a look at the problem. I was excited to get a couple responses from two of the big names in that community. I ended up working with "Kristof" from Belgium who has written the manual on the framework. Great guy and he started working on the problem right away. I hadn't even given him access to our servers yet and he was already trying to fix it on Chess.com using proxies and capturing network data. AMBITIOUS! I like that!

 

He also LOVED the Chess Mentor program, so he was more than happy to work on the bug AND learn chess at the same time! We negotiated a deal where he could use Chess Mentor for free in exchange for helping me work on these bugs. Perfect! So, I got him set up with access and he started adding all kinds of alerts to the js ajax code so we could track down why it was crashing (on our test server of course).

 

I told him that first I had tried upgrading our framework to the latest and was now seeing NEW errors on the homepage of chess.com (on the test server) and other form submission pages. He started digging into it and found that there was a conflict between the javascript in the framework and the javascript we use for our javascript chess boards (like the diagrams and daily puzzle). We use a js library called "Prototype" which is pretty intrusive.  

 

At this point, I was in bad spirits and disgruntled and as I told Erik, I wanted to just "crawl under a rock and die". Sometimes you wish you were in retail selling clothes or something, instead of trying to fix weird random computer bugs. I think it was about 5pm on a Tuesday, and I decided to just get away, so I went and curled up under my covers on my bed and was out in no time. Later that night I decided to get away from it all and went out and watched a great movie with my wife and Igor and his wife (yes the Igor from Chess.com). When I got home around 11:30, I was feeling a little reinvigorated, so I got back onto my computer, found Kristof was online, and we started working on the problem.

 

I decided to go ahead and start rolling back all the js files of the framework to see if the error went away. Yep! It sure did, and I pinpointed which file was causing the problem, and let Kristof know. A few hours later he had figured out how to resolve the javascript issue, I implemented it, and after a few more fixes by Kristof, we finally had a working version of the latest framework....now we can finally get back to fixing Chess Mentor!

 

With the alerts in place on Chess Mentor, it didn't take too long for him to realize that in IE, certain links on the Chess Mentor page were triggering the Window Unload function. This will definitely cause ajax errors because once the browser has unloaded the page, the DOM elements no longer exist so ajax will run into issues.

 

Kristof had seen this issue before and pointed me to a few links that discuss it:


http://softwaredevscott.spaces.live.com/blog/cns!1A9E939F7373F3B7!140.entry

 

Essentially, the problem is that whenever IE sees an href link with javascript in it, it assumes the user is leaving the page and therefore unloads the page. FF does not do this. I wrote an email about this to the founder of the framework, and he also knew of this problem and said that's why they had changed all href links to use "#" instead of "javascript:void(0)".

 

Ah hah! Now I remember, Erik and I had seen that after building Online Chess, all our ajax links on the site always made the browser scroll to the top! Very annoying behavior for the users to always have the browser jump to the top whenever an ajax action is performed. So, we had created a custom control for this that uses javascript:void(0) instead of the # (of course at the time we didn't know of this weird IE window unload behavior.)

 

So, things were looking good. We had figured out the problem (or so I thought) and I went ahead and changed all the links back to use #. I also found that if the link's click handler returns false, the window won't jump to the top, so we get the best of both worlds here. I thought we had finally figured out the problem, so the last thing to do was some thorough testing on our test server before going live with the new code.
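For the developers reading along, here is a minimal sketch of that "best of both worlds" link pattern. It is not the actual Chess.com control; the helper names `ajaxLinkHandler` and `ajaxLinkHtml` are made up for illustration. The idea is simply that the link points at "#" (so IE has no javascript: URL to react to) and the click handler returns false (so the browser does not follow "#" and jump to the top).

```javascript
// Hypothetical helper (not from the framework): wraps an ajax action so it
// can be used as a link's click handler.
function ajaxLinkHandler(action) {
  return function (event) {
    action(); // fire the ajax call (or any other in-page action)

    // Standards-based browsers expose preventDefault; older IE versions
    // rely on the handler's return value instead, so do both.
    if (event && typeof event.preventDefault === "function") {
      event.preventDefault();
    }
    return false; // cancel the default navigation to "#"
  };
}

// Hypothetical helper: builds the markup for such a link. The href is "#"
// rather than "javascript:void(0)", and the inline onclick passes the
// handler's return value back to the browser.
function ajaxLinkHtml(label, handlerName) {
  return '<a href="#" onclick="return ' + handlerName + '(event);">' +
    label + "</a>";
}
```

On a page this might be wired up as `window.tryAgain = ajaxLinkHandler(sendTryAgainRequest);` with the link produced by `ajaxLinkHtml("Try Again", "tryAgain")`, where `sendTryAgainRequest` stands in for whatever function actually makes the request.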

 

We brought in our #1 QA Ace to poke around on the test server...yes, the one and only ShadowC. He's an amazing QA guy, on top of being an amazing javascript programmer, which I was about to find out first hand. It didn't take him long before he found a lot of unrelated issues that were good finds and I got all those squared away. In the meantime, Kristof was still poking around on Chess Mentor and OOOOOOH NOOOOOOOOO, he sends me an IM, "chess mentor froze on test server"

 

All my hopes and dreams came crashing down. I had upgraded to latest framework version. We had found a problem in IE with window unloading and fixed it, and it was STILL freezing!!  Ahhhhhhh! 

 

Kristof had to go to bed at this point, so we decided we'd pick it up again the next day. In the meantime, I asked shadowc to take a look at it, and sure enough, it froze for him as well, and he said, "definitely a js issue". I asked him if he wanted to take a look into it, and he said sure. The more eyes and brains on this problem, the better, because I was clean out of ideas.

 

I set him up with access to our test server and with the help of midnight commander, he started poking around the files. I pointed him in the direction of the js files he should be looking at, and he started doing some analysis on them. While he was looking around, I continued to play with Chess Mentor to see if I could notice any patterns, and sure enough, I stumbled upon one! It was a huge breakthrough! If you can consistently reproduce a problem, that in my mind is at least 60% of the battle towards fixing a bug. I told shadowc and sure enough, he was able to reproduce the bug, so now we had a point at which we could attack the problem. Super excited, adrenaline flowing! Things were looking up again! What a roller-coaster of emotions!

 

He started analyzing the xml string in the ajax call to see if there was a problem there. He emailed it to me and suggested that it might be a weird "special character" in the string. Uh oh....this brought back memories. I've seen this before where weird characters, like the curly quote, can cause xml to crash (but usually only in IE). So....feeling optimistic we had found the problem (and a bit depressed I hadn't thought of this before), I went and removed the special char, and we tried to make it freeze again, and sure enough, no freeze! We had found the problem! Unfortunately all these special characters were mixed throughout dozens of columns/tables in the database and had come across in our export from Access to MySQL a long time ago.

 

I decided to shut down the website, export the database to a text file, then do some search and replacing on these special chars, then reimport the database. Many of you may have noticed the site was down for about 25 minutes during this process. After I finished that, I was SOOO confident we had finally fixed Chess Mentor for good!
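For anyone facing a similar cleanup, here is a rough sketch of the kind of search and replace involved. It is an illustration only: the exact characters we replaced and the tools used aren't listed in this post, so the replacement table and the `sanitizeForXml` name are assumptions. It swaps common "smart" punctuation for plain ASCII and drops control characters that XML 1.0 does not allow.

```javascript
// Assumed replacement table: typographic punctuation that often sneaks in
// from desktop tools, mapped to plain ASCII equivalents.
const SMART_PUNCTUATION = {
  "\u2018": "'",   // left single curly quote
  "\u2019": "'",   // right single curly quote
  "\u201C": '"',   // left double curly quote
  "\u201D": '"',   // right double curly quote
  "\u2013": "-",   // en dash
  "\u2014": "-",   // em dash
  "\u2026": "..."  // ellipsis
};

// Hypothetical helper: returns a copy of `text` that is safer to embed in
// an XML response.
function sanitizeForXml(text) {
  return text
    // Replace each known "smart" character with its ASCII stand-in.
    .replace(/[\u2018\u2019\u201C\u201D\u2013\u2014\u2026]/g, function (ch) {
      return SMART_PUNCTUATION[ch];
    })
    // Remove control characters that XML 1.0 forbids
    // (tab, line feed and carriage return are allowed and kept).
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "");
}
```

Running the same cleanup on data as it is imported, rather than after the fact, would keep these characters from reaching the database in the first place.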

 

By this time it was Friday. Kristof was back online, and the first thing he said to me when I woke up was...."good morning.... i must be a pain for you... last thing you hear, first thing you hear. experienced lots of timeouts today on production"

 

NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO! What a sinking feeling of absolute and utter despair! Twice now I thought I had licked this problem and we're STILL having freezing problems. My wife can't stand to be around me any more, I'm so angry at the world, at the internet, but above all, angry at Bill Gates, Microsoft, and his terrible IE browser!!

 

Kristof confirms that the freezing only happens on IE, and he says that he can reproduce it pretty well. He tells me to make a move, then click back, wait 3 seconds, click continue, wait 3 seconds, click back, wait 3 seconds, click continue..etc etc. Wow, talk about a mysterious process to reproduce a bug. Anyhow, after several tries, I gave up, it never froze for me.

 

I added even more logging to the server and had him reproduce it some more while I watched the log files, but still no clues as to what was going on. Super weird and frustrating. So what next....hmmmmmmmmmmmmmmmm.

 

Finally I decide we'd better try changing some apache settings to see what happens, and who's our #1 Apache ace? Igor of course. First we try turning off compression, no luck. Next we try changing keepalive from 3 seconds to 60 seconds (yes, I did say 3 seconds.)  Sure enough, that makes the problem pretty much go away, or at least, to reproduce it now, Kristof has to do his button clicking 60 seconds apart. If you want to learn more about what keepalive does and what it's used for, go to:

 

http://httpd.apache.org/docs/1.3/mod/core.html#keepalive

 

So essentially what was happening with the freezes was that if you send your ajax request in IE at the exact moment that the server kills your connection, the signals "get crossed" and the ajax request isn't really received by the server, because that connection is now closed. It only causes an issue if both ends send their signals at the same time, before either side has received the other's new signal/status. For me, it was impossible to reproduce because I'm very close geographically to the servers, but Kristof in Belgium is very far. Amazing huh??!!
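To make the "signals get crossed" idea a bit more concrete, here is a deliberately simplified model of the race, not a description of what TCP or IE actually do internally. Everything in it is an assumption for illustration: all times are in milliseconds on a single clock that starts when the server finishes its last response, network delay is the same in both directions, and the function names are made up.

```javascript
// Simplified model: the server closes the idle connection after
// `keepAliveMs`. The client only learns of the close one network hop later,
// and a request sent by the client takes one hop to arrive.
function requestIsLost(sendTimeMs, keepAliveMs, oneWayLatencyMs) {
  const closeNoticeReachesClient = keepAliveMs + oneWayLatencyMs;
  const requestReachesServer = sendTimeMs + oneWayLatencyMs;

  // Lost when the client still believes the connection is open while
  // sending, but the server has already closed it when the request lands.
  return sendTimeMs < closeNoticeReachesClient &&
    requestReachesServer >= keepAliveMs;
}

// In this model the risky window around the timeout is two hops wide, which
// is why a distant user hits it far more often than one near the servers.
function dangerWindowMs(oneWayLatencyMs) {
  return 2 * oneWayLatencyMs;
}
```

With a 3000 ms keepalive, a click at 2950 ms is lost for a user 80 ms away (`requestIsLost(2950, 3000, 80)` is true) but goes through for a user 5 ms away (`requestIsLost(2950, 3000, 5)` is false). Raising the timeout to 60 seconds does not remove the window, it just moves it to where clicks have to be 60 seconds apart to land in it.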

 

So sure enough, we turned off keep alive and the problems went away altogether! This isn't an issue for us because we use a separate asset server to serve all our images/js/css files, so we don't really need a keepalive.

 

Woooo hooooo!  Problem solved! No more chess mentor freezes!!!

*** A special thanks to all those involved in helping to fix this bug:

  • Kristof
  • ShadowC
  • Igor
  • All our Chess.com users who took time to report this bug in precise details

 

(In the process of writing this, however, 5 more bugs have been reported to me and I'm way behind. Gotta run!!!)

 

Comments


  • 5 years ago

    chessfanforlife

    For all the staff of Chess.com...thank you!....for this wonderful site.....now i play live chess here than Yahoo.....i encouraged many of my friends to join in on the fun...I think Chess Mentor is awesome!....
  • 5 years ago

    FM paolodm

    Wow! As a software developer myself, I can feel your frustrations! The site is pretty darn amazing, especially considering this is all pro-bono.

    P- 


  • 6 years ago

    Alejandro_Gutierrez

    thanks for clearing all of those bugs for us keep up the good work.
  • 6 years ago

    Unbeliever

    Good job Jay, that's a pretty hectic week.

     

     

    I am glad that I use firefox. 


  • 6 years ago

    valgaston

    Ok son of mine.  All I know is I am sure sorry to bother you with mundane computer problems especially during times like this.  Absolutely amazing that all of this is even possible.  I just have one question.  Why were you using so much cleanser? 
  • 6 years ago

    shadowc

    Yes, Jay, you're right... I was referring to point 2, which is the actual issue that I addressed... I thought it was a javascript issue, but it finally was XML related... Let me add (for the programmers) that I did realize this and started to search for XML anomalies because in IE something started to look weird in the JS side. When Firefox tells you that an elementColletion has n number of childNodes and IE was giving me 0 childNodes for the same XML response, I knew this had to be a XML parser error. That's something to keep in mind for the future, because it will happen more and more often with IE as Web 2.0 evolves...
  • 6 years ago

    A-Jenery

    I think you are all code magicians. 
  • 6 years ago

    Scott

    Wow, that is an amazing story.  And it's funny that through all of this Jay is tutoring me in PHP and Sql.  Maybe you should add a donation button and we can all throw a few more bucks at chess.com.  It's a small price to pay to have chess readily available at our fingertips anytime we want to play people around the world.  GREAT JOB CHESS.COM CREW!!
  • 6 years ago

    kurtgodden

    Very interesting post, thanks for the time you took to write it.  I was struck by two things.  First, I work in the auto industry and our dealers are plagued by similar extremely-difficult-to-reproduce problems which end up costing the company tons of warranty money and the customers tons of frustration.  There's even a technical term for it in our industry -- an intermittent (as a noun, not an ajective). And of course most intermittents are caused by the high usage of computer contollers on vehicles, so the next time you have a frustrating vehicle experience, just map your chess.com scenario into ours, and you'll have a deeper understanding.

    The second thing that struck me was....I'm just SOOOO glad I live in a Macintosh world.  I don't use Microsoft Internet Explorer (at least at home), so I don't have to deal with their bugs.


  • 6 years ago

    scrat

    I didn't even read all of that, but it sounds like you had a long week.  Now is a good time to ask for extra pay or an extra week of vacation.
  • 6 years ago

    bradyj

    AWSOME site, keep up the good work! I'll be ordering my annual subscribtion of CM after the holidays.
  • 6 years ago

    levvo

    Fabulous job! I feel like I owe you a one-year subscription to CM.... at least!

  • 6 years ago

    jodigardner

    wow Jay Jay.  I don't understand any of that computer jargon but I read the whole blog!  What a rollercoaster.  Thanks for helping me with my computer problem this week, amidst the chaos.  :-)  What a great brother.
  • 6 years ago

    jay

    On the contrary shadowc....Chess Mentor was freezing for 3 reasons:

     

    1. Window Unload problem with href="javascript:void(0)" (this was a javascript problem)
    2. Bad characters in the xml string
    3. Network related keepalive problem

  • 6 years ago

    shadowc

    Well... it wasn't a Javascript error after all...
  • 6 years ago

    SonofPearl

    That's a fascinating glimpse into the hard work that goes on behind the scenes.  Kudos to everyone involved for licking the problem.

     

    Some monkey, Jay!  Well done!


  • 6 years ago

    KingLeopold

    Great work! Keep on Blogging
  • 6 years ago

    farbror

    Very Interesting! Roughly 205% of the Technical stuff were beyond me but the article clearly shows the spirit and devotion among the Chess.Com-team.

     

    You're Doing a Great Job!

     

    You wrote: "...Chess.com's first pay-for-use product."

     

    Any "Ho-hum" on what the next product will be?

  • 6 years ago

    slowhand

    What an incredible story!  After trying CM demo I had to have it right then.  Yes, the 1 year subscription.  The first couple of times my screen froze the timing of it actually kept me from failing the lesson in progress so not a big deal.  Unfortunately it didn't take too long before the opposite started happening, i.e., freezing just as I was finally about to successfully complete a lesson.  Can't you just hear me ripping and roaring?  BUT ... it was so much fun and I felt then and still do that it was an incredible learning tool so I thought to myself, "yeah it's new and just a matter of them getting the kinks worked out of it."  Plus wasn't sure it wasn't something I was doing wrong (frequently the case).  Decided to try ff about midweek and my problems were over.  Then last night when I saw the message about the site going down I figured MY MAIN MAN had it figured out.  I said this before Jay, never any doubt!  I hope that one day soon you will be rewarded / compensated for your dedication, determination, patience, thoughtfulness, intelligence, resourcefulness, etc... again, I have no doubt you will.  THANK YOU!!!!!

    I just want to say one more thing to all that may read this.  As mentioned above, I truly believe this system will help one improve his (oops, or her) game.  My feeling is so strong because of the instant feedback provided by the pro's.  This feedback is provided after correct and incorrect attempts.  If one finds a particularly difficult solution to a lesson he is told so and then it is explained in detail why.  If during the attempts to solve the lesson one tries a "way off base" solution the feedback may be something as "Come on, you can do better than that" almost in disgust of the non thought out try.  The price we are paying for this is such a steal.  I've logged over 18hrs of lesson time thus far and the cheapest online chess coach I've run across charges $50 / hr.  And that is not all!

    Studies have proven that feedback is the key + studies have also proven that 1 hr of classroom time / lectures is roughly = 4 hrs in the books to learn the material at hand.  No, it's not lecture time but very close and for pennies.  Soooooooooooooo, support chess.com and sign up today!  Jay deserves a check!
  • 6 years ago

    okalex

    Interesting post.  I'm going through the same thing with an AJAX-based interface on an embedded web server right now.  Feel fortunate that you have the storage space to put logging code wherever you want.  Keep up the excellent work (but don't forget to kiss your wife once in a while)!