Programmer's Log

Sunday, August 20, 2006


6 AM: checking the system log one last time before leaving the office...

Tonight we resumed our night shift to monitor the system and stay on standby. The number of online users peaked at around 34,000 (10 PM to 1 AM, Malaysia time). The system was running stably.

My colleague discovered bugs in the Member Program, most of which I wrote. The bugs affected how the number of online users was counted: I had made a typo in my login page, so the correct ServerID wasn't passed into my online-user count functions. Instead, I had been passing a dummy variable left over from the testing phase.

Our messaging system contained bugs as well, but no one had noticed, and we had proudly passed user acceptance tests. My colleague took care of it, mainly with patches to the database.

The problems were fixed immediately, but I couldn't deploy the update to the production servers because there were many live games. We decided quickly to postpone the update until the next morning, when there would be the fewest live games.

The Arsenal vs. Aston Villa match was under way.

Database CPU rose to 90%, and our agent server CPU hit 100% and stayed there for a while. Operations experienced slowness and couldn't change prices for the match, while more than 34,000 users out there were affected...

Unfortunately, we could not do much. Profiler showed that live forecasting for the Arsenal vs. Aston Villa match was running excessively against the main DB, which is meant for writes only. (Forecasting a game means predicting the company's win/loss on that game.) Forecasting stored procedures are always killers for the database: each SP runs for at least 1 second on a 4x dual-core 3.6 GHz Intel server with 16 GB of RAM and RamSan. We need to work on this if we want to reach 50,000 concurrent members by the end of this year.

Ten minutes later, DB CPU dropped. My supervisor must have done something. We guessed he had killed the connection from the web server to the database; however, the number of online users never dropped, so that seemed unlikely.

For the next 2 hours, everything seemed normal again.

While monitoring our servers, I reviewed and tested the double handicap functionality for our External Admin again, and made the JavaScript behind it smarter: the auto refresh now stops automatically after 1 minute if the script detects that operations have already closed all the available odds.
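The actual External Admin script isn't shown here, so the following is only a minimal sketch of that stop condition. It assumes each refresh returns a list of odds carrying an `isOpen` flag, and reads "after 1 minute" as "once every odd has stayed closed for a full minute". `createRefreshController`, `startAutoRefresh` and `fetchOdds` are hypothetical names.

```javascript
// Hypothetical sketch; not the real External Admin code.
const STOP_AFTER_MS = 60 * 1000; // stop once all odds have been closed for 1 minute

// Returns a function that decides, on each refresh tick, whether to keep refreshing.
// `odds` is an assumed array of { id, isOpen } objects; `nowMs` is a timestamp in ms.
function createRefreshController(stopAfterMs = STOP_AFTER_MS) {
  let allClosedSince = null; // when we first saw every odd closed

  return function shouldContinue(odds, nowMs) {
    const anyOpen = odds.some((o) => o.isOpen);
    if (anyOpen) {
      allClosedSince = null; // operations reopened something: reset the timer
      return true;
    }
    if (allClosedSince === null) {
      allClosedSince = nowMs; // start counting the closed period
    }
    return nowMs - allClosedSince < stopAfterMs;
  };
}

// Assumed browser wiring: fetchOdds() is a hypothetical function that reloads
// the odds table and resolves to the current list of odds.
function startAutoRefresh(fetchOdds, intervalMs = 5000) {
  const shouldContinue = createRefreshController();
  const timer = setInterval(async () => {
    const odds = await fetchOdds();
    if (!shouldContinue(odds, Date.now())) {
      clearInterval(timer); // all odds closed for a minute: stop polling
    }
  }, intervalMs);
  return timer;
}
```

Keeping the decision in a pure function, separate from the timer, makes it easy to test without a browser. If operations reopen any odds, the one-minute countdown starts over.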

The launch of the .NET member site for our IBC system can be considered successful; not many major problems occurred today, as the English Premier League season kicked off...

Database performance remains a big concern. We will have to monitor it more closely in the coming days.
