Programmer's Log

Wednesday, August 30, 2006

Last weekend, our ASP Member sites were updated with a new version of the odds display. The new version was implemented similarly to what I did with the Forecast of the Agent system: the JavaScript behind it only fetches updated information from the database. The purpose is to reduce bandwidth usage. Bandwidth is becoming a headache for us as our business keeps growing; our usage gets closer to our capacity day by day.
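
The bandwidth saving comes from sending only what changed instead of the whole odds table. Here is a minimal sketch of that idea in Python (the real thing is JavaScript talking to the database; the function names and the "match:product" keys below are made up for illustration):

```python
def odds_delta(previous, current):
    """Return only the odds that changed or disappeared since the last snapshot."""
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    removed = [key for key in previous if key not in current]
    return {"changed": changed, "removed": removed}


def apply_delta(snapshot, delta):
    """Rebuild the full odds table on the client from the old snapshot plus a delta."""
    updated = dict(snapshot)
    updated.update(delta["changed"])
    for key in delta["removed"]:
        updated.pop(key, None)
    return updated


# Hypothetical snapshots, keyed by "match:product".
previous = {"m1:handicap": 1.95, "m2:1x2": 2.10, "m3:ou": 1.88}
current = {"m1:handicap": 1.95, "m2:1x2": 2.25, "m4:ou": 1.90}

delta = odds_delta(previous, current)
print(delta)
```

Only the entries in `delta` need to travel over the wire; unchanged odds such as `m1:handicap` are never resent.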

Since the odds display of the ASP version has changed, the odds display of our .NET member site will have to change as well. A new odds display engine must be implemented. This will be another challenge since I won't have any assistance from the database; the odds data will be analyzed on the web servers. Everything must be planned out and designed thoroughly. I'm taking care of this functionality. Hopefully new ideas will come to me tomorrow.

Today our big boss came to us and asked us to find a way to detect bot accounts, that is, accounts driven by programs. Those programs log into our website automatically, grab our prices, and then update them on their own sites so that they don't need to control the odds themselves. It is extremely hard to check for these programs. We have user login logs, so we could start there: check the login patterns of all the accounts and spot those with irregular patterns. Bots usually log in and log out all the time. However, smarter programmers will randomize their login patterns or mimic browser behaviour; tracing those accounts will be really difficult. We will give this more thought.
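
As a first rough idea of what checking login patterns could look like, here is a small Python sketch. Everything in it is an assumption for illustration: the log format (account, timestamp), the thresholds, and the heuristic itself (many logins at suspiciously regular intervals). As noted above, a bot that randomizes its timing would slip past this.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def suspicious_accounts(login_events, min_logins=20, max_cv=0.1):
    """Flag accounts that log in often and at almost constant intervals.

    login_events: iterable of (account_id, datetime) pairs.
    max_cv: maximum coefficient of variation (stdev / mean) of the gaps
            between logins for an account to count as machine-like.
    """
    by_account = defaultdict(list)
    for account, timestamp in login_events:
        by_account[account].append(timestamp)

    flagged = []
    for account, times in by_account.items():
        if len(times) < min_logins:
            continue  # too few logins to judge
        times.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        average_gap = mean(gaps)
        if average_gap == 0 or pstdev(gaps) / average_gap <= max_cv:
            flagged.append(account)
    return sorted(flagged)


# Made-up data: one account logs in every 60 seconds, the other irregularly.
start = datetime(2006, 8, 30, 20, 0, 0)
events = [("bot_like", start + timedelta(seconds=60 * i)) for i in range(30)]
events += [("human_like", start + timedelta(minutes=i * i)) for i in range(30)]

flagged = suspicious_accounts(events)
print(flagged)
```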

Sunday, August 27, 2006

Tonight we encountered the database problem again. The main database CPU went up high again. This time, my supervisor intervened in time, cutting the physical network connection and preventing the system from jamming. It all happened before the users even noticed. It shows the true importance of closely monitoring system performance at peak hours.

We narrowed down what might cause the system to jam. Our supervisor checked the user action logs from the database. At the time Windows Performance Monitor showed signs that a jam was about to occur, the operators were doing settlement (the process of calculating the win/loss of each ticket after a game is done). We will monitor this problem again on Sunday night.

Around 3 AM, another problem arose. This time, it was the network. The Agent website and a .NET member site could not be accessed. PingPlotter showed red at our end. It seemed the SingTel line was having a problem. It wasn't the ISP, though; it was an internal issue: there was a problem at our firewall.

In the ping plot we can see the route an information packet takes across the internet. It is like the tracert command in Windows XP, but we can see all the hops visually. By watching PingPlotter, we can tell where the network problem occurs: at the international gateway or in our DNS.

Friday, August 25, 2006

This weekend, La Liga starts its 2006 - 2007 season, joining the English Premier League and the Bundesliga. Three major leagues have started; Serie A will be the last one. A brand new soccer season is waiting for us. It is almost a year since we started with IBC - a sports betting system. We now own more than 40% of the system, which was developed by a Malaysian couple years back.

The system has grown quickly since last year, from one database server and around 20 web servers to 6 database servers and more than 50 web servers. The number of concurrent users has risen from approximately 10,000 to 37,000. The number of accounts has risen from 100,000 to 300,000. Within one year, the company's business has more than doubled.

We call our business a type of gaming. The gaming industry is getting bigger every day, with online casinos, online community games, lotteries and sports betting. We mainly operate gaming on sports events. You guys could call this kind of business gambling; it doesn't matter to us, we still consider it gaming with money.

That is just a brief introduction to our company and a little background on our software platform.


The other 60% of our system is still in ASP. We are going to convert it to .NET as well. Our architecture is ready. My colleague and I have designed it with lots of room for later improvement. We must plan everything out first, as our system is constantly changing with the growth of the business. In our design, the N-tier model is still in use, but we have planned a way to quickly change from N-tier to a service-oriented model. I will post a design eventually; it won't be detailed, just a rough idea of the design.

In the past 2 days, new functionality has been added to our system:

  • Mini betlist for member sites

  • Improvements to the Odds Controller of the External Admin

Next week, we will have a session on introducing Extreme Programming to our team. We are starting to follow the Agile development methodology. Our work has finally paid off; we have convinced management to adopt Agile development. My colleague and I will have to complete the process template as a proposal for them.

Another task that must go on the to-do list is backing up our source code database. I want to back up the whole database frequently, but many people suggested keeping just the release source rather than the whole database. To me, keeping the whole development database gives us all the metrics of team development. We could figure out team behaviours and the real ownership of source code. These things are really important for improving team development.

For the second weekend of the season, my colleague has applied new indices to our database. How good are they? We will have the answer on Saturday and Sunday night.

... No more weekends for me ... :'(

Thursday, August 24, 2006

Another day passed by. We are closing in on our schedule. All the software has been prepared and is waiting for approval to go into production; however, there is no signal from our trading department yet.

More testing will benefit us on the coming launch day and reduce the chance of bugs, which are always the nemesis of programmers and businesses.

We did lots of testing today, and we found more bugs in our position-taking functionality. Instead of giving the tasks to my programmers, I fixed them myself.

Our External Admin has been running for the second day now. No major problems were reported, except that the Admin 2 people kept complaining that the program was slow. At 1 AM this morning, I checked with them on live games. The program was slow, and half an hour later everything was normal. The problem must come from the network; maybe we experienced some lag on the internet connection into KBT.

Tomorrow, the first thing to check is the ping plot of the KBT network from 10 PM to 3 AM, to see whether there was any network disruption. Hopefully the problem will show up at the international gateway or the KBT gateway; otherwise we must ping each of the Admin 2 people to see where the lag occurs. We could recommend that they change ISP, or we would have to find a better route for them into our server in KBT. The problem belongs to the network engineer. Maybe I'm jumping ahead a bit.

My to-do list gets longer and longer every day. *sigh*

Wednesday, August 23, 2006

Yesterday I got an email from my supervisor about patching our Member Website. I didn't understand at all what he meant. After arriving at work, I asked him what was going on; then a series of mix parlay bets was shown to me and my colleague.

"Our System has holes"

That was the conclusion we arrived at. Some smart a**es figured out how we send information to our server to process mix parlay bets, and exploited it.

A mix parlay bet is a combination of choices over many matches. In each match, a punter is only allowed to choose one product (one of handicap, over/under, 1x2, and total goal); at the moment we only offer those products for mix parlay (Parlay betting explanation). For our company, a parlay must consist of 3 or more soccer matches.

In our ASP version, mix parlay information is passed in the query string of a URL, which is then sent back to our servers to process the bets (something like http://linkA?a=betinfoA_betinfoB_betinfoC).

In our ASP.NET version, mix parlay information is posted through a form, which is then sent back to the servers.

For the ASP version, it is pretty easy: just change the query string in the link. For the ASP.NET version, the punter could interfere with the posting action by creating a dummy post form. By exploiting this, the punter bypasses all the checking on our interface and creates sure-win parlay combinations. For example, they could choose the same choice 3 times for one parlay, and for the next one choose the opposite choice 3 times; regardless of the match results, the punter will win at least one combination, and that win is enough to cover all the expenses and still make a profit.
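
To see why this is a sure win, here is a quick back-of-the-envelope calculation in Python. The numbers are hypothetical: decimal odds of 1.90 on each side and a stake of 1 unit per parlay. It also assumes exactly one of the two opposite parlays wins outright (no draw, push or half-win), and that the forged bet counted the same selection as 3 separate legs.

```python
# Hypothetical figures, for illustration only.
ODDS_PER_LEG = 1.90   # decimal odds of the repeated selection
STAKE = 1.0           # stake placed on each of the two parlays
LEGS = 3              # the same selection counted three times


def parlay_payout(stake, odds_per_leg, legs):
    """A parlay pays the stake multiplied by the odds of every leg."""
    return stake * odds_per_leg ** legs


# Two forged parlays: one on a selection, one on its opposite.
total_staked = 2 * STAKE
# Whichever side wins, exactly one parlay pays out in full.
payout = parlay_payout(STAKE, ODDS_PER_LEG, LEGS)
profit = payout - total_staked

print(f"staked {total_staked:.2f}, returned {payout:.3f}, profit {profit:.3f}")
```

A normal single bet on one side at these odds would not even cover a stake placed on both sides; counting one result three times is what turns it into a guaranteed profit.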

... when it comes to money ... people are getting so smart at digging and finding holes ...

We immediately patched our systems for this vulnerability in both the ASP and ASP.NET versions. We implemented 2 more levels of duplicate-match checking, at the bet processing level and at the database level, and of course updated the production servers immediately.
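
The point of the patch is that the server must never trust what the interface already checked. A minimal sketch of such a server-side check in Python follows; it is not our production code, and the tuple layout, product names and function name are invented for the example. Only the rules themselves (one product per match, at least 3 matches, the four products listed above) come from how mix parlay works for us.

```python
ALLOWED_PRODUCTS = {"handicap", "over_under", "1x2", "total_goal"}
MIN_MATCHES = 3


def validate_parlay(selections):
    """Check a submitted parlay on the server; return a list of error messages.

    selections: list of (match_id, product, choice) tuples as received from
    the client. An empty list means the parlay passed these checks.
    """
    errors = []
    seen_matches = set()
    for match_id, product, _choice in selections:
        if product not in ALLOWED_PRODUCTS:
            errors.append(f"unknown product {product!r} for match {match_id}")
        if match_id in seen_matches:
            errors.append(f"match {match_id} appears more than once")
        seen_matches.add(match_id)
    if len(seen_matches) < MIN_MATCHES:
        errors.append(f"a parlay needs at least {MIN_MATCHES} different matches")
    return errors


# A legitimate parlay, and the forged one that repeats a single selection.
legit = [("m1", "handicap", "home"), ("m2", "1x2", "draw"), ("m3", "over_under", "over")]
forged = [("m1", "handicap", "home")] * 3

print(validate_parlay(legit))
print(validate_parlay(forged))
```

The same rule can be enforced a second time in the database, for example with a uniqueness constraint on (parlay, match), so a bug in one layer does not reopen the hole.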

Testing servers were also set up for our CSD to test the new position-taking functionality on our agent system. 2 web servers were deployed with the newest code from development, and a database server with the newest data from the production servers. I wonder how CSD is doing at the moment. Once the test is done, 1 hour of system downtime will follow so that we can run an update on the database and on the interface code.

External Admin was also updated; new functionality launched. I'm checking how it runs at the moment. It is better than the old version, and there have been no complaints so far. From the feedback of our operations, the external people like the new version. I'm crossing my fingers and waiting for things to land on our to-do lists; when customers like it, they will definitely throw a series of IMPROVEMENT requests at us ... "you can do that, now please do this for us" blah blah blah ... and I already have 3 new requests on my table for tomorrow.

On our schedule, I have to continue launching 2 more member websites out of the set of 10 sites that we are operating now. This will be determined tomorrow.

Monday, August 21, 2006

The first weekend of the new soccer season has passed ... All servers ran smoothly yesterday ... There were 2 big games yesterday: Chelsea -vs- Manchester City and Manchester United -vs- Fulham.

The Agent server reached 320 requests/sec yesterday, with 2,010 members at peak. On top of these members, External Admin and Internal Agent also ran on this server. A dual Xeon 3.2GHz has served us well.

Running 8 applications on a single server with the default IIS6 settings affects performance and response time. In Kaohsiung, we had 2 dedicated servers for the Agent System ... now back in Taipei, 1 single server handles everything, and on top of that External Admin and Internal Agent are now also deployed on that agent server. I guess my supervisor created a challenge for me to improve performance instead of just issuing new servers to me. And his decision turned out to be good.

After we moved back from Kaohsiung, I quickly noticed that the execution time of a request had increased dramatically, from a few hundred milliseconds to 3-4 seconds on average. I discovered that running 6 or more web applications on a server with only one IIS worker process caused the longer execution time. I had to increase the number of IIS worker processes, but in which way?

Our applications must write their own logs to the hard drive so that we can trace errors and customer activities. Increasing the number of worker processes increases the chance of I/O queues building up, since 2 or more threads will fight each other for the privilege of writing the logs. When this happens, requests into the system fail. This problem is unavoidable with 2 or more worker processes if we still want to keep application logging. Logging has been tremendously helpful for us. I don't want to remove it.

A compromise solution ... Yes, I have one. I found that using just 2 worker processes works, with each worker process handling specific applications. Why 2, and not 4 or 8? I tried ... After a series of trial and error, on our 2-CPU server, 2 worker processes work best; going up to 4 worker processes, the number of failed requests increased much more than the execution time of a request improved.

This method worked well for us this past weekend. No major problems or issues. The execution time of a request is kept below 800 milliseconds on average. The number of failed requests is below 4,000 on a busy night like this weekend. The performance statistics convince me that this method works well.

I could claim that our dual Xeon 3.2GHz server could support 2,500 - 3,000 users, but I still need to try it in practice.

I improved the External Admin Odds Controller. Lots of new functionality was put in.
- Hide Odds and Open Odds were implemented.
- Odds selections for the users to choose from were implemented.

I think I have been overusing Web 2.0 technology recently. I really need documentation on the security issues of this technology.

Time to go back to sleep now ...

Sunday, August 20, 2006

6AM, checking the system log for the last time before leaving the office ...

Tonight we resumed our night shift to check on the system and be on standby. The number of users was around 34,000 at peak time (10 PM - 1 AM, Malaysia time). The system was running stably.

My colleague discovered bugs in the Member Program, which I mostly wrote. The number of online users was miscounted. The problem was a typo I made in my login page: the correct ServerID wasn't passed into my online-user count functions. I had been passing the dummy variable left over from the testing phase.

Our messaging system contained bugs as well, but no one noticed and we proudly passed the user acceptance tests. My colleague took care of it, mainly with patches to the database.

The problems were fixed immediately, but I couldn't update the production servers ... There were many live games. The decision was made swiftly: the update would be postponed until the next morning, when there would be the fewest live games.

The match Arsenal -vs- Aston Villa was under way.

The database CPU rose to 90%, and our agent server CPU hit 100% and stayed there for a while. Operations experienced slowness and couldn't change prices for the match; out there, more than 34,000 users were affected ...

Unfortunately, we could not do much. According to Profiler, live forecasting of the Arsenal -vs- Aston Villa match ran excessively against the main DB, which is meant just for writing; forecasting a game means predicting the company's win/loss on that game. Forecasting stored procedures are always killers for the database; each SP runs for at least 1 second on a server with 4 dual-core 3.6GHz Intel CPUs, 16 GB of RAM and RAMSAN. We need to work on this if we want to reach 50,000 concurrent members by the end of this year.

10 more minutes, and the DB CPU dropped. My supervisor must have done something. We guessed he had killed the connection from the web servers to the database; however, the number of online users never dropped, so it was not likely that he had done that.

For the next 2 hours, everything seemed normal again.

While monitoring our servers, I reviewed and did a couple of tests on the double handicap functionality of our External Admin again, and made the JavaScript behind it run smarter: the auto refresh now stops automatically after 1 minute if the script detects that operations have already closed all the available odds.
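
The stop rule can be sketched like this. It is written in Python rather than the JavaScript actually used, and the class and parameter names are invented. It also assumes the 1 minute is counted from the moment all odds were first seen closed, and that odds being reopened in between resets the countdown.

```python
class AutoRefresh:
    """Decides on each refresh tick whether polling should continue."""

    def __init__(self, grace_seconds=60.0):
        self.grace_seconds = grace_seconds
        self.closed_since = None   # time at which all odds were first seen closed
        self.active = True

    def tick(self, now, open_odds_count):
        """Call once per refresh with the current time (in seconds) and the
        number of odds still open. Returns True while refreshing should go on."""
        if not self.active:
            return False
        if open_odds_count > 0:
            self.closed_since = None      # something is open again: reset
            return True
        if self.closed_since is None:
            self.closed_since = now       # first tick with everything closed
        if now - self.closed_since >= self.grace_seconds:
            self.active = False           # closed for a full minute: stop
        return self.active


refresher = AutoRefresh()
history = [
    refresher.tick(0, 5),    # odds still open
    refresher.tick(10, 0),   # everything closed, countdown starts
    refresher.tick(40, 0),   # 30 seconds closed, keep going
    refresher.tick(70, 0),   # 60 seconds closed, stop
    refresher.tick(80, 3),   # already stopped, stays stopped
]
print(history)
```

Stopping the refresh when nothing can change any more saves both requests to the server and bandwidth, which matters on nights with many live games.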

The launch of the .NET member site for our IBC system is considered a success; not many major problems occurred today - the English Premier League has started ...

Database performance remains a big concern. We will have to monitor it more closely in the coming days.

Today, I started reading One Jump Ahead again - a great book about the life of a real programmer, who was my respected professor at U ... The idea of blogging my daily technical activities popped into my head; it will definitely be good for me to review this after a while ...