High Flying and Query Adjustments
Posted by Justin Scott at 11:19 AM
0 comments - Categories:
The weather in Florida has been less than perfect for the last several weeks. Between the clouds and rain and the astronomical expenses I incur while flying, I have not taken to the skies for a month. Yesterday's weather was just perfect though, and I couldn't resist dusting off my gear for a quick flight. I departed Sarasota just after 7:30pm and had a relatively uneventful hour at 3,500 feet before landing at Kissimmee Gateway in Orlando. There were several other aircraft coming in and out of Kissimmee, but that's not ususual. After landing I taxied back to the runway, took off, and came back to Sarasota. Overall it was a textbook flight, frankly, and not much to write about.
The reason I bring this up, though, is that web development is much the same way. We write applications, put them on the server, and most of the time everything just works. Like flying, there are times when something isn't right and you have to call your special skills and training into play. During flight training, the instructors make a big deal about the emergency procedures. What do you do if the engine stops working? What if a cloud layer forms up around you? What if the engine catches on fire? These are all things that do not happen often, and I may never experience, but we train for them anyway because the possibility is there.
Unfortunately, many web developers "train" as far as getting things working, but no further. Essentially, they learn how to take off and land with their applications, but would be caught unaware if the engine failed. There are many things that can go wrong with a web application as well, and developers should know what to do and how to handle them before the event occurs. What if the web server disappears (could be hardware failure, network issue, whatever, but your server is just gone)? What if the web site gets popular and you get tons of traffic all at once that your server can't handle? What do you do if the site is hit with a successful SQL injection attack? Oh, the possibilities!
One of the things I do is deploy an early-warning system. Using FusionReactor, an application that gets under the skin of ColdFusion to peek inside, we get an e-mail alert for any long-running request, and can use it to monitor which pages need attention. Lately one of the sites we manage has been getting slower as the traffic increases. This site is very heavy on database access. Each page has more than its share of queries, and some of them are very resource intensive. The site is also very large, pushing hundreds of thousands of dynamic pages.
Last year we had the same problem, and we made one change that has held us over until now. At that time, each page performed a radial zip code search to return other zip codes within a certain radius of a given zip code. This is a fairly resource intensive process on the SQL Server, and the results hardly ever change. My caching tooth was sore, and something had to be done. To combat this, a table was created to store the results of the query. It has two columns, one of the zip code you're checking, and another for the list of results. Just before we run the radial search query, we check the new caching table to see if we have a list already. If not, then we run the big query as usual, but then we take the results and stuff them into the caching query. At first, this actually makes the pages slightly slower because of the extra insert. However, over a few days all of the common zip codes had been requested and they were all cached. A one-time hit for much faster processing later on. This is similar to how ColdFusion compiles the CFM pages into bytecode the first time they are run. A small performance hit on the first run for faster results going forward.
So it has been nice over the last year or so, but the performance issues are creeping back in as the site gets larger and brings in more traffic. Last night I spent some time with FusionReactor watching the request history and picked out some of the ones that I felt could do better. Not all of these were sounding the alarm on the monitoring, but I wanted a good sample to work with. For the pages I was looking at, the database queries were once again the cause of the delay. At first glance you might think we need a better database engine. Unfortunately, the database server itself is not the cause, but rather the way some of the queries were written. They worked great when we didn't have tons of traffic, but now they're holding up the show and have to be updated.
In particular, I identified two place where we had used a subquery to filter the results of another query. Unfortunately, the SQL server was treating it like a join, so internally it had to take all the rows (over 100,000 records in one, and 30,000 in the other) and combine them all up together, then apply the various filters. Due to the nature of the data and the size of the tables, this process was taking between 250ms and 300ms to execute. This may not seem like a lot, but when the server is busy it really shows. On some queries, if there were a lot of matches in the subquery, it would take upwards of 1300ms to run. Unacceptable!
To combat this, I just broke the subquery out into its own query, then passed the results through to the original query using CFQUERYPARAM with the list="yes" attribute. The subquery, run by itself, is very fast, and the main query, run by itself, is also pretty quick. Running them together with an implied join in the database made them crawl. Now that they're both in their own, the combined execution time is averaging less than 20ms, a vast improvement. As a result the server is humming along against, pages are snappy, and it's able to handle more requests per second than it was before. I can only hope it will be another year before we have to find something else to optimize that we don't see an an issue today.
So, to bring this all together, web programming is a lot like flying. Plan ahead, monitor your guages, and train for emergencies before they happen so you can respond with confidence. Address problems early, on the ground, before they become matters of life and death.