Category: ColdFusion

Nov 13 2008

The Gold-Rush Period for E-Commerce is Over

When I first started in professional web development back in 1999 (that's equivalent to the Mesozoic Period in Internet time), it was really easy to put together an e-Commerce application and start taking orders and raking in cash over the Internet. People were throwing up stores for everything from books to pet food. Some met with wild success, and others went down in flames. When I say "easy" I'm not just talking about business ideas, but also about the technology and regulatory compliance.

You see, back in the early 2000's, you could create a simple shopping cart, make up a checkout process, then take the customer’s credit card information and pass it along to CyberCash or Authorize.net (or any one of a dozen other processors) and they would return either an authorization code that told you money was going to show up, or an error which you could pass back to the user so they could correct their error and try again.

Unfortunately, many programmers were very, very sloppy. In fact, some were just negligent in the way they handled the customer information. For example, storing all of the order information with full billing details in a text file that anyone could request right through the web site (it might have been easy for the owner to get to, but easy for everyone else to get to also). Things like this resulted in the wholesale theft of credit card and bank account details, which in turn led to millions upon millions of dollars of fraud.

Over the past few years, the "Payment Card Industry" has taken matters into their own hands and forced everyone, through their various agreements and contracts, to fix the problem and secure their systems. Now, don’t get me wrong, this is great for consumers, and businesses should be happy that these changes are being implemented, but it is really painful as a whole. At this point, nearly everyone is required to be in compliance with the Payment Card Industry Data Security Standard (PCI-DSS or simply PCI Compliant for short).

The PCI-DSS contains 12 main categories to look at within any business that processes, stores, or transmits information that is covered by the standards. This means that the little web store that just takes the credit card number and passes it to, well, whoever, to process the transaction now has to be fully compliant with every letter of the full requirements, even though they may not actually store any of the information at all and just pass it along.

Many of the things covered by the requirements are simple, such as using encryption during transmission (through SSL between the customer to the web server), encrypting the data once it’s received (if you’re storing it in a database), not storing the card security code, and restricting access to the card data to those who really need to have access to it. Many web developers were already doing these things and took a "reasonable" approach to securing customer data. Unfortunately, even when “reasonable” measures are taken, bad things can happen and the information can end up in the wrong hands. Or worse, someone may get access and you don’t even know it and they steal information over a longer period of time without being detected.

The PCI-DSS covers all of these things and basically forces a business to take data security very seriously. First, there are requirements for how the data is brought into the system, what you can do with it and what you can't do with it. Second, who has access is covered. There are requirements for password complexity, password rotation, account auditing, etc. They cover network and infrastructure security to ensure nobody can just walk up to the server and plug a portable drive into the system and walk away with a copy. The standards talk about logging and auditing so that if someone does break in there is forensic evidence that can be looked at to determine how they got in, and most importantly, what was taken so that customers can be notified. The list goes on and on and gets into a lot of highly technical detail.

The way I see it, the days of "Mom's Apparel" throwing up a store and taking credit cards directly are simply over. Any business that wants to accept credit cards directly will have some serious thinking to do. The cost-of-entry just went up, a lot. Unless they're planning to do more than a certain volume of online sales every month, it’s just not worth the overhead to ensure compliance with the PCI-DSS.

Not all hope is lost, however. Notice that I said "accept credit cards directly." A new business can still take payments online, as long as they aren’t handling "sensitive" data directly. Providers like PayPal have entire departments dedicated to regulatory compliance, and they've been doing it for nearly a decade now. They know how to manage and secure their systems properly, and the fees they charge are about the same as you would pay to a traditional credit card processor. They become the ones who have to be compliant since you are no longer handling the information directly. They just let you know that a payment has been made but you don't have to worry about the actual credit card data.

Unfortunately, companies like PayPal have earned a somewhat negative reputation for some reason. Perhaps the merchant will think their customers won’t take them seriously or believe they aren't a "serious business" if they only use PayPal to process their transactions. Whatever the reason, many businesses are reluctant to wash their hands of regulatory compliance and just keep doing what they’ve been doing in the past. Sure, they can say they're compliant, but if there is ever a breach and data is stolen, the liability will come down upon them like a ton of bricks. Frankly, it has the potential to destroy the business entirely.

All in all, the easy times are over for online sales. Any small operation with dreams of selling online had better get used to the idea of using an outsourced payment service if they want to minimize their liability. No, it's not glamorous, but it's safe, and that is what customers need right now. They need to know that when they purchase from your online store, their information is secure. The best way for a small company to do that is to outsource that piece to the experts who know how to do it properly. It's better from a liability standpoint, and from a trust standpoint knowing, not hoping, the payment data is safe.

0 comments - Posted by Justin Scott at 3:06 PM - Categories: ColdFusion | Business Development

Sep 1 2008

Preventing SQL Injection Attacks

Over the last several weeks, ColdFusion web sites have been under attack all over the Internet.  In what is apparently the first large-scale automated SQL injection attack, what is assumed to be a botnet is attempting to use vulnerable sites to inject JavaScript code into the back-end database.  In this case, the attack is directed toward Microsoft SQL Server.  Once the JavaScript is injected, the hope is that the code from the database will be displayed somewhere on the web site, causing the visitor to download and execute the JavaScript code they had posted elsewhere on the Internet.

While this is all very unfortunate, it is entirely preventable.  If the web developer used proper secure coding techniques and considered the security implications of what they were doing, then of the many sites that were affected wouldn't be in the mess they're in now.  Fortunately this particular attack is easy to reverse within the database, and simple to prevent in the code.  Specifically, it exploits areas where the developer is assuming that the input from a URL variable will be safe to use and pass directly to the database query.  ColdFusion provides a tag called CFQUERYPARAM that tells the database to create and execute a "prepared statement" which negates the possibility of success in nearly all SQL injection attacks.  Unfortunately, many ColdFusion developers either don't know about this, or don't understand why it should be used.

On the CF-Talk mailing list there was a lot of discussion about this topic as the attacks began to infect more and more sites.  We all know that code can live on a site for years after it is written, and we can inherit code from other developers who don't know any better.  For them, it may take weeks, or even months, to go through their code and make the necessary updates.  In the mean time, their sites are sitting out there vulnerable to attack.  If you are in that position there are other preventative measures you can take:

  1. Install a filter on your web server that performs URL rewriting.  This can look for specific keywords often used in these attacks (DECLARE and EXEC specifically for the current automated attack) and then rewrite or redirect and URL containing those keywords.  This prevents the attack from making it to your ColdFusion code at all.  Helicon makes an ISAPI filter for IIS called ISAPI_Rewrite that can do this on Windows servers.

  2. Create a rule in your Intrusion Detection System that looks for and drops requests that contain the forbidden SQL keywords.  There are several custom Snort rules out there that will do this for you.

  3. If you're on a shared server without access to these resources, you can implement code filters as well.  I wrote one back in late 2005 for a site I took over from another developer just after it launched.  Unfortunately the original developer had pretty much no concept of security, and I wrote several filters for it to hold us over until we had time to go back and clean up the code.  One of these was dubbed SQLPrev and has since been released into the public domain.  It can easily be included into any ColdFusion application to provide high-level protection from most SQL injection attacks.  The "official" ColdFusion MX code, as of this writing, can be found at http://www.gravityfree.com/_sqlprev.cfm.txt, and the ColdFusion 5 code can be found at http://www.gravityfree.com/_sqlprev5.cfm.txt.  I also maintain copies (MX, 5) here on my web site if the links above ever stop working.

I can't stress enough that these methods are not "full scale security" but rather a temporary perimeter filters to be used until your actual code is secure.  Someone once likened them to sentinels keeping watch at the city gates.  Just because the sentinels are there does not mean you shouldn't also lock the gates.

0 comments - Posted by Justin Scott at 11:24 PM - Categories: ColdFusion

Aug 31 2008

Build a Better MLS IDX

Due to recent changes made by the local real estate boards, I have had the dubious pleasure of re-writing the MLS listing data import and processing routines that we use to feed data to the real estate web sites that are managed by my employer.  First, I'll say that I believe it is a miracle that the MLS system is functional at all.  Second, if you are ever asked to integrate MLS listing data into a real estate web site, double your budget estimate, and then double it again.  You'll thank me in the end.

Unfortunately, this is the third time I have had to integrate essentially the same listing data for our clients.  The local real estate board has changed vendors as many times in as many years.  Each MLS vendor has their own preferred data formats and distribution methods.

The first vendor provided FTP access to a secure server where we downloaded pipe-delimited text files.  Unfortunately their documentation consisted of the column headers within the data file.  We were left to pour over nearly 250 columns with cryptic names that in some cases defied logic.  We managed to figure out which ones we needed for the local database and weed out the rest upon import.  For the images, they provided a Windows application that would read their data file, then go fetch the images for the listings in that file.  We wanted an automated process, so we ended up using an HTTP proxy to watch the traffic it generated and reverse-engineered the file naming schema on their image server.  The only good thing about this particular MLS was that we could predict the image names and generate the photo URLs during the import for use later in display.  This prevented us from having to keep a local copy of all the images.  It was clunky, but it worked great for about a year.

Round two came just before Christmas in 2006.  The new MLS vendor was hooked up with MLXchange to provide their IDX service.  This meant a completely different system with a new data format to work with.  I should also mention that they gave us almost no notice before the switch, just a couple of days.  I will say that MLXchange was a step up from the previous provider in some areas, but still far from ideal.

To start with, their export interface only provides options for full or incremental, and data only, images only, or both.  The problem with this is that with an incremental data feed, you never know which listings are deleted, and trying to keep the core listing data clean and accurate is essentially impossible.  The only way to ensure clean data is to clear out the local database and re-import all listings each day.  Unfortunately, pulling the full set of images just isn't practical given their vast numbers.  60,000 listings times an average of five photos, multiplied by an average photo size of 20KB works out to about 6GB of images.  Frankly, we don't have time to process that every day.

To make it work, we employed two IDX accounts.  One was configured to download the full listing data, and the other set to bring down an incremental image set.  We did have to bring down an initial full image set to get things started, and it took a couple of days to get everything in place.  As previously mentioned, with an incremental feed there is no way to know when something is removed, so over time the image store just keeps growing, and when images are deleted from MLS we have no way of knowing and they continue to show up on the web site.  Unfortunately the online versions of the images at MLXchange are distributed throughout a seemingly random folder structure that make predicting image location there impossible, making this daily dance with the images necessary.

Another annoyance with MLXchange is their delivery method.  Previously we could pull the data file and images directly over FTP and HTTP, respectively.  With MLXchange, the delivery is all FTP, though their FTP server offers no external security.  Anyone can connect and download whatever they want.  To protect the data, they bundle the listings file and images into password protected ZIP files.  Unfortunately, ColdFusion did not have native ZIP file processing at the time, so we had to purchase a third-party utility that could handle protected ZIP files to get at the data once it had downloaded.  To make matters worse, occasionally one of their ZIP files would be corrupted and therefore unable to be opened.  This wasn't a problem with the listing data as we could just wait until the next day, but for incremental images we got one chance to get them.  Once they were exported we would not see them again until an agent uploaded a replacement photo.  Their system also had a nasty habit of running the daily incremental export twice in a row.  The first run would bring out the 1,000 or so new images for the day, and then it would run again a few minutes later, so we would end up with a ZIP file that included maybe three images, and the other 1,000 new and updated photos would be lost.  That's not easy to explain to a broker when their agent is wondering why the latest and greatest photos are not displayed on the web site.

The actual listing data in MLXchange was more organized, and includes schema files with the data.  We still have to figure out what some of the fields are, but it was relatively simple to create a parser for the schema file that would in turn generate a SQL script that creates the necessary tables in our SQL Server.  We then created a parser to bring in the listing data.  An image indexing script would crawl through the image folders and keep the database updated with the locations of the images for each listing.  Due to the way the images are named, displaying images on a listing also requires additional effort.  When looking at the database for the images we just tell it to sort them by the filename.  Unfortunately, this puts the "front exterior" photo, the main image for the listing, second or third in line.  To get around this we have to do some coding magic to re-sort the images using a custom algorithm to get them in the right positions.  This is way more work than any person should have to do just to get the images in the right order.

The good news is that once the data has been downloaded, parsed, and processed into the local SQL Server, we can do all sorts of nice things with it without a lot of additional effort.  We can create search interfaces and display the listings to the public.  With the images being local on the server we don't have to rely on the MLS vendor's servers being online to see photos (the first vendor had a couple of outages that prompted calls from our clients wondering where their images went).

Despite a couple of minor issues here and there, this was all very happy until July 2008.  The local real estate board once again changed MLS vendors.  This time they gave us a month of lead time about the change.  Unfortunately, they are just as busy as we are and we did not start getting details about the change in data for a couple of weeks.

The new data was to be provided through a RETS server, and they suggested an application that would handle all of the downloading of the data and images.  At first glance, it looked very promising.  The Windows GUI application would allow us to define searches and then download the listings and images to the server on a schedule.  Great!  It didn't take long before we started having problems.  First, RETS operates on metadata and can download the data into an XML file, or any one of several formats you request.  It takes a while to download and process the metadata for every action in the program which makes things take far longer than they should.

Second, you can select which fields you want in the export.  Our first task was to assemble a list of all of the fields we needed, and to create a search and download for each property type.  The first issue arose when the system repeatedly gave us "field access denied [xxxx]" errors with a numerical code.  To find out which field was denied, we would have to use the metadata browser to crawl through the records looking for that ID code.  For each instance, we would have to contact IT support at the local real estate board and ask them to open up access for those fields.  Many phone calls later, we had that part working.

Next, we opened up the searches to download everything, and another problem reared its ugly head.  The RETS server had a limit of 5,000 results per search.  This affected the listings and the images.  With tens of thousands of listings, we would have had to create dozens of intricate searches to cover everything, and even then there was no guarantee that a particular search would not ever have more than 5,000 results.  Each search would have to be scheduled to avoid overlap, and we would then have to process dozens of different files that would contain the data.  There had to be a better way.

After more calls to IT support, they finally told us that the new MLS vendor also works with our old friends MLXchange to get access to IDX data.  I almost cried.  Here we had just spend nearly 20 hours in development time trying to integrate a new system that we know little to nothing about when the vendor we're already connecting with and have code written for is an available option.  Later that day I was configuring my full daily data feed just as we had used before.

The following day I nearly cried again when I realized that the data format was the same, but the fields and schema were not.  This meant another run through the schema parser to generate the database tables (and the hours of tweaking it to normalize the things in this schema that the other one didn't have), more hours to update the actual data parser and import routine, and yet more time to schedule the full image dump and process nearly 200,000 images into the system.  All told, the latest update took nearly a month to implement.  Most of that time was spent waiting for support from outside vendors.  We rely on the local IT support at the real estate board for nearly any change that is needed, then another day for the change to be reflected in the data files.  To start with, basic information such as the listing broker office name, which is required to be displayed with the listing information, was not included.  To make matters worse, there are five different data tables, and each has their own set of information that has to be reviewed and updated.  Where a field is present on one it was missing on another, and so on.  It took nearly a week to get the data files to include everything we needed.  There are actually more changes we will be asking for now that it's up and running, and we will continue to have the problems with the images as we have had in the past (we don't know when images are deleted from a listing, and some can go missing if their export ZIP file is corrupted).

So, my point is that the real estate industry can do better, and must do better, to remedy these issues.  I would call on the National Association of Realtors to implement the following:

  1. Develop and impose a standard data schema for IDX.  Gather up all of the possible fields that cover the vast majority of listings and develop a standard list of fields.  They should use a standard naming convention that is compatible with all major database systems.  Each field should be well documented as to what data it can contain and how that data can be used subject to additional limitations by local boards (remember, we’re developers, not brokers, so we need some guidance on what your language means).  There should only be one field for each piece of data.  There is no reason we should have 12 different “public remarks” fields that have to be reassembled in the proper order.  A reasonable number of "extra" fields should be included for localized use, and some "reserved" fields should be included for future growth without having to change anything.  The RETS protocol does a good job of standardizing the format (XML), but uses metadata to define the fields and their contents.  Frankly, from a development standpoint, all this does is create more work for already overworked developers to sort through the mess to get at the data.  Once the schema is defined, it should be standardized over all of the IDX vendors so that they essentially become interchangeable from a programming standpoint.  They will have to find other ways to provide value.

  2. Insist that IDX vendors create a central image repository accessible over HTTP.  The images for real estate listings are vast in number and change often.  Why should each broker have to download tens or hundreds of thousands of image files to their web servers for local hosting?  Of course I believe that they should be available for download if the broker chooses as they are now, but for those that only need them for their web site, there is no reason the IDX vendor cannot put them online and publish details about how to predict and format the image URLs for each listing.  This would solve the "deleted image" problem that incremental updates create.  The biggest argument against this that I can see would be that it would use significant bandwidth to serve the images.  I would argue that it should actually result in less bandwidth overall.  Presently, many brokers are downloading full image sets on a weekly basis, or incremental image updates every day.  These include thousands of images that may or may not ever be requested by a web site user.  The broker has to download ALL of them, just in case they are needed.  If they were hosted locally, only images for listings people actually want to see are served up.  Through proper caching and HTTP compression, the overall bandwidth can be reduced or stay the same since brokers would no longer need such large exports all the time.

  3. Standardize the delivery format and method.  Personally, I prefer pipe-delimited text files available directly over FTP with a username and password on the FTP account itself.  What good is a password protected ZIP file when anyone can download it and then run a cracking utility against it offline, where nobody can monitor the attempts.  Eventually an attacker would find the password and then download whatever they wanted from a folder without detection.  Trying to attack a password on the FTP server itself is far more difficult and can be monitored, detected, and blocked.

  4. Require listing addresses to be normalized upon input.  Whenever a listing is entered into the MLS it should be checked against the postal service database (or some other source) and normalized according to a standard.  Zip codes would be required to be valid and match a known city name before being accepted, etc.  All of the rules for IDX say that the data cannot be modified before it is displayed, but when the IDX feed includes the city names as entered by the agents, it borders on negligence not to correct and normalize them before display.  With the current IDX data I am working with there are no less than eight different ways that "St. Petersburg" is indicated as a city.  You have "St. Pete", "Saint Pete", "St Petersburg", and on and on.  For every misspelling and variation on every city we have to pick which one is "right" and then create an exception to normalize every other option so the search functions will work properly.  If this were done at the source, where the data is entered, it would normalize the data everywhere at once and ensure more accurate listing information.

Given these changes, real estate listing data could be more easily distributed to brokers for use on their custom web sites, and it would save countless development hours and frustration in dealing with dozens of schemas, formats, and delivery methods.  Developers could write software to handle the nuts and bolts of downloading and processing the data and make it easy to implement no matter which MLS or IDX vendor is being used.

Given the data hoarding history of real estate agents and brokers, and the glacial pace at which the industry seems to move, I do not expect that these suggestions will be implemented any time soon.  In the mean time, I will continue to monitor the daily feeds, explain to brokers why some of their images are missing, and keep my fingers crossed that there will not be another MLS vendor change any time in the near future.

For those that are using ColdFusion to import IDX data from MLXchange, please feel free to ask any questions you might have in the comments.

2 comments - Posted by Justin Scott at 3:59 AM - Categories: ColdFusion