96,000 Reasons to Mine Your Data

I just read a story in the paper about the local high school. They received a $96,000 correction to their natural gas bill after 8 years of service, “due to billing error at Harpeth High School.”

Apparently, most gas customers in our area have meters that read in 100’s of cubic feet. Therefore the gas company bills per 100 cubic feet. But with a large building, like a school, they install bigger meters which read in 1000’s of cubic feet. On the billing side, they then multiply the meter reading by 10 to calculate usage. I’m sure you can see trouble coming.

According to Greater Dickson Gas Authority, they forgot to adjust the billing for the high school when it was added as a customer. They also suggest that if the high school had not been a new customer, the computer would have noticed the discrepancy and it could have been corrected. Darn that computer! Darn that new customer!
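To make the arithmetic concrete, here is a minimal sketch of the multiplier step in Python. The meter type labels and the function name are my own inventions for illustration, not anything from the gas company's actual system; the only fact taken from the story is that a large meter's reading gets multiplied by 10.

```python
# Hypothetical meter types: a "standard" meter reads in 100s of cubic feet,
# a "large" meter reads in 1000s, so its reading must be multiplied by 10
# to get billable units of 100 cubic feet.
METER_MULTIPLIER = {"standard": 1, "large": 10}

def billable_units(reading, meter_type):
    """Convert a raw meter reading into units of 100 cubic feet."""
    return reading * METER_MULTIPLIER[meter_type]

# The same reading, set up correctly and set up with the adjustment forgotten.
print(billable_units(35, "large"))     # 350
print(billable_units(35, "standard"))  # 35
```

Flag a big meter as a standard one and every bill comes out at a tenth of what it should be, month after month.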

Having the computer scan for billing anomalies is a simple example of data mining. The typical function of their database is probably inputting gas usage and producing bills. But they must have a specialized report that says “find all customers with a monthly bill that is more than 50% different from their average bill.” The database wasn’t specifically designed to look for atypical invoices, but if you mine the data, the information is there.
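That kind of report might look something like the sketch below. I am assuming each customer's bills are available as a simple list in date order; the customer names, the dollar amounts, and the function name are all made up for the example.

```python
from statistics import mean

def flag_anomalous_bills(history, threshold=0.5):
    """Return customers whose latest bill differs from the average of
    their earlier bills by more than `threshold` (0.5 means 50%)."""
    flagged = []
    for customer, bills in history.items():
        if len(bills) < 2:
            continue  # a new customer has no earlier bills to compare against
        *previous, latest = bills
        baseline = mean(previous)
        if baseline == 0:
            continue  # avoid dividing by zero for accounts with no prior charges
        if abs(latest - baseline) / baseline > threshold:
            flagged.append(customer)
    return flagged

# Made-up billing history, oldest bill first.
history = {
    "house_a": [80.0, 90.0, 85.0, 88.0],
    "house_b": [100.0, 110.0, 105.0, 30.0],
    "new_school": [400.0],
}
print(flag_anomalous_bills(history))  # ['house_b']
```

Notice that the brand new customer sails right through, because there is no average to compare against. That matches the gas company's explanation for why the computer never complained.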

So now we know they have the capability to mine their database, yet they still failed to notice $96,000 being pumped out the door. Suppose they have in their database the square footage of the customer's building. Even a simple classification, like “2 story home” or “50 unit apartment building,” would be useful. Now they could run a report called “find the top 100 consuming buildings under 5000 sqft.” Or in the case of the high school, “find the customer with the lowest ratio of usage per sqft.” With its usage understated by a factor of 10, the high school probably appeared to use less gas than I do in my house. Surely that would stick out in a report like this:

Square Feet    Usage (100 Cubic Feet)    Ratio
2,000          50                        0.025
1,500          30                        0.02
100,000        35                        0.00035
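Here is a rough sketch of how that report could be produced, using the three rows above. The customer labels and field names are hypothetical; a real billing database would have its own schema.

```python
# The three example rows from the report; the names are made-up labels.
customers = [
    {"name": "house_1", "sqft": 2000, "usage_ccf": 50},
    {"name": "house_2", "sqft": 1500, "usage_ccf": 30},
    {"name": "high_school", "sqft": 100_000, "usage_ccf": 35},
]

def usage_per_sqft(customer):
    """Usage (in 100s of cubic feet) per square foot of building."""
    return customer["usage_ccf"] / customer["sqft"]

def lowest_ratio(customers, n=1):
    """Return the n customers with the lowest usage per square foot."""
    return sorted(customers, key=usage_per_sqft)[:n]

for customer in lowest_ratio(customers):
    print(customer["name"])  # high_school
```

Sorting by the ratio puts the high school at the top of the list, which is exactly where a billing clerk would want to see it.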

And that’s not all! Take a similar report, but start looking at similar dwellings: “find homes between 2000 and 2500 sqft that use less than 50% of the average usage for homes between 2000 and 2500 sqft.” That could pick out the segment of customers that have a gas-fired hot water heater, but an electric heat pump. The gas company could send them all a flyer offering 0% financing on a new gas furnace, or free installation for a ventless gas fireplace.
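A report like that could be sketched as follows. Again, the home IDs, usage numbers, and function name are invented for illustration, and the 50% cutoff is just the one from my example query.

```python
from statistics import mean

def low_users_in_band(homes, min_sqft=2000, max_sqft=2500, fraction=0.5):
    """Return IDs of homes in the size band whose usage is below
    `fraction` of the average usage for that same band."""
    band = [h for h in homes if min_sqft <= h["sqft"] <= max_sqft]
    if not band:
        return []
    average = mean(h["usage_ccf"] for h in band)
    return [h["id"] for h in band if h["usage_ccf"] < fraction * average]

# Made-up homes; usage is in 100s of cubic feet.
homes = [
    {"id": "A", "sqft": 2100, "usage_ccf": 60},
    {"id": "B", "sqft": 2300, "usage_ccf": 70},
    {"id": "C", "sqft": 2400, "usage_ccf": 20},
    {"id": "D", "sqft": 2200, "usage_ccf": 66},
    {"id": "E", "sqft": 3000, "usage_ccf": 10},  # outside the size band
]
print(low_users_in_band(homes))  # ['C']
```

Home C uses well under half the average of its peers, so it lands on the mailing list. Home E uses even less, but it is compared against a different size band, which is the whole point of grouping similar dwellings.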

Any company that has a database also has many opportunities for mining that data. In this case, it meant losing revenue. In other cases, it could mean more business. Either way, you need to utilize what you have in as many ways as you can dream up.