Tuesday, October 20, 2009

Data Accuracy isn't always important

Now, while Amex really should have understood the term "minimum", there are cases where it really isn't an issue if someone gets it wrong when displaying information to you. Sometimes an apparent error indicates that a prediction was incorrect, or that "approximately" is good enough for the scenario.
The weather panel on the Sydney Morning Herald site is a good example of this: the current temperature is listed as one degree higher than the maximum for the day. Does this matter? Well, no, and for two reasons. Firstly, a weather forecast is accepted as being approximate information; weather is a chaotic system and so by definition can't be predicted exactly. Secondly, the max number is only a prediction, and the current temperature is telling us that the prediction was incorrect. So by having an "incorrect" piece of information we actually have more information, as it reinforces the point that weather forecasts cannot be 100% accurate.
Now, when the next day rolls around and you look back, you should clearly be recording the actual maximum achieved rather than the prediction. This is because the information has gone from being a record of a prediction to a record of fact. The only question, therefore, is at what point you should update the maximum. Do you change it dynamically, or on a daily basis when reporting historical information? For the Sydney Morning Herald site the answer is simple: changing the daily maximum as it increases during the day would defeat the purpose of the "max" figure, which is what the paper predicted at the start of the day. It's also a free news story if the temperature goes well beyond the prediction: "Sydney weather was bonza today with max temperatures 5 degrees higher than expected".
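To make the prediction-versus-fact distinction concrete, here is a minimal sketch in Python. The class and field names (DailyTemperature, forecast_max, observed_max) and the readings are my own invention for illustration, not anything the Sydney Morning Herald actually uses. The idea is simply to keep the forecast fixed for the day and track the observed maximum separately, so the historical record can use the fact rather than the prediction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DailyTemperature:
    """Hypothetical record for one day: the forecast stays fixed, the observation moves."""
    forecast_max: float                   # prediction made at the start of the day
    observed_max: Optional[float] = None  # highest reading seen so far

    def record_reading(self, temperature: float) -> None:
        # Only the observed maximum changes during the day; the forecast is never touched.
        if self.observed_max is None or temperature > self.observed_max:
            self.observed_max = temperature

    def historical_max(self) -> float:
        # Looking back, the record of fact is the observed value, not the prediction.
        if self.observed_max is None:
            raise ValueError("no readings recorded for this day")
        return self.observed_max

    def forecast_error(self) -> float:
        # Positive means the day turned out hotter than predicted.
        return self.historical_max() - self.forecast_max


# Made-up readings for illustration only.
day = DailyTemperature(forecast_max=24.0)
for reading in [18.5, 23.0, 25.0, 21.0]:
    day.record_reading(reading)
```

With these made-up numbers the forecast stays at 24.0 while the historical maximum becomes 25.0, which is the "one degree higher than the max" situation described above.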

So the point is that when you look at data, do think about what level of accuracy is important. If you are reporting a bank balance then spot on is the only option. If you are reporting the number of customers who bought cheese with wine as a percentage of overall cheese buyers, then you can probably get away with one decimal place or less. This sort of view applies even more when looking at forecasting and other predictive data sets, where the effort of increasing accuracy by 1% might be pointless given the extra time it takes.
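As a rough illustration of the two ends of that scale, here is a small Python sketch. All of the figures and the share_percent helper are made up for the example. The balance is summed with Decimal so it stays exact to the cent, while the cheese-and-wine percentage is simply rounded to one decimal place.

```python
from decimal import Decimal

# Bank balance: exactness matters, so use Decimal rather than binary floats.
# The transaction amounts are hypothetical.
transactions = [Decimal("0.10"), Decimal("0.20"), Decimal("0.30")]
balance = sum(transactions, Decimal("0"))

# The same sum with floats picks up a tiny binary rounding error,
# which is not acceptable for a balance.
float_balance = sum([0.10, 0.20, 0.30])


def share_percent(part: int, whole: int, places: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded for reporting."""
    return round(100 * part / whole, places)


# Marketing-style statistic: one decimal place is plenty.
# The customer counts are hypothetical.
cheese_buyers = 1873
cheese_and_wine_buyers = 412
wine_share = share_percent(cheese_and_wine_buyers, cheese_buyers)
```

Here balance is exactly 0.60, the float version is not quite 0.6, and wine_share comes out as 22.0, which is all the precision that kind of report needs.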

Data accuracy isn't an absolute; be clear about what matters to you.
