Friday, February 24, 2012

42 - Or do you understand your Big Data question?

What is the ultimate Big Data question?  Well, it is of course the Biggest Question... the question of Life, the Universe and Everything. But that poses a problem: in an analytical world, do you really understand the question?

(or in short form if you are in a rush)

The point here is that one of the major challenges with Big Data is that we are moving away from simple SQL-driven questions, such as 'who are my top ten customers?' or 'how much did I sell last week?', into much more analytical and predictive questions, such as 'if I reduce the price of goat's cheese, how much more red wine will I sell?'

This presents a new set of challenges, because analytics can give you a simple answer, '15% more', which then leads you to drop the price of goat's cheese.  The network effect of that change, however, is that less beer and less hard cheese are sold, so now you are overstocked in beer and hard cheese, both of which have a use-by date.  The point is that the question was badly formed but correctly answered.

Greater degrees of abstraction also introduce greater degrees of assumption on the part of those creating the models.  So while the business has asked a small and concise question, 'where should we put the next store?', the model has made certain assumptions that may or may not hold.  How are these assumptions shown to the business, and if they are shown, can the business even understand them?
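To make the goat's cheese example concrete, here is a minimal sketch in Python. It assumes a very simple model in which the change in units sold for each product is a cross-price elasticity multiplied by the percentage change in the price of goat's cheese. The elasticity figures, the product list and the function name demand_change_pct are all invented for illustration; only the '15% more' red wine figure echoes the example above.

```python
# Hypothetical cross-price elasticities with respect to the price of
# goat's cheese. All numbers are made up for illustration only.
CROSS_ELASTICITY = {
    "red wine": -1.5,     # complement: cheaper goat's cheese, more red wine
    "beer": 0.8,          # substitute basket: shoppers switch away from beer
    "hard cheese": 1.2,   # substitute: shoppers switch away from hard cheese
}


def demand_change_pct(price_change_pct, elasticities, products=None):
    """Approximate % change in units sold for each product.

    price_change_pct: % change in the price of goat's cheese (-10.0 = 10% cut).
    products: optional subset to report on; None means every product.
    """
    if products is None:
        selected = elasticities
    else:
        selected = {p: elasticities[p] for p in products}
    return {p: e * price_change_pct for p, e in selected.items()}


# The question as asked: "how much more red wine will I sell?"
narrow = demand_change_pct(-10.0, CROSS_ELASTICITY, products=["red wine"])

# The question that should have been asked: "what happens to the basket?"
full = demand_change_pct(-10.0, CROSS_ELASTICITY)

print("Answer to the narrow question:", narrow)
print("Answer to the wider question: ", full)
```

Both calls use exactly the same model and the same assumptions. The only difference is how much of the answer the question asked to see, which is the sense in which a badly formed question can still be correctly answered.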

Today there is a problem of chained spreadsheets. In the future, chained analytical models are going to make the connection between the business 'question' and the 'answer' more complex and harder to understand, and will put more power into the hands of mathematicians who prove good at converting abstract questions into good models.  This also means that ever more importance will be placed on getting control of the definition of the information that goes into that model (what is a customer, what is a product, how do you identify them... MDM stuff). These are the bits that the business can control: the core information, the sources and the quality control around them.

Big Data answers - only as good as your understanding of the question.

