Tuesday, February 19, 2013

Big Data and smart maths aren't new, and that is the GOOD THING about them

One of the things that annoys me sometimes, and it's quite a long list, is when people proclaim something as 'new' when in fact it has simply gone mainstream.  The problem I have with this is that it normally means they've forgotten all the lessons of previous generations of implementation and are starting from scratch, making the same old mistakes.  We saw this with the rise of the internet, which threw away 20 years of research into human-computer interaction; it took us about 10 years to get to the stage where people started thinking about design in a human rather than a media-company way (hat tip to Google, who seem to have read their technology history).

Big Data and Predictive Analytics aren't new either.  The UK's Met Office has been a Big Data and Predictive Analytics organisation since its foundation in 1854... 1854, folks.  Now sure, you can say 'oh, but they didn't have Hadoop', but that doesn't mean that, in context, it wasn't a predictive analytics organisation from its foundation, and what it has been doing for the last 30+ years has certainly been Big Data and Predictive Analytics.  Simulations of nuclear explosions are pretty Big Data as well, and those clever chaps down at CERN and the LHC are building on decades of work in handling and processing massive data sets.  Remember SETI@Home?  A project that had a massive data set and farmed out small chunks of it to individual machines to be processed and returned... almost MapReduce-like, one might say.
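The SETI@Home pattern described above can be sketched in a few lines (an illustrative toy of mine, not the project's actual code): split a big data set into chunks, process each chunk independently, then combine the partial results, which is the same shape as a map-reduce job.

```python
# Toy sketch of the SETI@Home-style split/process/combine pattern
# (illustrative only; the real project analysed radio telescope signals).
from functools import reduce

def split_into_chunks(data, chunk_size):
    """Farm the data set out as small, independently processable chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def process_chunk(chunk):
    """The 'map' step: each worker analyses its chunk in isolation."""
    return sum(x * x for x in chunk)  # stand-in for real signal analysis

def combine(results):
    """The 'reduce' step: merge the partial results as they come back."""
    return reduce(lambda a, b: a + b, results, 0)

signal = list(range(10))
partials = [process_chunk(c) for c in split_into_chunks(signal, 3)]
total = combine(partials)  # same answer as processing it all in one go
```

Because each chunk is processed in isolation, the 'workers' could just as well be volunteers' home PCs, which is exactly what made the SETI@Home model cheap.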

What about in-memory data and real-time analytics?  Well, have you ever flown on a plane?  Ever wondered just how the Air Traffic Controller sees your flight information, your trajectory, and warnings if it looks like you might crash into another aircraft?  Strangely, this doesn't involve a series of database queries; it's a stream of information with some very smart maths which identifies the planes, some other maths which works out their trajectories, and then yet more maths which looks for potential collisions.  All of this is done in real real-time: not 'real-time' in a business sense, where a second is OK and if it takes five then so what, you'll just moan a bit, but real-time as in people might die if you don't do all of this within a fixed time.
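A heavily simplified sketch of that idea (my own toy, not real ATC software: 2-D straight-line motion, made-up flight IDs and thresholds): each update in the stream carries a position and velocity, trajectories are extrapolated with plain arithmetic rather than database queries, and any pair predicted to get too close is flagged.

```python
# Toy conflict detection over streamed position/velocity updates
# (hypothetical example; real systems use far richer trajectory models).
from itertools import combinations

def position_at(plane, t):
    """Extrapolate a straight-line trajectory: position + velocity * t."""
    (x, y), (vx, vy) = plane["pos"], plane["vel"]
    return (x + vx * t, y + vy * t)

def conflicts(planes, horizon=60, step=5, min_sep=10.0):
    """Flag any pair predicted to close within min_sep inside the horizon."""
    alerts = []
    for a, b in combinations(planes, 2):
        for t in range(0, horizon + 1, step):
            ax, ay = position_at(a, t)
            bx, by = position_at(b, t)
            if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 < min_sep:
                alerts.append((a["id"], b["id"], t))
                break  # one alert per pair is enough
    return alerts

planes = [
    {"id": "BA123", "pos": (0.0, 0.0), "vel": (1.0, 0.0)},    # heading east
    {"id": "LH456", "pos": (100.0, 0.0), "vel": (-1.0, 0.0)}, # heading west
]
alerts = conflicts(planes)  # the head-on pair is flagged well before impact
```

Nothing here is stored and queried later; every update is checked as it arrives, which is the essential difference between stream processing and database-backed 'real-time'.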

The point that makes Big Data and Predictive Analytics a REAL trend, and one that will have an impact, is that we actually know this stuff works.  It's just that now it's cheap enough for people other than governments and large-scale financial organisations to do.  We know that stochastic maths and other rather clever approaches can give decent predictions in chaotic systems; we know that you can adjust models in real-time, but only certain types of models; and we know that there are limits to what things like MapReduce can do from an analytics perspective, that in-memory is great but that memory is not infinite (unless you have a Turing Machine).
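The 'decent predictions in chaotic systems' point can be illustrated with a toy of my own (not any real forecasting model): run an ensemble of slightly perturbed starting points through a chaotic map and report a range rather than a single number, which is roughly how ensemble weather forecasting hedges against chaos.

```python
# Toy ensemble forecast on the logistic map, which is chaotic for r ~ 3.9.
# Illustrative only; real forecasters run perturbed physical models.
import random

def logistic_step(x, r=3.9):
    """One step of the logistic map."""
    return r * x * (1 - x)

def ensemble_forecast(x0, steps, members=100, noise=1e-6, seed=42):
    """Perturb the initial condition and evolve each member independently."""
    rng = random.Random(seed)
    finals = []
    for _ in range(members):
        x = x0 + rng.uniform(-noise, noise)
        for _ in range(steps):
            x = logistic_step(x)
        finals.append(x)
    return sum(finals) / len(finals), min(finals), max(finals)

mean, low, high = ensemble_forecast(0.5, steps=50)
# After 50 chaotic steps the members have diverged; the honest prediction
# is the spread (low, high), not a single deterministic value.
```

Tiny differences in the starting point get amplified exponentially, so the useful output is a distribution of outcomes: exactly the kind of stochastic prediction organisations like the Met Office have been doing for decades.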

We do this commoditisation, for that is what it is, of technologies a disservice by claiming it's 'new'.  Its power is that it is now a commodity, something available to all.  That doesn't make it magic, and it doesn't make it bullshit; what it makes it is REAL.

Big Data and Predictive Analytics will work because they are built on solid foundations, proven over decades of use.  They will fail if people don't recognise that and start from scratch instead, because one thing that is true is that Predictive Analytics can be hard, and infinitely more so if you don't learn from history.
