Monday, January 06, 2014

Six reasons your Big Data Hadoop project will fail in 2014

OK, so Hadoop is the bomb, Hadoop is the schizzle, Hadoop is here to solve world hunger and all problems.  Now I've talked before about some of the challenges around Hadoop for enterprises, but here are six reasons that Information Week is right when it says that Hadoop projects are going to fail more often than not.

1. Hadoop is a Java thing not a BI thing
The first is the most important challenge.  I'm a Java guy, a Java guy who thinks that Java has been driven off a cliff by its leadership in the last 8 years, but it's still one of the best platforms out there.  However, the problems that Hadoop is trying to address are analytics problems, BI problems.  Put briefly, BI guys don't like Java guys and Java guys don't like BI guys.  For Java guys Hadoop is yet more proof that they can do everything, but BI guys know that custom build isn't an efficient route to deliver all of those BI requirements.

On top of that, the business folks know SQL, and often they really know SQL.  SQL is the official language of business and data.  So a straight 'No-SQL' approach is doomed to fail, as you are speaking French to the British.  2014 will be the year when SQL on Hadoop becomes the norm, but you are still going to need your Java and BI guys to get along, and you are going to have to recognise that SQL beats No-SQL.

2. You decide to roll your own
Hadoop is open source, so all you have to do is download it, install it and off you go, right?  There are so many cases of people not doing that right that there is an actual page explaining why they won't accept those as bugs.  Hadoop is a bugger to install. It requires you to really understand how distributed computing works, and guess what?  You thought you did, but it turns out you really didn't.  Distributed computing and multi-threaded computing are hard.

There are three companies you need to talk to: Pivotal, Cloudera and Hortonworks. How easy can they make it? Well, Pivotal have an easy Pivotal HD Hadoop Virtual Machine to get you started and even claim that they can get you running a Hadoop cluster in 45 minutes.

3. You are building a technical proof of concept... why?
One reason that your efforts will fail is that you are doing a 'technical proof of concept' at the end of which you will amazingly find that something used in some of the biggest analytics challenges on planet earth at the likes of Yahoo fits your much, much smaller challenge.  Well done, you've spent money proving the obvious.

Now what?  How about solving an actual business problem?  Actually, why didn't you start by solving an actual business problem as a way to see how it would work for what the business faces?  Technical proofs of concept are pointless. You need to demonstrate to the business how this new technology solves their problems in a better (cheaper, faster, etc.) way.

4. You didn't understand what Hadoop was bad at
Hadoop isn't brilliant at everything analytical... shocking, eh?  So that complex analytics you want to do, which is effectively a 25-table join followed by the analytics... yeah, that really isn't going to work too well.  Those bits where you said that you could do that key business use case faster and cheaper, and then it took 2 days to run?

Hadoop is good at some things, but it's not good at everything.  That is why folks are investing in SQL technologies on top of Hadoop, such as Pivotal's HAWQ or Cloudera's Impala, with Pivotal already showing how the bridge between traditional MPP and Hadoop is going to be made.
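One way to see why that 25-table join hurts is a toy version in plain Python. This is only a sketch, assuming classic Map Reduce with reduce-side joins and a naive plan that joins one table at a time; the function names and sample tables are made up for illustration, and real engines can plan joins more cleverly.

```python
from collections import defaultdict


def reduce_side_join(left, right):
    """Join two lists of (key, value) pairs the way a single Map Reduce job would."""
    # Map: tag each record with the table it came from.
    tagged = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]
    # Shuffle: group every tagged record by join key.
    groups = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)
    # Reduce: pair up left and right values that share a key.
    joined = []
    for key, values in groups.items():
        lefts = [v for tag, v in values if tag == "L"]
        rights = [v for tag, v in values if tag == "R"]
        joined.extend((key, (l, r)) for l in lefts for r in rights)
    return joined


def join_all(tables):
    """Chain joins naively: each extra table costs another map/shuffle/reduce pass."""
    result, passes = tables[0], 0
    for table in tables[1:]:
        result = reduce_side_join(result, table)
        passes += 1
    return result, passes


# Hypothetical three-table example.
TABLES = [[(1, "a"), (2, "b")], [(1, "x")], [(1, "y"), (2, "z")]]
RESULT, PASSES = join_all(TABLES)
```

Under this naive plan three tables take two full passes over the data, so 25 tables would take 24, each one shuffling records across the cluster before the actual analytics even starts.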

5. You didn't understand that it's part of the puzzle
One of the big reasons that Hadoop pieces fail to really deliver is that they are isolated silos. They might even be doing some good analytics, but people can't see those analytics where they care about them.  Sure, you've put up some nice web pages for people, but they don't use those in their daily lives.  They want to see the information pushed into the Data Warehouse so they can see it in their reports, they want it pushed to the ERP so they can make better decisions... they might want it in many, many places, but you've left it in the one place where they don't care about it.

When looking at the future of your information landscape you need to remember that Hadoop and NoSQL are just a new tool, a good new tool and one that has a critical part to play, but it's just one new tool in your toolbox.

6. You didn't change
The biggest reason that your Hadoop project will fail, however, is that you've not changed some of your basic assumptions and looked at how Hadoop enables you to do things differently.  So you are still doing ETL to transform into some idealised schema which is based on a point-in-time view of what is required.  You are doing that into a Hadoop cluster which couldn't care less about redundant or unused data, and where the cost of keeping that data is significantly lower than doing another set of ETL development.

You've carried on thinking about grand enterprise solutions to which everyone will come and be beholden to your technical genius.

What you've not done is sit back and think 'the current way sucks for the business, can I change that?', because if you had you'd have realised that using Hadoop as a data substrate/lake layer makes more sense than ETL, and you'd have realised that it's actually local solutions that get used the most, not corporate ones.
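To make the substrate/lake idea a bit more concrete, here is a minimal sketch in plain Python of applying a schema when you read rather than when you load. It is an illustration only: the sample events, the field names and the `read_with_schema` helper are all hypothetical, and a real cluster would do this over files in HDFS rather than a list in memory.

```python
import json

# Hypothetical raw events, landed in the lake exactly as they arrived.
RAW_LINES = [
    '{"customer": "c1", "amount": 20, "channel": "web"}',
    '{"customer": "c2", "amount": 5, "channel": "store", "coupon": "X1"}',
    '{"customer": "c1", "amount": 7}',
]


def read_with_schema(lines, fields, defaults=None):
    """Apply a schema at read time: keep only `fields`, fill gaps with defaults."""
    defaults = defaults or {}
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f, defaults.get(f)) for f in fields}


# Two local "views" over the same raw data, with no ETL job in between.
finance_view = list(read_with_schema(RAW_LINES, ["customer", "amount"]))
marketing_view = list(
    read_with_schema(RAW_LINES, ["channel", "coupon"], {"channel": "unknown"})
)
```

The raw lines are never reshaped, so when marketing decides next month that they care about a field nobody planned for, it is a new view and not a new round of ETL development.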

Your Hadoop project will fail because of you
The main reason Hadoop projects will fail is that you approach a new technology with an old mindset. You'll try and build a traditional BI solution in a traditional BI way and you'll not understand that Java doesn't work like that, you'll not understand how Map Reduce is different to SQL, and you'll plough on regardless and blame the technology.
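For anyone who hasn't seen the difference between Map Reduce and SQL side by side, here is a rough sketch. In SQL the question is one declarative line, something like SELECT region, SUM(amount) FROM sales GROUP BY region. The same thing in the Map Reduce style means writing the phases yourself. The version below is plain Python and not the Hadoop API, with made-up sample data, so treat it as an illustration of the shape of the work and nothing more.

```python
from collections import defaultdict

# Hypothetical sample data: (region, amount) sales records.
SALES = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Shuffle: group all values by key (the framework does this between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def total_by_region(records):
    """Run the three phases in order, as one job would."""
    return reduce_phase(shuffle_phase(map_phase(records)))
```

Calling `total_by_region(SALES)` gives the per-region totals, the same answer as the one-line query, but you had to decide the keys, the grouping and the aggregation by hand. That is a programming job, not a BI job.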

Guess what though?  The technology works at massive scale, much, much bigger than anything you've ever deployed.  It's not the technology, it's you.

So what to do?
... I think I'll leave that for another post.

6 comments:

Anonymous said...

Java has been driven off a cliff by its leadership in the last 8 years.
Yes, everybody thought so 4 years ago. Java was supposed to end up like Cobol, but it changed. Have you seen Java 8? Everything big is made in Java now.

Hadoop is open source, all you have to do is download it, install it and off you go right?
No, there will be Hadoop as Service.

Your Hadoop project will fail because of you
I am sorry, Map Reduce is finished. Cloudera announced end of support, so don't talk about this. There are in-memory solutions.

Think about Hadoop as an operating system.

Steve Jones said...

I completely agree that DESPITE the leadership of Java it's still the dominant platform. I have seen Java 8 and it's still a 'kitchen sink' approach. Do Hadoop implementations require MIDI support?

Hadoop is a great substrate but it's not the whole answer. It's all about having a good architecture that enables you to make flexible choices. Hadoop as a Service works for some people, but not for all, and the challenges of data gravity are a good reason that not everyone will want to consume it as a service on remote clouds. The reason for pointing out the Pivotal piece is that with their Cloud Foundry they give companies that flexibility.

Anonymous said...

I so much agree with the overall article and the comment by the first person. The Hadoop saga reminds me of the EJB saga of years before. There was a ton of hype and eventually it failed and was replaced by popular frameworks like Spring etc.

Now even key people from Cloudera are recommending not to use Map Reduce and to move to Hadoop 2.0. If we were to learn from history, it is a matter of time before a Spring equivalent of Hadoop prevails. Until then the vendors will continue to milk the cow with their bait-and-switch sales and marketing programs.

Steve Jones said...

I agree around Hadoop that wrappers are important and that people shouldn't be doing Map Reduce. My point on the 6 is that people are still doing Map Reduce in the mistaken belief that

a) it's easy (it isn't)
b) it's a BI thing (it isn't)

Thierry Hubert said...

I am in agreement with your assessment. It is important that businesses understand, and they should if they know what they are doing, that Hadoop is a complementary data OS, and that they still need RDBMS. I think that most IT and business decision-makers should watch Amr Awadallah's introduction to Hadoop at Stanford University http://www.youtube.com/watch?v=d2xeNpfzsYI

lobotus said...

Nice information, it is really helpful for those who are looking for information on bigdata and hadoop.