I wrote about why your Hadoop project will fail, so I think it's only right that I follow up with some things you can do to make the Big Data project you take on actually succeed. The first thing you need to do is stop trying to make 'Big Data' succeed and instead focus on educating the business on the value of information, and then work out how to deliver new value... value that just so happens to be delivered with Big or Fast Data technologies.
Don't try and change the business
Stop seeing technology as a goal in itself, and stop complaining when the business doesn't recognise that your 'magic' technology is the most important thing in the world. Find out how the business works, look at how people actually work day to day, and see how you can improve that.
Sounds simple? Well, the good news is that it is, but it means you need to forget about technology until you know how the business works.
Explain why Information Matters
The next step, once you've understood the business better, is to explain to them why they should care about information. Digitization is the buzzword you need to learn: folks like MIT Sloan (Customer Facing Digitization), Harvard Business Review (are you ready for Digitization?) and Davos (Digitization and Growth) are saying that this is the way forward. And what is Digitization? In the raw it's just about converting stuff into digital formats, but in reality it's about having an information- and analytics-driven business. The prediction from the business schools is that companies that do this will outperform their competition.
This is an important step. It's about shifting information from being a technology and IT conversation towards the business genuinely seeing information as a critical part of business growth. It's also about you, as an IT professional, learning how to communicate technology changes in the language the business wants to hear. They don't want to hear 'Hadoop', they want to hear 'Digitization'.
Find a problem that needs a new solution
The next key thing is finding a problem that isn't well served by your current environments. If you could solve a problem just by adding a new report to an EDW, then using new technologies to do it in a more time-consuming way doesn't prove anything. The good news is that there are probably loads of problems out there that your current environments don't serve well: volume challenges around sensor data and clickstream, real-time analytics, predictive analytics, data discovery and ad-hoc information solutions.
Find that problem, find the person or group in the business that cares about having that problem solved, and be clear about what the benefits of solving it are.
Get people with the 'scars and ribbons'
What do I do when I work with a new technology? Two things. First, I get some training and from that build something for myself that helps me learn. Second, if I'm doing it at work to build a business, I go and find someone who has already done this before and hire them or transfer them into my team.
Bill Joy once said that the smartest people weren't at Sun, so they should learn from outside. I'm not Bill Joy, you aren't Bill Joy, so we can certainly learn from outside. Whether this means going to a consultancy that has done it before or hiring people who have done it before doesn't really matter. The point is that unless you really are revolutionising the IT market, you are doing something that someone has done before, so your best bet is to learn from their example.
It stuns me how many people embark on complex IT projects having never used the technology before and are then surprised when the project fails. Get people with the 'scars and ribbons' who can tell you what not to do, which is massively more important than what to do.
Throw out some of your old Data Warehouse thinking
The next bit is something that you need to forget, a cherished truth that no longer holds. Get rid of the notion that your job as a data architect is to dictate a single view to the business. Get rid of your attachment to the cherished ETL process. Land the data in Hadoop, all the data you can, and don't worry if you think you might not use it: you are landing it in Hadoop and then turning it into the views or analytics. There is no benefit in not taking everything across and lots of benefits in doing so.
In other words, you've got the problem, and that is the goal. Now go and collect all the data, but don't start by defining Z and working backwards through the full A-Z. Understand the data areas, drop them into Hadoop, and then worry about what the right A-Z is today, knowing that if it's a different route tomorrow you've got the data ready to go without updating the integration.
Then, if you have another problem that needs access to the same data, don't automatically try to make one solution do two things. It's perfectly OK to create a second solution on top of Hadoop to solve that problem. You don't need everyone to agree on a single schema, you just need to be able to solve the problem. The point here is that to get different end results you need to start thinking differently.
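To make the 'land it first, shape it later' idea concrete, here is a minimal sketch in plain Python. It is an illustration rather than a Hadoop implementation: the in-memory list stands in for raw files landed in Hadoop, and the record fields, values and function names are all made up for the example. What it shows is two separate solutions reading the same raw records, each applying only the structure it needs.

```python
import json
from collections import defaultdict

# Hypothetical landing zone: every record is kept exactly as it arrived.
# In a real setup these would be raw files in Hadoop, not an in-memory list.
RAW_EVENTS = [
    '{"device": "pump-1", "ts": "10:00", "temp_c": 71.5, "site": "Leeds"}',
    '{"device": "pump-1", "ts": "10:05", "temp_c": 74.0, "site": "Leeds"}',
    '{"device": "pump-2", "ts": "10:00", "temp_c": 65.2, "site": "York"}',
    '{"device": "pump-2", "ts": "10:05", "site": "York"}',  # no temperature sent
]


def read_raw(lines):
    """Parse landed records on read; no schema was enforced when they landed."""
    for line in lines:
        yield json.loads(line)


def max_temp_by_device(lines):
    """Solution 1: an engineering question. Skips records without a temperature."""
    result = {}
    for rec in read_raw(lines):
        if "temp_c" in rec:
            device = rec["device"]
            result[device] = max(result.get(device, float("-inf")), rec["temp_c"])
    return result


def readings_by_site(lines):
    """Solution 2: an operations question over the same data, with its own shape."""
    counts = defaultdict(int)
    for rec in read_raw(lines):
        counts[rec["site"]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(max_temp_by_device(RAW_EVENTS))
    print(readings_by_site(RAW_EVENTS))
```

Neither function had to agree a schema with the other, and the record with no temperature was still landed and is still available to the second solution.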
Don't get hung up on NoSQL, don't get hung up on Hadoop
The final thing is the dirty secret of the Hadoop world that has rapidly become the bold proclamation: NoSQL really isn't for everyone, and SQL is perfectly good for lots of cases. Hive, Impala and HAWQ are all addressing exactly that challenge. And you shouldn't limit yourself to Hadoop-friendly approaches: if the right way is to push the data from Hadoop to your existing data warehouse... do it. If the requirement is for some fast data processing, then do that.
Your goal is to show how the new technologies are more flexible and better able to adapt to the business, and that the new IT approach is to match what the business wants rather than trying to force an EDW onto it every time.
Making your Big Data program succeed is actually about having the business care about the value that information brings, and then fitting your approach to match what the business wants to achieve.
The business is your customer. It's time to do what they want, not force an EDW down their throats.