Showing posts with label enterprise.

Thursday, May 22, 2014

How to select a Hadoop distro - stop thinking about Hadoop

Sqoop, Flume, Pig, ZooKeeper.  Do these mean anything to you?  If they do, then the odds are you are looking at Hadoop.  The thing is that while that was cool a few years ago, it really is time to face the fact that HDFS is a commodity, Map Reduce is interesting but not feasible for most users, and the real question is how we turn all that raw data in HDFS into something we can actually use.

That means three things:

  1. Performant and ANSI-compliant SQL matters - if you can't run traditional reporting packages then you are making people change for no reason.  If you don't have an alternative then you aren't offering an answer.
  2. Predictive analytics, statistics, machine learning and whatever else the business wants - this is the stuff that will actually be new to most people.
  3. Reacting in real time - and I mean FAST, not BI fast but ACTUALLY fast.
The last one is about how you ingest data and then perform real-time analytics that incorporate forecasting information from Hadoop into real-time feedback that can be integrated back into source systems.

So Hadoop and HDFS are actually the least important part of your future: they're critical but not important.  I've seen people spend ages looking at the innards rather than just getting on and actually solving problems.  Do you care what your mobile phone network looks like internally?  Do you care what the wiring back to the power station looks like?  HDFS is that for data: it's the critical substrate, something that needs to be there.  But where you should concentrate your efforts is on how it supports the business use cases above.

How does it support ANSI-compliant SQL?  How does it support your standard reporting packages?  How will you add new types of analytics, and does it support the advanced analytics tools your business already successfully uses?  How does it enable real-time analytics and integration?

Then of course it's about how it works within your enterprise: how does it work with data management tools, and how does its monitoring fit in with your existing tools?  Basically, is it a grown-up or a child in the information sandpit?

Now this means it's not really about the Hadoop or HDFS piece itself, it's about the ecosystem of technologies into which it needs to integrate.  Otherwise it's just going to be another silo of technologies that doesn't work well with everything else and ultimately doesn't deliver the value you need.

Thursday, January 16, 2014

The People's Democratic Republic of IT

IT is a communist state in many organisations, one that believes in rigid adherence to inflexible approaches despite clear indications that they inhibit growth, and in a central approach to planning that Mao and Stalin would have thought was taking things a little too far.  This really doesn't make sense in the capitalistic world of business, and the counter-revolution is well under way.


I don't think the word 'Enterprise' is really worth anything when it is meant to signal a single standard approach across the enterprise.  Whether that is Enterprise Resource Planning, Enterprise Data Warehouse, Enterprise Service Bus or Enterprise Architecture, you either end up with multiple solutions or with a central solution that isn't used as widely as envisaged, so you get lots of solutions on the side.

Part of this is because, in the capitalistic world of business, it appears that communist-style central planning has been, and remains, the normal approach.  This People's Democratic Republic of IT approach has two key parts to it:
  1. IT knows best and will give everyone 'each according to their needs' and decide what those needs are.
  2. Cultish following of other communist plans, independent of whether the users want them.

The world of integration is a great example of the latter.  Do you know how much the business cares about whether you integrate two systems using REST, SOAP, sockets or flying monkeys?  ZERO.  Hell, probably even less than zero, in that they have an active disinterest in it.  Yet in IT we don't take this as guidance that 'it's not important, let's commoditise the fuck out of it'.  Nope, we continue to 'innovate' where it really doesn't matter, and we do so because a whole heap of hype tells us to... business hype?  Of course not, it's hype from people who think they've discovered the universal hammer that turns everything into a nail.

On the former, it's the realm of 'Enterprise Architecture' and EDWs that really underlines just how much IT often resembles the politburo.  Here groups of worthy individuals set about the business equivalent of the Cultural Revolution or Stalin's grand plan for agriculture.  They just know that if everyone would work in the same way then everything would be so much better.  So off they trot pushing a single solution, and historically this was pushed all the way through to production and the business went:
"Well, it's not what I wanted but it's a bit less shit than what I've got"
So IT created grand strategic plans (and I've said before there is no such thing as IT strategy), often in areas the business really didn't care about, and off the business went and started using Dropbox, salesforce.com and Amazon.

In effect the Shadow IT efforts of the business are analogous to the black market economies that often thrived in communist countries in the 80s: getting on with what they need to do and being a lot more efficient than the state at doing it.  What we are seeing today is that as budgets shift more and more towards the business, the shadow IT market is getting bigger and bigger and central planning has suddenly hit a problem.

The business understands technology.  Maybe not in the depth that IT does, but what the business understands is a bit more valuable: they understand how to focus on outcomes that add value, not technology hype.

So now when the Enterprise Architect says "you cannot do that, it is against our policy", the business says "stuff that for a game of soldiers, your policy doesn't work for us."  The business is having its Berlin Wall moment, and while the IT communist state, the People's Democratic Republic of IT (because communist states love claiming they are democratic), might hold on for a while, the reality is that its world is beginning to come crashing down.

It's time for IT to embrace capitalism: embrace value over technology and outcomes over acronyms.

Tuesday, July 30, 2013

Surely REST isn't the travelling salesman does design?

Occasionally I run across things on the web, this time tweeted by Mark Baker (@distobj), that make me read them several times.  The link he tweeted is to this page on a Nokia API and in particular this section...
Biggest advantages noticed is that the API itself becomes the documentation of the API
  • Without HATEOAS developers would get data, go to a documentation, and figure out how to build the next request
  • With HATEOAS developers learn what are the available next actions
Either of these would get me firing you, I think.  I'm a big fan of documentation and I'm a big fan of design.  What I'm not a big fan of is people who treat the design process as a series of disconnected steps where the only thing that matters is the next one.

Let's be clear: having good documentation available is important.  What would be poor is either of two things:
  1. Having to wait for someone to build an application before I can start writing a client for it
  2. Having documentation that is some sort of 'maze of twisty passages' where you only see one step ahead
This, to me, is at the heart of the death of design in IT: the lauding of everything as architecture and the massive chasm that appears to be developing between that and writing the code.  I'm repeatedly seeing approaches which are code-centric rather than design-centric, and the entire history of IT tells us that this isn't the best way forwards.  Don't try me on the 'I just refactor' line as if that is an answer: spending 1 day thinking about a problem and planning out the solution (design) is worth 5 days of coding and 20 days of subsequent refactoring.

I'd really like a developer to be able to map out what they want to do, be able to read the documentation in one go and understand how they are going to deliver on the design.  I don't want a developer context switching between API, code and pseudo-design all the time and getting buried in the details.

This is part of my objection to REST: the lack of that up-front work before implementation.  If I have separate client and service teams I don't want to run a waterfall project of 'service finishes, start client', and if those are two separate firms I want to be able to verify which one has screwed up rather than going round a continual cycle of 'it's in the call response' and 'it was different yesterday'.  In other words I'd like people to plan for outcomes.  This doesn't mean not using REST for implementation; it just means that the design stage is still important and that documentation of the interface is an exchangeable artefact.  Now if the answer is that you have a mock of the API up front and a web crawler can extract the API documentation into a whole coherent model, then all well and good.
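To make the contrast concrete, here is a minimal Python sketch in which everything is hypothetical: the `MOCK_API` paths, the simplified `_links` map (loosely in the style of HAL, not a real spec), and the `fetch`, `next_actions` and `crawl` helpers.  `next_actions` is what a client following hypermedia sees, one step at a time.  `crawl` is the mock-plus-crawler route described above: it walks every link once and produces a single model of the interface that a client team could read before anyone writes code.

```python
from collections import deque

# Hypothetical mock of a hypermedia API. Each resource carries a simplified
# "_links" map (link name -> target path), loosely in the style of HAL.
# All paths and link names are invented for illustration.
MOCK_API = {
    "/": {"_links": {"orders": "/orders"}},
    "/orders": {"_links": {"item": "/orders/1"}},
    "/orders/1": {"_links": {"pay": "/orders/1/payment",
                             "cancel": "/orders/1/cancellation"}},
    "/orders/1/payment": {"_links": {"receipt": "/receipts/1"}},
    "/orders/1/cancellation": {"_links": {}},
    "/receipts/1": {"_links": {}},
}


def fetch(path):
    """Stand-in for an HTTP GET against the mock; no network involved."""
    return MOCK_API[path]


def next_actions(path):
    """What a pure HATEOAS client learns: only the links on the current resource."""
    return sorted(fetch(path)["_links"])


def crawl(entry="/"):
    """Breadth-first walk of every link, building the whole interface model."""
    model, queue, seen = {}, deque([entry]), {entry}
    while queue:
        path = queue.popleft()
        links = fetch(path)["_links"]
        model[path] = dict(links)
        for target in links.values():
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return model


if __name__ == "__main__":
    print(next_actions("/orders/1"))  # ['cancel', 'pay']: one step ahead only
    for path, links in crawl().items():  # the whole map in one go
        print(path, "->", links)
```

The crawler only works because the mock exists up front; without it, the client team is back to discovering the interface one response at a time.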

Because the alternative is the travelling salesman problem.  If I don't know the route to get things done and am deciding on the quickest way one node at a time, then I'm not looking at the efficiency of the end-to-end implementation, just the easiest way to take the next step.
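The analogy maps onto a well-known heuristic: the greedy nearest-neighbour approach to the travelling salesman problem, which always takes the cheapest next hop.  The minimal Python sketch below compares it with an exhaustive search over whole routes.  The coordinates and function names are invented for illustration, and brute force is only practical for a handful of stops.

```python
from itertools import permutations
from math import dist


def tour_length(points, order):
    """Length of a closed tour that visits points in `order` and returns home."""
    return sum(dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))


def nearest_neighbour(points, start=0):
    """Greedy: always hop to the closest unvisited point, one node at a time."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: dist(here, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def exhaustive(points, start=0):
    """Plan the whole route: try every ordering (only feasible for tiny inputs)."""
    others = [i for i in range(len(points)) if i != start]
    best = min(permutations(others),
               key=lambda p: tour_length(points, [start, *p]))
    return [start, *best]


# Hypothetical stops on a straight road, chosen so the greedy choice misleads.
POINTS = [(0, 0), (1, 0), (-1.5, 0), (4, 0)]

if __name__ == "__main__":
    print(tour_length(POINTS, nearest_neighbour(POINTS)))  # 13.0
    print(tour_length(POINTS, exhaustive(POINTS)))         # 11.0
```

Every individual greedy hop is the cheapest one available at the time, yet the finished route is longer than the planned one.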

This current trend of code-centric thinking is retarding enterprise IT and impacting the maintainability of REST (and other) solutions.  This isn't a question of there being a magic new way of working that means design isn't important (there isn't); it's simply a question of design being continually undermined and discarded as an important part of the process.  Both of the scenarios outlined in the article are bad; neither represents good practice.  Choosing whether your manure sandwich is on a roll or a sub doesn't change the quality of the filling.

Think first, design first, publish first... then code.



Thursday, July 11, 2013

Google and Yahoo have it easy or why Hadoop is only part of the story

We hear lots and lots of hype at the moment around Hadoop, and it is a great technology approach, but there is also lots of talk about how this approach will win because Google and Yahoo use it to manage their scale, and that this shows their approach is going to win in traditional enterprises and other big data areas.

Let's be clear, I'm not saying Hadoop isn't a good answer for managing large amounts of information; what I'm saying is that Hadoop is only part of the story and it's arguably not the most important part.  I'm also saying that Google and Yahoo have a really simple problem they are attempting to fix: in comparison with large-scale enterprises and the industrial internet, they've got it easy.  Sure, they've got volume, but what is the challenge?
  1. Gazillions of URIs and unstructured web pages
  2. Performant search
  3. Serving ads related to that search
I'm putting aside Gmail and Google Apps for a moment as those aren't part of this Hadoop piece, though I'd argue that they, like Amazon, are more appropriate reference points for enterprises looking at large scale.

So why do Google and Yahoo have it easy?

First off, because it's an unstructured data challenge, data quality isn't a challenge they have to overcome.  If Google serves you up a page when you search for 'Steve Jones' and you see the biology prof, the Sex Pistols guitarist and the Welsh model when you are looking for another Steve Jones, you don't curse Google because it's the wrong person, you just start adding new terms to try and find the right one; if Google slaps the wrong Google+ profile on the results you just sigh and move on.  Google don't clean up the content.

Not worrying about data quality is just part of not having to worry about the master data and reference data challenge.  Google and Yahoo don't do any master data or reference data work; they can't, as their data sets are external.  This means they don't have to set up governance boards or operational process changes to take control of data, they don't need to get multiple stakeholders to agree on definitions, and no regulator will call them to account if a search result isn't quite right.

So the first reason they have it easy is that they don't need to get people to agree.

The next reason is something that Google and Yahoo do know something about, and that is performance, but here I'm not talking about search results, I'm talking about transactions: the need to have a confirmed result.  Boring old things like atomic transactions and, importantly, the need to get a response back fast.  Now clearly Google and Yahoo can do the speed part, but they have the wonderful advantage of not having to worry about the whole transactions stuff.  Sure, they do email at a great scale and they can custom develop applications to within an inch of their life... but that isn't the same as getting Siebel, SAP, an old Baan system and three different SOA and EAI technologies working together.  Again there is the governance challenge, and there is the 'not invented here' challenge that you can't ignore.  If SAP doesn't work the way you want... well, you could waste time customising it, but you are better off working to what SAP does instead.

The final reason that Google and Yahoo have it easy is talent and support.  Hadoop is great, but as I've said before, companies have a Hadoop Hump problem, and this is completely different from the talent engines at Google and Yahoo.  Both pride themselves on the talent they hire and that is great, but they also pay top whack and have interesting work to keep people engaged.  Enterprises just don't have that luxury, or more simply they just don't get enough value from hiring stellar developers and then also having those stellar developers work in support.  When you are continually tuning and improving apps, as Google does, that makes sense; when you tend to deliver into production and hand over to a support team it makes much less sense.

So there are certainly things that enterprises can learn from Google and Yahoo, but it isn't true to say that all enterprises will go that way.  Enterprises have different challenges, and some of them are arguably significantly harder than system performance as they impact culture.  So Hadoop is great, it's a good tool, but just because Google and Yahoo use it doesn't mean enterprises will adopt it in the same way, or indeed that the model taken by Google and Yahoo is appropriate.  We've already seen NoSQL become more SQL in the Hadoop world, and we'll continue to see more and more shifts away from the 'pure' Hadoop Map Reduce vision as enterprises leverage the economies of scale but do so to solve a different set of challenges and, crucially, within a different culture and value network.

Google and Yahoo are greenfield companies, built from the ground up by IT folks.  They have it easy in comparison to the folks trying to marshal 20 business divisions, each with their own sales and manufacturing folks, and 40 ERPs and 100+ other systems badly connected around the world.