Thursday, December 05, 2013

How Business SOA thinking impacts data

Over the years I've written quite a bit about how SOA, when viewed as a tool for Business Architecture, can change some of the cherished beliefs in IT.  One of these was about how the Single Canonical Form was not for SOA, and others have talked about how MDM and SOA collaborate to deliver a more flexible approach.  Central to all of this has been information and transactions 'in the now': the active flow of information within and between businesses, and how you make that more agile while ensuring such collaboration can be trusted.

Recently however my challenge has been around data, the post-transactional world, where things go after they've happened so people can do analytics on them.  This world has been the champion of the single canonical form, massive single schemas that aim to encompass everything, with the advantage over the operational world that things have happened so the constraints, while evident, are less of a daily impact.

The challenge of data however is that the gap between the post-transactional and the operational world has disappeared.  We've spent 30 years in IT creating a hard wall between these areas, creating Data Warehouses which operate much like batch-driven mainframes and where the idea of operational systems directly accessing them has been aggressively discouraged.  The problem is that the business doesn't see this separation.  They want to see analytics and its insight delivered back into the operational systems to help make better decisions.

So this got me thinking: why is it that in the SOA world and operational world it's perfectly OK for local domains and businesses to have their own view on a piece of data (an invoice, a customer, a product, etc.) but when it comes to reporting they all need to agree?  Having spent a lot of time on MDM projects recently the answer was pretty simple:
They don't
With MDM the really valuable bit is the cross-reference; it's the bit that enables collaboration.  The amount of standardisation required is actually pretty low.  If Sales has 100 attributes for Customer and Finance only 30, and in fact it only takes 20 to uniquely identify the customer, then it's that 20 that really matter to drive the cross-reference.  If there isn't any value in agreeing on the other attributes then why bother investing in it?  Getting agreement is hard enough without trying to do it in areas where the business fundamentally doesn't care.

This approach to MDM helps to create shorter more targeted programs, and programs that are really suited to enabling business collaboration.  You don't need to pass the full customer record, you just pass the ID.
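As a minimal sketch of that cross-reference idea (in Python, with record shapes, field names and IDs that are purely hypothetical and not taken from any MDM product): Sales and Finance each keep their own customer record, and only a small set of identifying attributes is used to link them.  Everything else stays local.

```python
from dataclasses import dataclass, field

# Hypothetical identifying attributes; a real programme would agree its own set.
MATCH_FIELDS = ("name", "country", "tax_id")


def match_key(record: dict) -> tuple:
    """Build a normalised key from the identifying attributes only."""
    return tuple(str(record[f]).strip().lower() for f in MATCH_FIELDS)


@dataclass
class CrossReference:
    """Maps one master ID to the local IDs each domain uses."""
    by_key: dict = field(default_factory=dict)     # match key -> master ID
    local_ids: dict = field(default_factory=dict)  # master ID -> {domain: local ID}

    def register(self, domain: str, local_id: str, record: dict) -> str:
        key = match_key(record)
        # Reuse the master ID if these identifying attributes were seen before.
        master_id = self.by_key.setdefault(key, f"CUST-{len(self.by_key) + 1}")
        self.local_ids.setdefault(master_id, {})[domain] = local_id
        return master_id


xref = CrossReference()

# Each domain keeps its own attributes; only MATCH_FIELDS need to agree.
sales_record = {"name": "Acme Ltd ", "country": "GB", "tax_id": "123",
                "segment": "Enterprise", "account_manager": "Jo"}
finance_record = {"name": "acme ltd", "country": "gb", "tax_id": "123",
                  "credit_limit": 50000}

sales_master = xref.register("sales", "S-001", sales_record)
finance_master = xref.register("finance", "F-778", finance_record)
```

Both registrations resolve to the same master ID, so the two domains can refer to the customer by that ID alone while never agreeing on `segment` or `credit_limit`.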

So what does this combination of MDM and SOA mean for data, particularly as we want analytics to be integrated back into operations?
Data solutions should look more like Business SOA solutions and match the way the business works
In simple terms it means the sort of thinking that led to flexibly integrated SOA solutions should now be applied to Data.  Get rid of that single Schema, concentrate on having data served up in a way that matches the requirements of the business domains and concentrate governance on where it's required to give global consistency and drive business collaboration.  That way you can ensure that the insights being created will be able to be managed in the same way as the operational systems.

With SOA the problem of people building applications 'in the bus' led me to propose a new architectural approach where you don't have one ESB that does everything but accept that different parts of the business will want their own local control.  The Business Service Bus concept was built around that, and with the folks at IBM, SAP, Microsoft and Oracle all ensuring that pretty much everyone ends up with a couple of ESB-type solutions, it's the sort of architecture I've seen work on multiple occasions.  That sort of approach is exactly what I now think applies to data.

The difference?

Well, with data and analytical approaches you probably want to combine data from multiple sources, not just your own local information.  Fortunately new (Java) technologies such as Hadoop are changing the economics of storing data, so instead of having to agree on schemas you can just land all of your corporate data into one environment and let those business users build business-aligned analytics which sit within their domain, even if they are using information shared by others.  MDM allows that cross-reference to happen in a managed way, but a new business-aligned approach removes the need for total agreement before anything can get done.
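To make that concrete, here is a rough sketch in plain Python standing in for a landed data store.  The datasets, field names, master IDs and the `finance_exposure_view` function are all hypothetical, invented for illustration.  Each domain lands its data with its own field names and local IDs, and a Finance-owned view pulls in Sales data through the cross-reference rather than through an agreed schema.

```python
# Landed "as is": each domain keeps its own field names and local IDs.
landed_sales = [
    {"sales_id": "S-001", "pipeline_value": 120_000},
    {"sales_id": "S-002", "pipeline_value": 40_000},
]
landed_finance = [
    {"fin_id": "F-778", "overdue_amount": 15_000},
    {"fin_id": "F-901", "overdue_amount": 0},
]

# Cross-reference maintained by MDM (hypothetical master IDs).
xref = {
    "CUST-1": {"sales": "S-001", "finance": "F-778"},
    "CUST-2": {"sales": "S-002", "finance": "F-901"},
}


def finance_exposure_view(xref, sales_rows, finance_rows):
    """A Finance-owned view: overdue debt next to Sales pipeline, per customer."""
    sales_by_id = {row["sales_id"]: row for row in sales_rows}
    finance_by_id = {row["fin_id"]: row for row in finance_rows}
    view = {}
    for master_id, local_ids in xref.items():
        sales_row = sales_by_id.get(local_ids.get("sales"), {})
        finance_row = finance_by_id.get(local_ids.get("finance"), {})
        view[master_id] = {
            "overdue_amount": finance_row.get("overdue_amount", 0),
            "pipeline_value": sales_row.get("pipeline_value", 0),
        }
    return view


view = finance_exposure_view(xref, landed_sales, landed_finance)
```

The view belongs to Finance and uses Finance's terms; Sales did not have to change its data, and the only shared agreement is the cross-reference.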

With Business SOA driven operations we had the ability to get all the operational data in real-time and aggregate at the BSB level if required.  With Business SOA driven data approaches we can land all the information and then enable the flexibility.  By aligning both the operational and post-transactional models within a single consistent Business aligned approach we start doing what IT should have been doing all along:
Creating an IT estate that looks like the business, evolves like the business and that is costed in-line with the value it delivers.
Applying Business SOA thinking to data has been really interesting and is what led to the Business Data Lake concept.  It's early days, clearly, but I really do believe that getting the operational and data worlds aligned to the business in a consistent way is going to be the way forwards.

This isn't a new and radical approach, it's just applying what worked in operations to the data space and recognising that if the goal of analytics is to deliver insight back into operations then that insight needs to be aligned to the business operations from the start so it can adapt and change as the operational business requires.

The boundaries between the operational and post-transactional worlds have gone; the new boundaries are defined by the business, and the governance in both areas is about enabling consistent collaboration.

Monday, December 02, 2013

The only V that counts in Big Data is Value

So what is Big Data?  It's Variety, Velocity, Volume, right?  But what does that really mean?  Should I get loads of data and drop it into Hadoop, pull in anything I can lay my hands on and I'm now 'doing Big Data'?

Should I plug in my packet monitoring software, store its output in Hadoop and say I'm doing Big Data?  Should I get as many different data sources as I can, and does that mean I'm doing Big Data?

The reality is that one thing hasn't changed and it's the key driver for Big Data in the way that it should be in any IT program - Value.  What is the point of what you are doing?  Does the business care?  If there isn't a point or the business doesn't care then why on earth are you doing it?

Stop focusing on the 3 Vs of Big Data and worry only about the fourth, the one that really matters

Value.

Monday, November 04, 2013

Zuckerberg and the unreality of valley thinking

Bill Gates laid into Mark Zuckerberg's vision of internet access being the most important thing in solving the world's ills and another article at The Register compared Silicon Valley to 'Sheldonville' which I really agree with.  At the heart of lots of these 'visions' is an insular view of technology within Silicon Valley.

How insular?  Let's say Blue Ridge Mountains style insular.  The normal vision is that valley thinking will change the world if only people would just use that technology.  We saw it in the .com bust with companies that preached a huge vision on how they would replace the normal bricks and mortar and how everything would be changed and replaced....

And guess what?  It takes a hell of a lot longer to change culture than a VC/IPO cycle would like, and the end result is that the companies fail.  The VC/IPO money does however distort the market; look at Amazon's 'profit' statements against a food retailer - Wholefoods - and a failing retailer - Best Buy.


Company      Revenue    Profit
Amazon       $21.27bn   $97m
Wholefoods   $2.2bn     $113m
Best Buy     $16.7bn    $1.9bn

Now I know that Amazon aren't strictly a valley company but it's the same sort of mentality; I'll come to some great examples of valley mentality in a second.  So Amazon is a great company doing some really cool and innovative things, but thanks to their special technical sauce they are allowed to significantly undercut their competition because they don't need to make as much profit to keep shareholders happy.

That is a big advantage and one that distorts markets.

Now how about some other valley thinking examples?
  • Jonathan Schwartz and the over optimism of 'Open Source will defeat all' and pitching to financial analysts about the number of downloads the company had is a good one.  It ultimately doomed Sun to acquisition and really was a great example of valley thinking.
  • The startup I met whose strategy was 'rip out SAP' in order to use their software.
  • The integration of JAX-WS and Javascript into Java
  • The whole .com 'boom' based on 'visitor' numbers
  • The whole social media 'boom' based on 'user' numbers
  • Hell every single startup whose philosophy is that companies will rip out working technology to replace it with their new shiny and unproven technology
The valley often does come up with good stuff, but does valley thinking overall make the world a better place?  Sure the valley came up with Java.... and then the valley screwed it up.  The valley came up with REST... oh brilliant our lives are so much better.

My point of this rant is that software investment is driven by the thinking in a short stretch of land.  Some of it can be good, some of it can be bad, but when you look at companies like IBM, SAP, ARM, etc. it's quite amazing just how positive an impact they've had on technology, and arguably the world, without being driven at any stage by massive hype cycles.  I'd argue it's exactly because these folks are mainly out of the valley that they often deliver practical innovations and improvements rather than impractical visions.

Zuckerberg's 'vision' for helping the world is just another example of that and Bill Gates is spot on about what is really important.  The valley is a great place, it's a buzzing place, but it has as much to do with the majority of technology decisions as Fantasy Football has to do with the NFL.  Sure they are related, but one of them is doing real work; even if lots of money is going into the former, it's the latter that tends to focus on profits.

Tuesday, October 22, 2013

Quality is a side-effect not the goal in MDM

I put out a tweet the other day with this title and I think it's worth elaborating on what I mean.  Lots of MDM efforts I see have the goal of 'improve data quality' and this is a mistake.  I'm not saying that data quality isn't a good thing but that in itself it's not actually a goal.

What do I mean?  Well, let's take an analogy or three.  If you are looking to buy a diamond then do you buy the very, very best and the very, very biggest?  If you are looking for diamonds to use in cutting or industrial grinding then the answer is of course 'no'; that level of quality really wouldn't be appropriate in those uses, it would be a waste of money.  What if you are looking to put a 1.6 litre engine in a car aimed at the local commuter market: do you look around for the most powerful, most expensive engine, built to the highest quality standards?  Well, that would probably be one of the engines slated to go into a Formula 1 car next season.  Sure it generates huge power and is a quality engine but it's not fit for purpose.

Now for the final analogy.  You are looking to provide translation services for the Iranian nuclear discussions.  Should you go and get the cheapest price from someone who promises they can speak 'Iranian' or do you invest in someone who actually is proven as a translator for Persian, Gilaki and Mazandarani and describes their Kurdish as 'passable'?

The point here is that in each case the goal defines the level of quality required.  Quality in itself is about having an acceptable level of quality to meet your goal, which on some occasions might be very little indeed.

So what is the real goal of MDM?  It's about enabling business collaboration and communication.  The power of MDM is really in the cross-reference, the bit that means you know the customer in one division is the same as in another and that the product they are buying is the same in two different countries.  If the quality is awful but the cross-reference works then on many occasions you don't need to invest more in quality unless there is a business reason to do so.  Most of the time that business reason is that you cannot achieve the collaboration without having a decent level of quality.  To match customers across business areas requires you to have a standard definition, so your customer on-boarding needs a certain level of rigour and your product definition needs to work to standards that are agreed across the business.

So in focusing on the collaboration, in focusing on where the business wants to collaborate, you focus MDM and you focus where quality needs to be achieved.  Focusing on quality as a goal is a very IT centric thing; focusing on collaboration and through that enabling quality is a business thing.

And MDM is certainly a business thing. 

Friday, October 11, 2013

Single Canonical Fail

There are few things out there in IT more delusional than the Single Canonical Form, the idea that IT can define a super schema, a schema so complete, so pure that all will bow down before it.  Sheer idolatry.  Whether it is for integration or for Data Warehousing the reality is that a Single schema is never going to be ‘canonical’: different people have different perspectives and it’s this very contention between business areas that actually drives the business forwards.  Sales obeys the rules from Finance in certain areas and in others is in open rebellion as the KPIs for Sales compete against the need for regulation in Finance.  To forecast correctly means rigor and repeatability, but anyone who phones up Sales with an open checkbook is going to find their order fulfilled despite the claims of a sales process.

At the heart of a Single Canonical Form is a simple premise: ‘as long as everyone can agree’.  It’s the sort of premise that is wonderfully naïve in its inception.  The reality is sadly that such a simplistic view ignores local perceptions and attempts to force a straitjacket upon the business by imposing a single, almost Stalinistic, view upon it.  By starting with that beguiling premise IT sets out on a journey that can only end in failure.  The Sales, Finance and Operations teams all have local KPIs, different divisions and regions have different strategies and all may have a different view on how they sell and whom they sell to.  This does not mean the business is dysfunctional or wrong, it simply means that the business is complex and not constrained within a single view of what should be done.

The Single Canonical Form aims to achieve the unobtainable and by doing so creates its own downfall.  Because it doesn’t meet the objectives of everybody, individuals are forced to create their own local solutions, as the agility of the single canonical form is relatively, or indeed astronomically, low.  The goal of a single canonical form is to create a single view on one of the most variable things in a business: the view on information.  One part of the business may require only 10 pieces of information about a customer, another 200; neither is right or wrong, it is simply their own local information.  The critical element is that an individual customer be recognized across the two, not that 210 attributes be agreed.  The same goes for invoices, orders, contacts and everything else: agree when it matters, don’t bother when it doesn’t.


The Single Canonical Form is a straitjacket on the enterprise; it’s a dumb idea based on an unachievable premise.  It’s time for IT to grow up and work differently.

Thursday, October 10, 2013

Speaking in Public isn't private - and the internet is a public space

With all the scandal around Edward Snowden I have to say I'm mostly in the camp of 'surely everyone knew that spying agencies spied on people?', but the most surprising part is the 'scandal' that they might be listening to our communications over the internet.

The internet?  The one created and originally funded by DARPA?  The open internet with the IP protocol that means packets are by default openly routed and unencrypted?

Or do you mean SMTP with its unencrypted openly routed emails?

Or HTTP with its unencrypted data?  Even HTTPS only encrypts the data: anyone watching can still openly see which server someone was talking to from the IP address, just not the data being exchanged.
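As a rough illustration of that difference, here is a simplified Python model of what a passive observer on the network path can read for a single request.  The function name and the dictionary layout are made up for this sketch, and it assumes no VPN or proxy is in use and that, for HTTPS, the hostname still leaks through the DNS lookup and the TLS handshake.

```python
from urllib.parse import urlsplit


def on_path_observer_view(url: str) -> dict:
    """Simplified model of what a passive network observer can read.

    Assumptions: no VPN or proxy, and for HTTPS the hostname is still
    exposed via DNS lookups and the TLS handshake (Server Name Indication).
    """
    parts = urlsplit(url)
    encrypted = parts.scheme == "https"
    path_and_query = parts.path + ("?" + parts.query if parts.query else "")
    return {
        # IP routing needs the destination address, so it is always visible.
        "destination_ip_and_port_visible": True,
        # The hostname is typically visible with or without encryption.
        "hostname": parts.hostname,
        # The page requested is only readable when the traffic is plain HTTP.
        "path_and_query": None if encrypted else path_and_query,
        "headers_and_body_readable": not encrypted,
    }


plain = on_path_observer_view("http://example.com/accounts?id=7")
secure = on_path_observer_view("https://example.com/accounts?id=7")
```

In this model the observer learns where the traffic is going in both cases; HTTPS only takes away the page and the content.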

Seriously, did anyone actually think that people are not watching this stuff?  When did this become a surprise?  I knew that 30 years ago people were spying on this.  Didn't we all have a .sig back then that said 'Hello to my friends in Cheltenham and Langley'?

Agencies have spied on people, even allies (in fact maybe ESPECIALLY allies), for hundreds of years.  It's the whole point of funding spying agencies, and the internet just makes that spying easier.  It's a public conversation that you are having on Twitter, Facebook or over email.  You might think it's private but it's really just shouting out of a window, and you shouldn't be surprised if public speech is being tracked.

This isn't a question of 'only the innocent have nothing to fear'; it's that by bringing this more into the open we risk it being used beyond its current scope of spies and by regular police forces, and it's that which would scare me more.  The whole risk is that spying becomes a mainstream police activity rather than something done by specialist organisations whose primary focus is on genuine national threats, not someone forgetting to put the garbage bin out on a Tuesday night.

Spying is real, it has been for hundreds of years and it has sadly got a place in a world of cyber and other terrorist threats.  That people have got upset about the tracking of information over public networks says much more about the lack of understanding of how the Internet works than about a big brother state that is completely new.