Tuesday, May 27, 2014

MDM isn't about data quality, it's about collaboration

I'm going to state a sacrilegious position for a moment: the quality of data isn't a primary goal in Master Data Management.

Now, before you make the perfectly correct 'Garbage In, Garbage Out' statement, let me explain.  Data Quality is certainly something that MDM can help with, but it's not actually the primary aim of MDM.

MDM is about enabling collaboration, collaboration is about the cross-reference

Why do you do an MDM project?  The answer is to join information between multiple systems and multiple parts of the organisation.  It's so the customer in SFDC is the same as the customer in SAP and in Oracle Financials, and when that customer hits the website you know who they are.  It's so the salesperson can see all the invoices, orders and other elements related to their customer.  It's so you can see how a product goes through the various parts of the R&D and supply chain processes and track it all the way.

If everything was in one big system with a single database then you wouldn't really need MDM; you'd just need data quality to make sure the single record was a good one.  You need MDM because you are attempting to join across systems and business units.  So the real value from MDM is that cross reference that tells you who the customer is and where all the information about them lives in the various systems... even if you never clean any of it.
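
To make the idea concrete, here is a minimal sketch of what a cross reference is as a data structure: one master identifier linked to the local record identifiers in each system.  The class name, method names and all of the IDs are made up for illustration; a real MDM product will store and manage this very differently.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CrossReference:
    """Links one master ID to the local record IDs held in each source system."""

    # master_id -> {system name -> local record id}
    links: dict = field(default_factory=lambda: defaultdict(dict))
    # (system name, local record id) -> master_id, for reverse lookups
    reverse: dict = field(default_factory=dict)

    def link(self, master_id, system, local_id):
        """Record that `local_id` in `system` is the same entity as `master_id`."""
        self.links[master_id][system] = local_id
        self.reverse[(system, local_id)] = master_id

    def master_for(self, system, local_id):
        """Return the master ID for a local record, or None if it is unknown."""
        return self.reverse.get((system, local_id))

    def locations(self, master_id):
        """Return a copy of {system -> local id} for one master record."""
        return dict(self.links.get(master_id, {}))


# Hypothetical IDs: none of these come from a real system.
xref = CrossReference()
xref.link("CUST-001", "SFDC", "0015000000XyZ")
xref.link("CUST-001", "SAP", "0000104711")
xref.link("CUST-001", "Oracle Financials", "AR-88213")

# Starting from the SAP record, find the same customer everywhere else.
master = xref.master_for("SAP", "0000104711")
print(master, xref.locations(master))
```

Note that nothing in the sketch cleans or changes the source records; it only records where the same customer lives, which is the point being made above.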

So this is how you sell MDM to the business: not on data quality, which is a secondary benefit, but as something that will enable the business to collaborate better and function more effectively.

Sometimes Quality doesn't count

The reality is that total quality isn't always what the business wants.  They know some data is dodgy, so the question is how dodgy, and whether you know that when you use it to make decisions.  Lots of social media data is amazingly poor quality, but taken in volume, trends can be seen.  What makes it more valuable, though, is enabling the cross-reference between the high-quality and the lower-quality data, so you can see the trends of your customers and products and not just trends in noise.

Focus on collaboration, focus on the cross reference, quality will follow

So having said that Data Quality isn't a primary focus, it is actually how you enable that pesky cross reference.  But you apply it only to the information that matters: the core information required for the cross reference.  Thus you get a higher-quality core identification of the customer, and everyone understands why they are doing it: the quality enables the cross reference, which enables the collaboration.
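
As a rough sketch of what "quality only on the core" can look like, the snippet below normalises just the attributes used to identify a customer and builds a match key from them, leaving everything else as it was found.  The choice of `name` and `postcode` as the core attributes, the function names and the sample records are all assumptions made for illustration, and exact matching on a normalised key is a simplification: real MDM tools typically use fuzzier matching rules.

```python
import re

# Hypothetical core attributes: the minimum needed to identify a customer.
CORE_ATTRIBUTES = ("name", "postcode")


def normalise(value):
    """Lower-case the value and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", str(value).lower())


def match_key(record):
    """Build a comparison key from the core attributes only."""
    return tuple(normalise(record.get(attr, "")) for attr in CORE_ATTRIBUTES)


# Made-up records for the same customer as held in two systems.
sfdc_record = {"name": "Acme Ltd.", "postcode": "SW1A 1AA", "segment": "ent"}
sap_record = {"name": "ACME LTD", "postcode": "sw1a1aa", "segment": "Enterprise"}

# The core attributes agree once normalised, so the records can be cross-referenced...
same_customer = match_key(sfdc_record) == match_key(sap_record)
# ...even though 'segment' is inconsistent and nobody has cleaned it.
print(same_customer, sfdc_record["segment"], sap_record["segment"])
```

The inconsistent `segment` values are deliberately left alone: they are not needed for the cross reference, so fixing them is a separate decision.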

If the business don't care about quality why do you?
Now once you have that quality core, a minimum set of attributes required to uniquely identify the customer, you will often want to expand that quality to more attributes.  But stop and think: has the business asked for it?  That is quite an important point.  You might think it's an absolute disaster that a given attribute isn't used in a standard way, but it could be that no-one in the business gives a stuff.  So tell them about the issue, but let them decide if they want to spend the money making it better.  If they don't, document that they don't, so that if they come back you can say 'great, so let's re-prioritise it', which is much better than 'oh, so I spent money doing something that doesn't matter'.

The more you federate the more collaboration matters
The reason that MDM matters is that more and more business is about collaboration, both internal and external.  This means that the business value of MDM has really shifted from being about the data quality in reports to being an integral part of how a business works.  Data Quality isn't irrelevant in this world, but it's turned from being the goal of MDM to being a tool that helps enable the primary goal, which is collaboration.  As the need to digitally collaborate with partners and customers increases, so the business value of that MDM cross reference increases, both in operations and as the bit that helps you link up all of those big data sources to create a global view.

MDM is the Rosetta Stone that enables people to collaborate, so focus on collaboration not quality. 

Thursday, May 22, 2014

Lipstick on the iceberg - why the local view matters for IT evolution

There is a massive amount of IT hype that is focused on what people see: it's about the agile delivery of interfaces, about reporting, visualisation and interaction models.  If you could weigh hype, then it would seem quite clear that 95% of all IT is about this area.  It's why we need development teams working hand-in-hand with the business, and it's why animations and visualisation are massively important.

But here is the thing.  SAP, IBM and Oracle have huge businesses built around the opposite of that, around large transactional elements, things that sit at the backend and actually do the running of the business.  Is procurement something that needs the fancy UI?  I've written before about why procurement is meant to be hated, so no, that isn't an area where the hype matters.

What about running a power grid?  Controlling an aeroplane?  Traffic management?  Sure, these things have some level of user interaction, and often it's important that it's slick and effective.  But what percentage of the effort is about the user interface?  Less than 5%.  The statistics out there will show that over 80% of spend is on legacy, and even the new spend is mainly on transactional elements.

This is where taking a Business SOA view can help: it starts putting boundaries and value around those legacy areas to help you build new, more dynamic solutions.  But here is a bit of a dirty secret.

The business doesn't care that it's a mess behind the scenes... if you make it look pretty

It's a fact that regularly appears to shock people in IT.  But again this is about the SOA Christmas: the business users care about what they interact with, about their view for their purposes.  They don't care if it's a mess for IT as long as you can deliver that view.

So in other words the hype has got it right: by putting Lipstick on the Iceberg, and by hyping the Lipstick, you are able to justify the wrapping and evolution of everything else.  Applying SOA approaches to Data is part of the way to enable that evolution and start delivering the local view.

The business doesn't care about the iceberg... as long as you make it look pretty for them. 

How to select a Hadoop distro - stop thinking about Hadoop

Sqoop, Flume, Pig, ZooKeeper.  Do these mean anything to you?  If they do then the odds are you are looking at Hadoop.  The thing is that while that was cool a few years ago, it really is time to face the fact that HDFS is a commodity, MapReduce is interesting but not feasible for most users, and the real question is how we turn all that raw data in HDFS into something we can actually use.

That means three things:

  1. Performant and ANSI compliant SQL matters - if you aren't able to run traditional reporting packages then you are making people change for no reason.  If you don't have an alternative then you aren't offering an answer
  2. Predictive analytics, statistical, machine learning and whatever else they want - this is the stuff that will actually be new to most people
  3. Reacting in real-time - and I mean FAST, not BI fast but ACTUALLY fast

The last one is about how you ingest data and then perform real-time analytics that can incorporate forecasting information from Hadoop into real-time feedback, which can in turn be integrated into source systems.

So Hadoop and HDFS are actually the least important part of your future: they are critical but not important.  I've seen people spend ages looking at the innards rather than just getting on and actually solving problems.  Do you care what your mobile phone network looks like internally?  Do you care what the wiring back to the power station looks like?  HDFS is that for Data: it's the critical substrate, something that needs to be there.  But where you should concentrate your efforts is on how it supports the business use cases above.

How does it support ANSI compliant SQL?  How does it support your standard reporting packages?  How will you add new types of analytics, and does it support the advanced analytics tools your business already successfully uses?  How does it enable real-time analytics and integration?

Then of course it's about how it works within your enterprise: how does it work with data management tools, and how does its monitoring fit in with your existing tools?  Basically, is it a grown-up or a child in the information sand-pit?

Now this means it's not really about the Hadoop or HDFS piece itself, it's about the ecosystem of technologies into which it needs to integrate.  Otherwise it's just going to be another silo of technologies that doesn't work well with everything else and ultimately doesn't deliver the value you need.