Thursday, December 05, 2013

How Business SOA thinking impacts data

Over the years I've written quite a bit about how SOA, when viewed as a tool for Business Architecture, can change some of the cherished beliefs in IT.  One of these was about how the Single Canonical Form was not for SOA and others have talked about how MDM and SOA collaborate to deliver a more flexible approach.  Central to all of these things has been that this has been about information and transactions 'in the now' the active flowing of information within and between businesses and how you make that more agile while ensuring such collaboration can be trusted.

Recently however my challenge has been around data, the post-transactional world, where things go after they've happened so people can do analytics on them.  This world has been the champion of the single canonical form, massive single schemas that aim to encompass everything, with the advantage over the operational world that things have happened so the constraints, while evident, are less of a daily impact.

The challenge of data however is that the gap between the post-transactional and the operational world has disappeared.  We've spent 30 years in IT creating a hard-wall between these ares, creating Data Warehouses which operate much like batch driven mainframes and where the idea of operational systems directly accessing them has been aggressively discouraged.  The problem is that the business doesn't see this separation.  They want to see analytics and its insight delivered back into the operational systems to help make better decisions.

So this got me thinking, why is it that in the SOA world and operational world its perfectly ok for local domains and businesses to have their own view on a piece of data, an invoice, a customer, a product, etc but when it comes to reporting they all need to agree?  Having spent a lot of time on MDM projects recently the answer was pretty simple:
They don't
With MDM the really valuable bit is the cross-reference, its the bit that enables collaboration.  The amount of standardisation required is actually pretty low.  If Sales has 100 attributes for Customer and Finance only 30 and in-fact it only takes 20 to uniquely identify the customer then its that 20 that really matter to drive the cross reference.  If there isn't any value in agreeing on the other attributes then why bother investing in it?  Getting agreement is hard enough without trying to do it in areas where the business fundamentally doesn't care.

This approach to MDM helps to create shorter more targeted programs, and programs that are really suited to enabling business collaboration.  You don't need to pass the full customer record, you just pass the ID.

So what does this combination of MDM and SOA mean for data, particularly as we want analytics to be integrated back into operations?
Data solutions should look more like Business SOA solutions and match the way the business works
In simple terms it means the sort of thinking that led to flexibly integrated SOA solutions should now be applied to Data.  Get rid of that single Schema, concentrate on having data served up in a way that matches the requirements of the business domains and concentrate governance on where its required to give global consistency and drive business collaboration.  That way you can ensure that the insights being created will be able to be managed in the same way as the operational systems.

With SOA the problem of people building applications 'in the bus' led me to propose a new architectural approach where you don't have one ESB that does everything but accept that different parts of the business will want their own local control.  The Business Service Bus concept was built around that and with the folks at IBM, SAP, Microsoft and Oracle all ensuring that pretty much everyone ends up with a couple of ESB type solutions its the sort of architecture I've seen work on multiple occasions.  That sort of approach is exactly what I now think applies to data.

The difference?

Well with data and analytical approaches you probably want to combine data from multiple sources, not just your own local information, fortunately new (Java) technologies such as Hadoop are changing the economics of storing data so instead of having to agree on schemas you can just land all of your corporate data into one environment and let those business users build business aligned analytics which sit within their domain, even if they are using information shared by others.  MDM allows that cross reference to happen in a managed way but a new business aligned approach removes the need for total agreement before anything can get done.

With Business SOA driven operations we had the ability to get all the operational data in real-time and aggregate at the BSB level if required, with Business SOA driven data approaches we can land all the information and then enable the flexibility.  By aligning both the operational and post-transactional models within a single consistent Business aligned approach we start doing what IT should be doing all along
Creating an IT estate that looks like the business, evolves like the business and that is costed in-line with the value it delivers.
Applying Business SOA thinking to data has been really interesting and what led to the Business Data Lake concept, its early days clearly but I really do believe that getting the operational and data worlds aligned to the business in a consistent way is going to be the way forwards.

This isn't a new and radical approach, its just applying what worked in operations to the data space and recognising that if the goal of analytics is to deliver insight back into operations then that insight needs to be aligned to the business operations from the start so it can adapt and change as the operational business requires.

The boundaries from the operational and post-transactional world have gone, the new boundaries are defined by the business and the governance in both areas is about enabling consistent collaboration.


reamon said...

Wasn't this "local view" aspect the reason Inmon promoted the notion of data marts? Gather the data from whereever, cleanse, summarize, etc. to the DWH, then use that to populate data marts which are tailored to a specific audience/need?

Steve Jones said...

The problem of going via the DWH to the mart is then the DWH schema needs to have the schema for all of the marts which makes it even slower to change.