Tuesday, October 14, 2014

Uber won't want drivers in the future

I'm an Uber user; it's a great service outside of cities with decent public transport.  But I have been thinking about how they will justify the $17bn valuation and give people a return on that $1.2bn investment.  At the same time I've been following the autonomous car pieces with interest, and I think there is a pretty clear way this can end, especially as Uber have already said they are going to buy 2,500 Google cars.

If Uber do push local taxi firms to the wall by leveraging that huge cash pile then it gives them a great 'front window' in terms of the services they offer.  Now at the moment part of the beauty of the model is that they don't need to pay benefits to the drivers.  But what if you could get rid of the drivers altogether?  That made me think about the net impact of this.  Firstly, with a driverless bus you could look at ad-hoc, public-transport-style pricing.  Change the app so you put in where you are and where you want to go, and Uber can start ride-sharing to maximise utilisation of larger vehicles.  This reduces the cost to the consumer and also increases the profitability.  It would also put Uber into direct competition with the public transport systems in many countries, again leveraging that $1.2bn war chest to under-price initially, and building on the political climate in some countries that sees private transportation as the best way to go.
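As a sketch of how that ride-sharing matching could work, here is a deliberately naive greedy grouping of requests into shared vehicles. All coordinates, capacities and thresholds are my own illustrative assumptions, not anything Uber has described:

```python
# Hypothetical ride-pooling sketch: group requests whose pickup and
# drop-off points are close enough to share one larger vehicle.
from math import hypot

def pool_requests(requests, capacity=6, radius=1.0):
    """Greedily group (pickup, dropoff) requests into shared vehicles.

    Two requests share a vehicle when both their pickups and their
    drop-offs fall within `radius` (say, km) of the group's first rider.
    """
    vehicles = []
    for pickup, dropoff in requests:
        for group in vehicles:
            lead_pickup, lead_dropoff = group[0]
            if (len(group) < capacity
                    and hypot(pickup[0] - lead_pickup[0],
                              pickup[1] - lead_pickup[1]) <= radius
                    and hypot(dropoff[0] - lead_dropoff[0],
                              dropoff[1] - lead_dropoff[1]) <= radius):
                group.append((pickup, dropoff))
                break
        else:
            vehicles.append([(pickup, dropoff)])  # start a new vehicle

    return vehicles

# Three riders heading the same way, one going elsewhere: two vehicles.
rides = [((0, 0), (10, 10)), ((0.5, 0), (10, 10.5)),
         ((0, 0.2), (9.8, 10)), ((5, 5), (0, 0))]
print([len(v) for v in pool_requests(rides)])  # -> [3, 1]
```

A real matcher would route along shared corridors rather than cluster endpoints, but the economics are the same: the more riders per vehicle, the lower the per-seat cost and the higher the margin.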

This all leaves Uber drivers being squeezed out.  Right now Uber take a 20% cut but don't carry the risk around the driver and the car.  But with electric autonomous vehicles much of that risk is going to be reduced, which enables Uber to start offering 'UberAuto': using some of that war chest to buy electric autonomous vehicles and have them compete in the UberX type of market, potentially directly with UberX drivers or at a lower price, as Uber could take the full revenue from the fare.  A franchise model where people buy 'official' Uber autonomous vehicles and have them added to the pool would spread the risk but essentially change the model away from human drivers.

To get to this position of strength though, what Uber really needs today is to push taxi firms out of the way so they are seen as the obvious first place to get a ride in a city.  They are already doing that with their pricing, and having that $1.2bn means they are far from being a little startup company.  Theirs is a leveraged market approach where the war chest gives them the ability to out-compete local competitors.  Uber are doing this by leveraging cheap (to them) labour from the 'shared economy', or in reality creating a business based around zero-hours contracts and no benefits.

Longer term though it's hard to see why Uber would keep sticking with people as autonomous vehicles become more cost effective.  By cutting out drivers they reduce a degree of risk and also increase the share of revenue that they can take.  Uber are clearly thinking this way with the purchase of the Google cars and it's something we can expect to see increase over time.  This shift will bring Uber into the local public transport market, where it can provide more flexible routing (at a price) than traditional bus and train services but still provide a degree of cost sharing.  The 101 in San Francisco would be a great example of a profitable 'bus' route for Uber, where people are dropped off and picked up at home and work but where 80%+ of the journey is shared.

When looking at Uber lots of people talk about the 'sharing economy', but my prediction is that the future of Uber is as an autonomous vehicle fleet; the sharing economy (and that war chest) just positions it ready for that future and helps remove some of the competition so it's a clearer market when the shift happens.

Thursday, August 07, 2014

Whistler, Microsoft and how far cloud has come

In six years Microsoft has gone from almost zero corporate knowledge about how cloud computing works to it being an integral part of their strategy.  Sure, back in early 2008 there were some pieces of Microsoft that knew about cloud, but that really wasn't a corporate view; it was what a very few people inside the company knew.

How do I know this? Well back in 2008 I was sitting on the top of a mountain with Simon Plant in Whistler.  The snow wasn't great that season but there are few places that I'd rather take a conference call.  The call was with Microsoft's licensing folks, discussing how we could license their technology, SQL Server, SharePoint etc, on AWS.  It was a rather interesting conversation to say the least.

We were asking how they'd license for virtual machines, and how things like licence portability worked in virtual environments.  A typical exchange would go something like:

Simon: "So what we need is a virtual core price"
MSFT: "That will be the same as the physical core price"
Simon: "But it's OK that it might move physical machines?"
MSFT: "As long as it's less than once every 90 days, yes"
Simon: "It could be more than that"
Me: "It could be every hour"
MSFT: "No problems, you'll just need to license every core it goes on"
Me: "We don't know what physical cores it runs on"
MSFT: "Why not?"
Simon: "Because it's a cloud platform, we don't care about the physical boxes"

Then the conversation included one of the finest lines ever to come out of a software company's mouth:

"Well to be safe you just need to ask Amazon how many cores they have in the Data Centre and license for that"

The reason I re-tell this story is to make the point of just how far we've come in six years.  I don't think any licensing person would suggest today that you'd need to license for every physical core in an entire data centre.  Back then there really wasn't an understanding that we couldn't just ask Amazon for its core count in every data centre, or that we didn't even know where the machines physically lived; the bit they really couldn't get was that we didn't care, and that not knowing those things was actually a positive.

The call continued and by the end we were actually getting somewhere with a general acceptance that physical to virtual licensing needed some wording changes to get it working on AWS.  The Microsoft guys were pretty receptive and keen to learn but it was clearly a new set of concepts for them.

Then Mr Plant blew their mind

Simon: "What about scale down?"
MSFT: "What do you mean?"
Simon: "Well the point of cloud is to scale up and down, so what do we do when we scale down?"
MSFT: "You just need to license at peak usage"
Me: "But that destroys the whole idea of dynamic scaling"
MSFT: "Why?"
Simon: "Well if you scale once a year for a peak of a couple of days, say for financial reporting, the rest of the year that just remains idle, which is wasted money"

The concept of temporary licences and dynamic scaling was clearly one that went way beyond what they were able to do at that stage.  There were more conversations after that, explaining what cloud really meant and the sorts of things customers would be asking them for in years to come.  This whole call took place at about 12,000ft, with Simon about 12 feet further up the mountain so our calls wouldn't interfere.  The Microsoft team commented that we appeared very co-ordinated given we were dialling in from UK and US numbers, and we just didn't think saying 'actually we are sitting with snowboards on our feet' was terribly professional.
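To put some illustrative numbers on Simon's scale-down point (all prices and core counts here are invented for the sake of the arithmetic):

```python
# Back-of-envelope comparison: 'license at peak' versus paying only for
# what you actually run, for a two-day year-end reporting spike.
core_price = 5000          # assumed licence cost per core per year
baseline_cores = 8         # steady-state footprint
peak_cores = 64            # needed ~2 days/year for financial reporting
peak_days = 2

# Vendor's answer at the time: license every core you might ever use.
license_at_peak = peak_cores * core_price

# What dynamic scaling implies: baseline all year, the extra cores pro-rated.
pay_per_use = (baseline_cores
               + (peak_cores - baseline_cores) * peak_days / 365) * core_price

print(license_at_peak, round(pay_per_use))  # -> 320000 41534
```

On these made-up figures 'license at peak' costs nearly eight times the pro-rated amount, which is exactly why it destroys the whole idea of dynamic scaling.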

The above conversation was repeated with pretty much every single software vendor over about three months, with the same misunderstandings and the same suggestions of 'license the whole data centre'.  I'm just singling out the Microsoft example as they are probably now one of the biggest proponents of cloud and it sits at the core of their strategy... oh, and doing a conference call on a snowboard was cool.

Six years is what it's taken to go from there to here: a world where cloud is now practically the default approach, whether public, private or hybrid, and those questioning cloud are effectively the uneducated minority, just as Microsoft were back in 2008.  Now the challenge for enterprises is understanding just how they take on these challenges at enterprise scale, and that is what Simon has been doing since then, leading to him setting up his own business, Dual Spark, which specialises in exactly that.

Simon Plant: doing cloud computing for longer than Microsoft.


Monday, July 14, 2014

Big Data doom mongers need to look outside of the marketing department

In every change there are hype machines that overplay and sages who call doom.  Into the Big Data arena steps David Searls to proclaim, in an article over at ZDNet, that Big Data is a myth and simply hype which is set to burst.
But big data, he said, is nothing more than the myth that collecting vast amounts of data can help companies know customers better than those customers even know themselves.
 The bogeymen in this story are IBM and the consultants who have hyped it all up.  Then another sage jumps in:
Dr Matthew Landauer, co-founder of OpenAustralia, is equally sceptical about big data. "All it allows you to do is optimise your current business," he told ZDNet. "It's never going to tell you that you're doing business wrong or need another model."
The article then moves on to a complaint about privacy and security (which I'm not disputing), but the key claim is that Big Data is a bubble, and I really have to disagree: with the definition of big data, with the claimed lack of innovation it can drive, and with the idea that it's a bubble.

Firstly, I don't agree that marketing and customer data are what Big Data is about.  The vast majority of my conversations on Big Data have nothing to do with an explosion of customer information (e.g. social media) but are instead about machine data, trading data, weather data and other massive data sets that historically companies couldn't cost-effectively do analytics on.

Social media and customer information is just one part of the challenge, and it's probably the most fluffy-bunny part and liable to be a bubble, but to infer from there that Big Data is just hype is like assuming all swans are white because you have seen a single white swan.

Secondly, there is the assertion that it cannot tell you that you are doing things incorrectly.  I'm not sure what sort of machine learning and other data science work Matthew Landauer has done, but I am surprised that someone with a PhD in Physics from Cambridge hasn't seen examples from his own field of just how much change becoming data driven can deliver, in terms of insight that causes disciplines, companies and industries to make dramatic changes and take new approaches (the LHC at CERN for instance is quite clearly a Big Data application).  Finance is covered with examples where smarter algorithms identified that things were done wrong and that new ways would make more money.  By analysing and simulating you can absolutely find that there are new ways that work significantly better.

Thirdly, I disagree about who started the Big Data hype.  IBM were far from being the leaders; that job goes to two industries: firstly the internet giants, Amazon, Google, eBay, Yahoo etc, who created entire new business models based on information, and secondly the engineering companies who saw new business models based on information.  Sure, 'marketing/social media' has come to be the default story used by the lazy, but that is far from saying that it is actually the story.

Big Data marketing might pop; that doesn't mean that Big Data is hype.  Saying so would be like claiming that the failure of Association Football to become the dominant sport in North America means that Association Football is failing.  Big Data is already delivering benefits, in engineering in particular, and the challenges associated with the Internet of Things are not going to result in a reduction in information anytime soon.  Claiming that it's all just hype doesn't help move the state of the art forwards, and certainly doesn't serve all those use cases which really are Big Data challenges.

But then the role of doom-mongering sage has never been to be fair and balanced, but instead to take a specific example and declare 'We're all doomed' or 'The end is nigh'.  Where would the book deals be in 'Big Data has many specific use cases but some vendors are using it to hype sales of their technology in places where it doesn't really add value', or, to give it a book title, 'Salesmen: not always looking out for your best interests'?

Friday, June 27, 2014

Open Source as religion - when the Bazaar becomes a Cathedral

The seminal book on Open Source development, "The Cathedral and the Bazaar", talks eloquently about the difference between commercial software development and open source development.  In the past few years however there has been another shift, one where companies are actively releasing their technology as Open Source as a competitive differentiator: a claim of 'we are open' because the source code is open.

The selling point then becomes the number of 'committers' (developers) that the company has on the open source project, the argument being that they can get your bugs fixed quicker because they have the inside track.

The competition between vendors using exactly the same open source distribution then becomes a question of who is 'purest' to the vision and who has the most bodies contributing to it.  If an external company takes that source and releases their own version they are not simply frowned upon; they are actively prevented from engaging in contribution, as this would dilute the corporate messaging of the commercial companies who first established, or who mainly contribute today to, that open source program.

This isn't an entirely new thing; we used to see it quite a bit with some of the Java pieces, and some would argue it's related to what Linus does with Linux.  There is however a very big difference.  In those previous cases it was normally a single individual who made the original release, and it's that individual who then exercised that control; in Linus' case he isn't the commercial arm behind any of these things.

It's natural for this to have happened in the Open Source community as it's become a commercial competitive weapon, but it does mean that Open Source is ceasing to be that historical bazaar and is instead, in many cases, simply a different cathedral into which rigid company approaches are applied.  It's extremely hard for companies that have locked down millions in VC funding to allow their core market message, 'we own the code', to be diluted as their Open Source project becomes popular, as this would reduce their differentiation and thus their market multiple as they look to IPO.

Open Source remains a strong approach, and one that gives companies a level of security if a vendor ever goes bust, in that the code is still available.  But it's quite clear to me that the VC funding that has flooded into the space has really destroyed the previous ad-hoc bazaar approach and instead simply re-created the Cathedral, but with an Open Source release management system.

Tuesday, May 27, 2014

MDM isn't about data quality, it's about collaboration

I'm going to state a sacrilegious position for a moment: the quality of data isn't a primary goal in Master Data Management

Now, before the perfectly correct 'Garbage In, Garbage Out' objection, let me explain.  Data quality is certainly something that MDM can help with, but it's not actually the primary aim of MDM.

MDM is about enabling collaboration, and collaboration is about the cross-reference

Why do you do an MDM project?  The answer is to join information between multiple systems and multiple parts of the organisation.  It's so the customer in SFDC is the same as the customer in SAP and in Oracle Financials, and when that customer hits the website you know who they are.  It's so the salesperson can see all the invoices, orders and other elements related to their customer.  It's so you can see how a product goes through the various parts of the R&D and supply chain processes and track it all the way.

If everything was in one big system with a single database then you wouldn't really need MDM; you'd just need data quality to make sure the single record was a good one.  You need MDM because you are attempting to join across systems and business units.  So the real value from MDM is that cross-reference that tells you who the customer is and where all the information about them lives in the various systems... even if you never clean any of it.
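A minimal sketch of what that cross-reference looks like in practice. The system names echo the examples above; all identifiers are made up:

```python
# The MDM cross-reference: one master ID pointing at the same customer's
# record in each system, with no cleaning of the underlying records.
xref = {
    "CUST-000042": {                       # master identifier
        "SFDC": "0015000000XyZab",
        "SAP": "1000237",
        "OracleFinancials": "AR-88512",
        "Web": "jane.doe@example.com",
    },
}

def records_for(master_id):
    """Where does this customer live in each system?"""
    return xref[master_id]

def master_for(system, local_id):
    """Reverse lookup: which master does a local record belong to?"""
    for master, locals_ in xref.items():
        if locals_.get(system) == local_id:
            return master
    return None

# The salesperson's view: start from an SAP invoice, find everything else.
print(master_for("SAP", "1000237"))        # -> CUST-000042
print(records_for("CUST-000042")["SFDC"])  # -> 0015000000XyZab
```

Nothing here says anything about the quality of the SAP or SFDC records themselves; the value is purely in the join.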

So this is how you sell MDM to the business: not on data quality, which is a secondary benefit, but as something that will enable the business to collaborate better and function more effectively.

Sometimes Quality doesn't count

The reality is that total quality isn't always what the business wants; they know some data is dodgy, so the questions are how dodgy it is and how to take that into account when you use it to make decisions.  Lots of social media is amazingly poor quality, but taken in volume trends can be seen.  What makes it more valuable though is when you can enable that cross-reference between the high quality and the lower quality, so you can see the trends of your customers and products, not just trends in noise.
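A toy illustration of 'poor quality, but trends show up in volume' (the data is synthetic, generated for the demonstration):

```python
# Individual readings are swamped by noise, yet in aggregate the
# underlying drift is clearly visible.
import random

random.seed(1)
true_trend = [i * 0.1 for i in range(1000)]                 # slow upward drift
noisy = [t + random.uniform(-50, 50) for t in true_trend]   # awful per-item quality

first_half = sum(noisy[:500]) / 500
second_half = sum(noisy[500:]) / 500
print(second_half > first_half)  # the drift survives the noise -> True
```

Each individual record is near-worthless, but a thousand of them still tell you which way the trend is going, which is precisely why cross-referencing noisy sources against high-quality masters is worth doing.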

Focus on collaboration, focus on the cross-reference, and quality will follow

So, having said that data quality isn't a primary focus, it is actually how you enable that pesky cross-reference; but you apply it only to the information that matters, the core information required for the cross-reference.  Thus you get a higher quality core identification of the customer, and everyone understands why they are doing it: the quality enables the cross-reference, which enables the collaboration.
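A sketch of what 'quality only on the core' might mean: normalise just the attributes needed to identify the customer and match on those, leaving everything else dirty. The choice of name plus postcode as the core set is purely illustrative:

```python
# Match records across systems on a minimal core attribute set,
# ignoring every other (possibly garbage) attribute.
def core_key(record):
    """Normalise only the attributes needed to identify the customer."""
    return (record["name"].strip().lower(),
            record["postcode"].replace(" ", "").upper())

sfdc = {"name": "Jane Doe ", "postcode": "sw1a 1aa", "fax": "???"}   # dirty extras
sap  = {"name": "JANE DOE",  "postcode": "SW1A1AA",  "industry": ""}

# Same customer, even though we never cleaned the rest of either record.
print(core_key(sfdc) == core_key(sap))  # -> True
```

Real matching would need fuzzier rules than this, but the principle stands: the quality effort goes into the few fields that power the cross-reference.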

If the business don't care about quality, why do you?
Now, once you have that quality core, a minimum set of attributes required to uniquely identify the customer, you will often want to expand that quality to more attributes.  But stop and think: have the business asked for it?  That is quite an important point.  You might think it's an absolute disaster that a given attribute isn't used in a standard way, but it could be that no-one in the business gives a stuff, so tell them about the issue but let them decide if they want to spend the money making it better.  If they don't, document that they don't, so if they come back you can say 'great, so let's re-prioritise it', which is much better than 'oh, so I spent money doing something that doesn't matter'.

The more you federate the more collaboration matters
The reason that MDM matters is that more and more business is about collaboration, both internal and external.  This means the business value of MDM has really shifted from being about the data quality in reports to being an integral part of how a business works.  Data quality isn't irrelevant in this world, but it has turned from being the goal of MDM into a tool that helps enable the primary goal, which is collaboration.  As the need to digitally collaborate with partners and customers increases, so the business value of that MDM cross-reference increases, both in operations and as the bit that helps you link up all of those big data sources to create a global view.

MDM is the Rosetta Stone that enables people to collaborate, so focus on collaboration, not quality.

Thursday, May 22, 2014

Lipstick on the iceberg - why the local view matters for IT evolution

There is a massive amount of IT hype focused on what people see: it's about the agile delivery of interfaces, about reporting, visualisation and interaction models.  If you could weigh hype then it is quite clear that 95% of it is about this area.  It's why we need development teams working hand-in-hand with the business; it's why animations and visualisation are massively important.

But here is the thing.  SAP, IBM and Oracle have huge businesses built around the opposite of that, around large transactional elements, things that sit at the back end and actually run the business.  Is procurement something that needs a fancy UI?  I've written before about why procurement is meant to be hated, so no, that isn't an area where the hype matters.

What about running a power grid? Controlling an aeroplane?  Traffic management?  Sure, these things have some level of user interaction, and often it's important that it's slick and effective.  But what percentage of the effort is about the user interface?  Less than 5%.  The statistics out there show that over 80% of spend is on legacy, and even the new spend is mainly on transactional elements.

This is where taking a Business SOA view can help: it starts putting boundaries and value around those legacy areas to help you build new, more dynamic solutions.  But here is a bit of a dirty secret.

The business doesn't care that it's a mess behind the scenes... if you make it look pretty

It's a fact that regularly seems to shock people in IT.  But again this is about the SOA Christmas: the business users care about what they interact with, about their view for their purposes.  They don't care if it's a mess for IT, as long as you can deliver that view.

So in other words the hype has got it right: by putting lipstick on the iceberg, and by hyping the lipstick, you are able to justify the wrapping and evolution of everything else.  Applying SOA approaches to data is part of the way to enable that evolution and start delivering the local view.

The business doesn't care about the iceberg... as long as you make it look pretty for them. 

How to select a Hadoop distro - stop thinking about Hadoop

Sqoop, Flume, Pig, ZooKeeper.  Do these mean anything to you?  If they do then the odds are you are looking at Hadoop.  The thing is that while that was cool a few years ago, it really is time to face the fact that HDFS is a commodity, MapReduce is interesting but not feasible for most users, and the real question is how we turn all that raw data in HDFS into something we can actually use.

That means three things:

  1. Performant and ANSI-compliant SQL matters - if people can't run their traditional reporting packages then you are making them change for no reason, and if you don't have an alternative then you aren't offering an answer
  2. Predictive analytics, statistical, machine learning and whatever else they want - this is the stuff that will actually be new to most people
  3. Reacting in real-time - and I mean FAST, not BI fast but ACTUALLY fast
The last one is about how you ingest data and then perform real-time analytics which are able to incorporate forecasting information from Hadoop into real-time feedback that can be integrated into source systems.
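A minimal sketch of that fast path: forecasts batch-computed in Hadoop feed a real-time check on each incoming event, and the result flows straight back. Store names, thresholds and the alert action are all invented for illustration:

```python
# Fast path: score events as they arrive against forecast baselines
# that the Hadoop batch layer has already computed.
forecast = {"store-17": 120.0, "store-23": 80.0}  # expected hourly sales, from batch

def react(event):
    """Flag an event the moment it deviates badly from the forecast."""
    expected = forecast.get(event["store"], 0.0)
    if expected and abs(event["sales"] - expected) / expected > 0.5:
        return "alert"      # fed back into the source system immediately
    return "ok"

print(react({"store": "store-17", "sales": 30.0}))  # 75% below forecast -> alert
print(react({"store": "store-23", "sales": 85.0}))  # within tolerance   -> ok
```

The point is the division of labour: Hadoop does the heavy forecasting offline, while the real-time side does only a cheap comparison, which is what makes ACTUALLY fast feasible.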

So Hadoop and HDFS are actually the least important parts of your future: critical, but not where the focus should be.  I've seen people spend ages looking at the innards rather than just getting on and actually solving problems.  Do you care what your mobile phone network looks like internally?  Do you care what the wiring back to the power station looks like?  HDFS is that for data: the critical substrate, something that needs to be there.  But where you should concentrate your efforts is on how it supports the business use cases above.

How does it support ANSI-compliant SQL?  How does it support your standard reporting packages?  How will you add new types of analytics; does it support the advanced analytics tools your business already successfully uses?  How does it enable real-time analytics and integration?

Then of course it's about how it works within your enterprise: how does it work with data management tools, and how does its monitoring fit in with your existing tools?  Basically, is it a grown-up or a child in the information sand-pit?
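One way to make that assessment concrete is a weighted scorecard over exactly these ecosystem questions rather than Hadoop internals. The criteria mirror the points above; the weights and ratings are invented:

```python
# Score candidate distros on ecosystem fit, not on HDFS innards.
criteria = {                 # weight reflects business importance (assumed)
    "ansi_sql": 3,           # performant, ANSI-compliant SQL
    "reporting_tools": 2,    # works with existing reporting packages
    "advanced_analytics": 2, # predictive / statistical / ML support
    "real_time": 2,          # genuinely fast ingest and feedback
    "ops_integration": 1,    # data management and monitoring fit
}

def score(ratings):
    """Weighted total from 0-5 ratings per criterion."""
    return sum(criteria[c] * ratings.get(c, 0) for c in criteria)

distro_a = {"ansi_sql": 5, "reporting_tools": 4, "advanced_analytics": 3,
            "real_time": 2, "ops_integration": 4}
distro_b = {"ansi_sql": 2, "reporting_tools": 2, "advanced_analytics": 5,
            "real_time": 5, "ops_integration": 2}

print(score(distro_a), score(distro_b))  # -> 37 32
```

The weights are the interesting part: set them with the business, and the 'which distro' argument stops being about component version numbers.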

Now this means it's not really about the Hadoop or HDFS piece itself; it's about the ecosystem of technologies into which it needs to integrate.  Otherwise it's going to be just another silo of technologies that doesn't work well with everything else and ultimately doesn't deliver the value you need.

Thursday, April 24, 2014

Data Lakes will replace EDWs - a prediction

Over the last few years there has been a trend of increased spending on BI, and that trend isn't going away.  The analyst predictions however have, understandably, been based on the mentality that the choice was between a traditional EDW/DW model and Hadoop.  With the new 'Business Data Lake' type of hybrid approach it's pretty clear that the shift is underway for all vendors to have a hybrid approach rather than a simple choice between Hadoop and a Data Warehouse.  So taking the average of a few analysts' figures we get a graph that looks like this:
In other words, 12 months ago there was no real prediction of hybrid architectures.  Now however we see SAP talking about hybrid, IBM about DB2 and Hadoop, and Teradata doing the same.  What that means is that we'll see a switch between traditional approaches and hybrid, Data Lake centric architectures that will start now and accelerate rapidly.
My prediction therefore is that these hybrid Data Lake architectures will rapidly become the 'new normal' in enterprise computing.  There will still be more people taking traditional approaches this year and next, but the choice for people looking at this is whether they want to get on the old bus or the new bus.  This for me is analogous to what we saw with proprietary EAI against Java-based EAI around the turn of the century: people who chose the old school found themselves in a very bad place once the switch had happened.

What I'm also predicting is that we will see a drop rather than a gain in 'pure' Hadoop projects, as people look to incorporate Hadoop as a core part of an architecture rather than as standalone HDFS silos.


Tuesday, March 25, 2014

Microservices is SOD all within SOA

Microservices is a Service Oriented Delivery approach, all within a Service Oriented Architecture context.
(Long Title ;)

OK, so a few more updates since the last time I wrote about Microservices, and I think it's worth revisiting, as it really does heavily underline why Microservices is a Service Oriented Delivery approach that absolutely can fit within a Service Oriented Architecture.  Let's be clear: it isn't a bad thing to be that, as delivery is where the rubber hits the road.  But every section that is written further underlines why the 'it's not SOA' argument rings hollow.

Infrastructure Automation
Not going to disagree here, as this is just standard practice.  I actually think this isn't forward-looking enough in terms of infrastructure: technologies such as Cloud Foundry, supported by IBM, SAP, HP, Rackspace, EMC, VMware, Pivotal, etc, move beyond simply automating the infrastructure towards actually automating deployment at the application level.

This is a really solid SOD tip, certainly something I'd recommend.  Using technologies such as Cloud Foundry can really help here and move beyond infrastructure automation.

Design for Failure 
Again, a good tip in a service oriented architecture; back in 2007 I wrote about the challenges of five 9s, and one of them was about planning for failure in SOA.  This is a really solid piece of advice, but I feel it over-simplifies the challenge.  Sometimes a service has failed because it's impossible for it to succeed, particularly if it is using external services.  So 'restarting' is only an option sometimes, and it's important to understand the types of scenario that you'll need to handle.

Evolutionary Design
This one comes from the school of the obvious.  The example used talks of a website, the Guardian, that was built as a monolith and was re-architected into microservices.  I too have such an example: a large airline website that was built as a monolith, was getting hard to maintain, and was broken down into various services, boundaries and separately compilable elements.  That was in 2004-2006, however, so again this really isn't new.

Yes, of course you should build something that will evolve... again a good tip, but not something that marks Microservices out as anything other than a Service Oriented Delivery approach.

The goal of a good service oriented architecture is to "look like the business, evolve like the business and be costed in line with the business value", and Microservices lays down some nice rules for implementing certain parts of an enterprise.  But those are best served by an honesty that it's an implementation choice within a broader Service Oriented Architecture.  That doesn't devalue it in any way; it just places it in the right context.

Microservices is SOD, all within the context of SOA - and yes this time I added the comma.

I'd also like to give a hat-tip to Loraine Lawson, who pointed me towards the excellent Fallacy poster with the note that the SOA side-bar is really a 'Fallacy Fallacy'.

Tuesday, March 18, 2014

Microservices is SOA, for those who know what SOA is.

OK, so it's started a bit of debate on Twitter, and now there have been emails, but in the spirit of openness I thought I'd better blog.  Now, it's good that Martin has added a side-bar on SOA to his article on Microservices, but that really makes it worse in many ways.  I'll get to that at the end, but first off let's explain why Microservices is just another SOA implementation pattern.  It's SOD for SOA.

Let's break down the key precepts of Microservices and compare them against the OASIS SOA Reference Model (published 2006).

Componentization via Services

There follows a long text that can be reduced to:

"Service Oriented Architecture (SOA) is a paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. " 
There is an awful lot of words about deployment and process models in the Microservices piece, but this single sentence at the start of the OASIS SOA RM is much more powerful because:
  1. It means that IT and non-IT services can be represented in a single approach
  2. It immediately includes the major challenge - that of politics and ownership
OASIS then goes nicely into WTF a service actually is:
  • The capability to perform work for another 
  • The specification of the work offered for another 
  • The offer to perform work for another
Again this is not a tech requirement, which means your architecture can actually start from a business perspective rather than from process threads.  In Fowler's microservices textual description (no bullet points there) we have:

  • services are out-of-process components who communicate
  • explicit component interface
  • ..... nothing on this bit 

OASIS then adds the kicker
in SOA, services are the mechanism by which needs and capabilities are brought together.
Remember that word 'mechanism'; it's going to be important....

Here is where Martin begins to get it wrong.  The OASIS SOA RM defines a capability as:
A real-world effect that a service provider is able to provide to a service consumer
The point here is that the difference between a capability (the bit that does the work) and the service (the organising construct) is really important.  What we found when doing SOA in the wild for over a decade, and all the people on the OASIS SOA RM had lots of experience doing this, was that the organising framework was separate from the actions themselves.  The reason this is crucially important is that people often started making services where service = capability, so you ended up with lots and lots of services (ummm, if I was being insulting I'd call them microservices); here are some posts from 2006 and 2007 that explain why it's really not great to confuse the two concepts, and it made it into the SOA Antipatterns as well.  Now, the actual text is pretty much OK, but again the lack of reference to the past, and the making of a crucial mistake, does not help people learn how things are evolving and what can be improved.
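A code sketch of the distinction (purely illustrative, not taken from the RM): capabilities do the work, while the service is the organising construct that brings needs and capabilities together, rather than one 'service' per capability:

```python
# Capabilities: the bits that perform work and have a real-world effect.
def check_stock(sku):
    """Capability: report available stock for a SKU (stubbed here)."""
    return 12

def reserve_stock(sku, qty):
    """Capability: reserve stock if enough is available."""
    return qty <= check_stock(sku)

class OrderService:
    """The service: one organising construct over several capabilities,
    owned and governed as a unit. Making each capability its own
    'service' is the service = capability mistake described above."""
    capabilities = {"check_stock": check_stock,
                    "reserve_stock": reserve_stock}

    def request(self, capability, *args):
        # The service is the mechanism by which a consumer's need is
        # brought together with a provider's capability.
        return self.capabilities[capability](*args)

svc = OrderService()
print(svc.request("reserve_stock", "SKU-1", 5))  # -> True
```

Collapse the `OrderService` layer and every function becomes its own separately governed endpoint, which is exactly the explosion of tiny services the antipattern warns about.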

New?  Hell, even I wrote a book which talked about how to model, manage and set up teams around this approach.  The SOA Manifesto (2009) talks about the key principles behind SOA (I still prefer the RM though) from a big group of practitioners.  The point here is that there are two problems: first the confusion of services and capabilities, and second the lack of recognition of the importance of hierarchies in governance.

Products not Projects

Here it goes beyond the OASIS SOA RM, which doesn't talk about delivery models... unfortunately, however, it goes beyond absolutely nothing that was said before.  A quick hit on Google just shows how many pieces there are in this space, some better than others, but really this is not new.  I used to use the phrase 'programs not projects' and always talked about assigning architects for the full lifecycle 'to make them accountable'.  Again it's not that this statement is in any way wrong, it's just that it's in no way new.  We've known for years that this helps, but it has a significant issue: cost.

Let's say you have a service that requires some amazingly smart analytics, some hard-core coding and some wonderful UI design.  You hire the very best, they cost a fortune and your service is awesome.  Do you really want those folks sitting around waiting for requirement changes?  No.  This is why for me the concentration must be at the architecture and ownership level, where consistency is maintained, not at the developer level.  For companies where software development is their business you can include development, but for most organisations real-world economics prevent the 'you built it, you run it' mentality.  Again it's a lack of learning that undermines the message.

Here I'm not disagreeing: it would be ideal to have the same team always maintaining the code they write, it's just not practical, but thinking in terms of discrete pieces with their own heartbeats is a good thing.  Most decent SOA programs I've seen have recognised this and had different delivery schedules for different services.  What becomes important is the integrity of the service and its ability to be independently maintained, within the context of its management hierarchy; relegating this to 'keep the same developers' misses some of the power of SOA.  Again this emphasises that Microservices is just another Service Oriented Delivery approach, a potentially good one in some circumstances but not something that will work for everyone and every circumstance...

Smart endpoints and dumb pipes

I really agree with this, but I think it's better put in a different way.  In the OASIS SOA RM there is the concept of an 'execution context', which is the bit that lets a service be called and its capability invoked.  Clearly the endpoint is 'smart' as it's what does the work, hence the phrase 'mechanism' used above.  The 'pipes' may or may not be dumb (the 'pipes' talking to those Rovers on Mars are pretty smart I think) but what they are is without value.  This was a crucial finding in SOA and is well documented in the SOA RM.  The execution context is where all of the internal plumbing is done, but it's the service that provides the access to the capability and it's the capability which delivers the value.

So it's not wrong to say 'smart endpoints and dumb pipes', but it's better to say that the endpoints have the value and the pipes are just infrastructure; this gives better guidance on why you shouldn't be focusing on the pipes.
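As a sketch of the principle (names are illustrative, not from the article): the pipe below is pure infrastructure that just routes messages, while the endpoint holds the capability and therefore the value:

```python
class DumbPipe:
    """Pure infrastructure: routes a message to a named endpoint and adds no
    logic of its own; no transformation, no rules, no 'smarts'."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, handler):
        self._endpoints[name] = handler

    def send(self, name, message):
        return self._endpoints[name](message)  # just delivery, nothing more

def pricing_endpoint(message):
    """Smart endpoint: the capability, and therefore the value, lives here."""
    return {"sku": message["sku"], "price": round(message["cost"] * 1.2, 2)}

pipe = DumbPipe()
pipe.register("pricing", pricing_endpoint)
```

Swapping the pipe for a queue or an HTTP call changes nothing about where the value sits, which is exactly why the pipes shouldn't be the focus.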

Now it does make a good point about not having smarts in the information fabric, but again this isn't new or unusual in decent SOA implementations.  I collaborated with the folks from Progress Software back in 2007 on the Business Service Bus concept, which is just about having mediation (security, basic transformation) in the bus.  There are good reasons why these things go there; cross-referencing between different ontologies is one.  This also plays into the 'always proxy' pattern that most decent SOA folks used.

So it's again not wrong, it's just that it's not new, and it further underlines the inherent SOA nature of Microservices.
This really is another bit that I agree with; I've talked about the People's Democratic Republic of IT for years at conferences and finally got around to blogging on it.  The whole principle of business-driven SOA is that the governance model better matches the business.  So again I think it's not bad advice, it's just that SOA gives so much more than Microservices in terms of governance.  SOA as described in the OASIS SOA RM allows these principles to be applied to all IT assets, not just those implemented with a specific implementation style, and indeed not just to IT assets, meaning it's an approach that business schools are teaching.
The last thing I'll cover here is how the SOA sidebar is really a red herring; it's not a true definition of SOA, instead rolling out the old 'big' ESB and WS-* trope that was so loved by the RESTafarians when explaining why their way was better.  The claim is that the article 'crisply' describes the Microservices style and is thus valid in comparison with SOA, as 'SOA means so many things'.  This I fundamentally disagree with: firstly because Microservices would be better served as an implementation approach if it could explain how it fits with non-Microservices approaches, something that SOA does a great job of doing, and secondly because it can't even say what makes a service 'micro', with services ranging from decent-sized (12 person) teams down to individuals.

Conclusion

This for me is why Microservices is just a Service Oriented Delivery approach for a well-architected SOA solution.  SOA provides the contextual framework and most of the rules that Microservices aims to adhere to, but moreover gives a broader context within which Microservices fits within a complex enterprise.  Calling out WS-* for the one millionth time, or 'big' ESB, and talking about massively complex projects is simply a shot at a different challenge.

Additionally, the fact that one of the references used is Netflix, who actually use the term 'fine-grained SOA' as recognised in the footnotes, sort of underlines the point, as does the fact that another (Amazon) also says it's SOA.

I think it's great that SOA is now coming back to the fore in the market as the hype around the plumbing (WS-* v REST) dies down, and that the learnings of companies who have been doing this for over 10 years are now being talked about.  But that is the way to talk about it: what is the state of the 'now' in Service Oriented implementation and architecture?

Wednesday, March 12, 2014

What is real-time? Depends on who you ask

"Real-time" is a word that gets thrown about a lot in IT, and it's worth documenting a few of the different ways it gets used.

Hard Real-time
This is what Real-time Java (RTJ) was created to address (along with Soft Real-time).  What is this?  The easiest way to say it is that in Hard Real-time environments the following statement is often true:
If it doesn't finish in X milliseconds then people might die
So if you miss a deadline it's a system failure.  Deadlines don't have to be super small, they could be '120 seconds', but the point is that you cannot miss them.

Soft Real-time
This was another use case for RTJ, here there are deadlines but missing deadlines isn't fatal but does degrade the value of the result and results in degraded performance.  Again though we are talking about deadlines not performance
If it doesn't finish in X milliseconds the system risks failing and will be performing sub-optimally
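A toy illustration of the difference (this is a sketch of the semantics, not a real-time scheduler): a hard deadline miss is a failure, while a soft deadline miss just degrades the value of the result:

```python
import time

def run_with_deadline(task, deadline_s, hard=True):
    """Illustrative deadline semantics: hard real-time treats a missed deadline
    as a system failure; soft real-time degrades the value of the result."""
    start = time.monotonic()
    result = task()
    elapsed = time.monotonic() - start
    if elapsed <= deadline_s:
        return result, 1.0  # deadline met: full value
    if hard:
        raise RuntimeError(f"missed hard deadline by {elapsed - deadline_s:.3f}s")
    return result, deadline_s / elapsed  # soft: value decays with lateness
```

The value returned alongside the result is just a way of making 'degraded value' tangible; real soft real-time systems define their own decay function.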
Machine real-time
This is when two machines are communicating on a task, or processes are communicating within a machine; here the answer is 'as fast as is possible'.  There aren't any deadlines, but the times are measured at the microsecond to millisecond level.  These are calculations and communications that get done millions and billions of times, so a shift from 0.1ms to 1ms over a billion attempts means that the end-to-end work takes just over a day rather than more than 11 days.  This is the world of fast data and HPC, where the communications and processes need to be slimmed for speed.
Every microsecond counts, slim-it, trim-it because we're going to do it a billion times
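The day-versus-eleven-days claim is just arithmetic; a quick check:

```python
# A billion attempts at 0.1 ms each versus 1 ms each, expressed in days.
ATTEMPTS = 1_000_000_000
SECONDS_PER_DAY = 86_400

fast_days = ATTEMPTS * 0.0001 / SECONDS_PER_DAY  # 0.1 ms per attempt
slow_days = ATTEMPTS * 0.001 / SECONDS_PER_DAY   # 1 ms per attempt

print(round(fast_days, 2), round(slow_days, 2))  # roughly 1.16 and 11.57 days
```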
Transactional real-time
Transactional real-time is about what it says: the time to complete a transaction, something that hits the database and returns.  Here we are in the millisecond to a tenth of a second range, and it's this number that determines how internally responsive an application is.  This is the end-to-end time from initiation to response, and at that point the state of the system has been changed.
Don't make me wait for you
User transactional real-time
User transactional real-time is what looks fast to a user of a system.  This varies from interactive systems, where it means sub-second, to internet solutions and websites, where it might mean 5 seconds or so.  Again this is the end-to-end time and includes something actually having happened and being able to check that it has happened.
Be fast enough so the user thinks it's magic
BI reporting real-time
Next up are the pieces that are views on recent reality: the time it takes for a report to be generated from 'landed data'.  Here BI folks tend to think of real-time in the same way as user transactional real-time, so 5 or so seconds is acceptable.  This isn't however an end-to-end time, as the data is already landed and all that is happening is that the reports are being done quickly.  Crucially this is about reporting; it's not about having transactional systems hitting the reporting system.
Let the user do all the reports and variations they want, we can handle it and not annoy them
BI real-time
The next definition is for the end-to-end of BI: the extracting of data from a source system, the loading and transformation of that data, and finally a report being done which includes that new information.  Here the real-time definition can get longer and longer.  I've had clients say that 5 minutes, 15 minutes or even 2 hours are considered acceptable 'real-time' responses.  The point here is that when the current response time is 'next day', then something that is only minutes or even a few hours delayed is a significant increase in performance and is considered responsive.
Tell me shortly what went wrong right now
Chuck Norris Real-Time
In Chuck Norris real-time it's happened before you even knew that you wanted it to happen.
Problem solved 

Tuesday, March 11, 2014

Microservices - Money for old rope or re-badging SOA for the cool kids

Hat tip to John Evedemon for the heads up on this one.  Martin Fowler is peddling a new approach, 'Microservices', which... wait for it... is a way of developing applications as a suite of services, each of which has its own process thread and 'communicates via lightweight mechanisms' such as... HTTP.

But wait, there is more: you'll be stunned to know that these services can be built using different programming languages and can even use different data stores.

Now down in the footnotes it makes a reference to Netflix talking about 'fine-grained SOA', so it's there that we begin to get the sniff of an old idea wrapped up in some shiny new wrapping.  There are a few things you need to do when saying this, and the first is critical: don't learn from or reference previous approaches except negatively.
Most languages do not have a good mechanism for defining an explicit Published Interface.
Now I really wish there was an approach that enabled a service to publish a definition of itself on the Web, a sort of Web Service Description; if only there was such a language, I might call it... oh I don't know, the Web Service Description Language.  The rest of the article talks about things that were common discussions around 2001: making things independently deployable while recognising that interface changes can have knock-on impacts.  Hell, I could even think of an Interface Definition Language, or IDL, that might do that as well.
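For what it's worth, languages do have in-language analogues of an explicit published interface these days; a hedged sketch (names invented for illustration) using Python's typing.Protocol of the kind of explicit contract a WSDL or IDL declares out-of-band:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class StockService(Protocol):
    """An explicit, published interface: the in-language analogue of what a
    WSDL or IDL definition declares for consumers."""
    def check(self, sku: str) -> int: ...
    def reserve(self, sku: str, qty: int) -> bool: ...

class WarehouseStock:
    """A provider that conforms to the published interface structurally."""
    def __init__(self):
        self._inv = {"widget": 5}

    def check(self, sku: str) -> int:
        return self._inv.get(sku, 0)

    def reserve(self, sku: str, qty: int) -> bool:
        if self._inv.get(sku, 0) >= qty:
            self._inv[sku] -= qty
            return True
        return False

assert isinstance(WarehouseStock(), StockService)  # structural conformance
```

The runtime check only verifies method presence, not signatures, which is exactly the gap an out-of-band contract like WSDL was designed to close.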

Martin Fowler really should know better than this: paying no heed to what has gone before and promoting an approach as if it's actually new.  The article reads like an extremely basic description of SOA from about 2000, without either the industrialisation of WS-* or the dynamic power of things like Jini.  Above all it doesn't move forward the question of how to architect such service architectures, how they need to map to the business, and how, although there might be fine-grained services, it is critical to understand the management structure and hierarchy of those services to actually enable the degree of autonomy and flexibility required in these sorts of architectures.

Microservices is just another take on SOA, and one that doesn't move the game forwards as it yet again focuses on technical aspects over the business architecture that is realised.  Putting forward as new something that would be recognisable to people working on CORBA in the 90s, and certainly Web Services in 2001, is just poor-quality advice.  It neither learns from the past, educates the reader, nor moves the game forwards; it's just selling an old approach with a new name.

Microservices?  What is that, SOA 3.0?  Nope, it's just an old-school form of SOA without the learnings that came from doing it.
 

What are the types of Data Scientist?

There are various views going around on what a Data Scientist is, what their value is to an organisation and the salaries they command.  To me however asking 'what is a Data Scientist?' is like asking 'what is a Physicist?': sure, 'someone who studies Physics' might be factually accurate, but it's a pointless definition.  How does that separate someone who did Physics in high school from Albert Einstein?  How does it separate the folks at CERN from someone using implicit Newtonian mechanics to play pool?  So it is with Data Science, but with an added twist.
Data Science is spectacularly badly defined
So yes, you have courses cropping up at universities claiming to teach Data Science, and you have consultants with some mildly fancy Excel spreadsheets claiming they are Data Scientists.  In my career I've had the pleasure of working with some real Data Scientists; quite a lot of the time they didn't call themselves that, but it's what they were.  They used data and applied some really fancy maths to deliver insight that just couldn't be attained without it.  So I'll pick up the challenge laid down by Giga on whether Data Science is real or not and say 'yes... but'.  Here are the four groups of Data Scientists that I've worked with.

Data Magicians or Professor Data Science
Arthur C. Clarke once said that any sufficiently advanced technology is indistinguishable from magic.  This is how I feel when I work with people in this group.  They normally have mathematics- or physics-centric PhDs (often several), often focused in specific areas such as fluid dynamics or economics, or super-specific ones such as wind turbines.  Why is what they do magic?  Because these are the folks who work on 'next'.  It is not a big group, but these are the CERN folks of Data Science.  The reason it's science is that it's testable and provable.  They can show that their algorithm would have produced a 5% improvement in performance over the past 5 years, and as it moves forward show how their approach has made a difference to the performance of a business.

How do you know they are doing real Data Science?  Well, the first hint is the mathematics: it's the stuff where you remember the symbols but the combination of them all together now looks like gibberish, and yet these folks are arguing over specific parts of the formula and how it can be improved.  It's like watching developers arguing over the right way to handle machine-to-machine communication.  You know who is smart by the outcome and the focus on the specific, not the general.  Being blunt, however, very few companies need these folks, and those that do need very few.  Working with external organisations who have a good ecosystem or Data Science structure is going to be better than having a lone Data Magician wandering around getting bored.  New algorithm development is not a regular thing.

Data Operators or Resident Data Scientist
The next group is what lots of companies will see value in.  These aren't the Magicians or Professors, but they are crucial to making Data Science have value.  These are the people who take predefined algorithms, statistical or machine learning, apply them to a specific company scenario and, most crucially, keep the parameters up to date so the algorithm continues to perform.  These are the operational side of Data Science, the people for whom it's a regular day job.  They can't design a new algorithm, but they can deliver specific value with an existing one, which is what really counts for a business.  These folks are adept at choosing between approaches and algorithms to deliver the most value.

These are the people most companies need to be thinking about: people who can take libraries like MADlib, languages like R or tools like SAS, apply them to your local challenge, deliver the value and provide the ongoing support to keep it effective.  Companies need access to these people either internally or as part of an external service, but in a more outsourcing/regular way than with the project-driven Magicians.  These people will have a formal mathematics or physics background, often up to PhD level, but their abilities are focused more on application than invention.  These folks understand the formulas, however, and that is how you know they are real.
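A hedged sketch of the Data Operator's day job (the data and names are made up): no new algorithm is invented, a predefined one is applied, here plain ordinary least squares, and the parameters are refitted as new data arrives so the model keeps performing:

```python
def fit_ols(xs, ys):
    """Predefined algorithm: simple ordinary least squares, y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    a = num / den
    return a, my - a * mx

# Month 1: fit the model on the data available so far.
sales = [(1, 10.0), (2, 12.1), (3, 13.9)]
a, b = fit_ols([x for x, _ in sales], [y for _, y in sales])

# Month 2: the operator's job is refitting, not reinventing.
sales += [(4, 16.2), (5, 17.8)]
a, b = fit_ols([x for x, _ in sales], [y for _, y in sales])
```

In practice the algorithm would come from a library like MADlib or an R package rather than being hand-rolled, but the rhythm is the same: apply, monitor, refit.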

Data Hackers or Odd Job Data Scientist
The next group are people with a bit of skill, maybe a bit of training, but they aren't at the level of sophistication of an Operator and are miles away from being a Magician.  Sadly these people often don't know this.  These are folks who've taken the tools, learnt a bit about how they work and, most often, fixated on a specific way of solving a specific problem with specific tools.  They are out there today, aiming to get work by being 'the one-eyed man in the kingdom of the blind', and can apparently cost only $30 an hour.  They add limited value, but can improve the current situation in the same way as a decent report writer could; in fact there is much overlap between the report writer and these people.  That doesn't mean they have zero value, but you need to know what they are doing.

Using these folks under the guidance of a Resident Data Scientist can add value, but don't mistake knowing how to apply one machine learning technique for actual knowledge.  The folks in this group know how the tool works but don't understand the mathematics behind it.

Data Science Bluffers or MS Office Data Scientists
This is the last group, and I'd say the one most responsible for the idea that Data Science might not be a real thing.  These are the people who put a 'predict future stock price movement' box on their diagrams and mutter 'it's Data Science'.  They are the people who get a spreadsheet with a bunch of data, apply a very basic statistical function and claim 'hey, it's Data Science'.  These are the bluffers of Data Science, the people who are trying to play the one-eyed man while hoping that no-one notices they have a blindfold on.

This is a big group right now, populated in large part by consultancies, often those that specialise in Excel-spreadsheet types of work, now claiming that this is the Data Science insight the company needs.  These folks understand neither the Data Science tools nor the mathematics behind them, but they do know how to create a good deck and Excel sheet.


Still doubt its real?
Still wondering whether this sort of thing is real?  Well, I'll give you a problem to solve: Medium Term Conflict Analysis in Air Traffic Management.  That is a 'hard' problem, and one that requires really complex mathematics to understand all of the pieces at play in the sky.  Here is another one: you have 50 suppliers, 500 stores, 20 distribution centres, 100,000 SKUs and information streaming in about sales, social insight and stock levels... how do you efficiently procure, ship and stock to maximise profits?  Don't forget to include cannibalisation between brands and the costs of wasted stock or stock-room space.

There are lots of other challenges where Data Science adds value but the point is that you need to understand what sort of Data Science you are trying to achieve.  For most people this isn't getting Data Magicians, its about getting Data Operations folks, the Resident Data Scientists.

So it's real, but don't forget to challenge people who say they are Data Scientists and work out which bucket they fall into.

  

Friday, March 07, 2014

BI change is coming, time to get over it and get on with the job

One of the things that always stuns me in IT is how people don't appear to like change.  Whether it was the EAI folks pushing back on Web Services in 2000 in favour of their old-school approaches, the package guys pushing back against SaaS, or now the BI guys pushing back against the new wave of BI technologies and approaches, the message is always the same:
We are happy doing what we are doing, its great, I've done it for years, I don't want to change
That really isn't a long-term career plan in IT.  This is the industry where revolution happens on a regular basis.  That doesn't mean you shouldn't be a sceptic; everyone in IT should be a sceptic and really ask new approaches to prove themselves.  It's why I still don't think REST is 'the answer' to integration: I just haven't seen it proven to work at scale in the enterprise.

Sometimes, however, you can see the wave coming.  The Internet was one obvious wave, and its impact on the enterprise was huge.  Now there is another wave that I feel is equally obvious: the wall between operations and analytics, between transactional systems and BI systems, is being smashed down.  It's a wall IT put there because it has been cheaper and simpler for us to have it there, but the business doesn't want these worlds separate any more.

This means that change is inevitable.  This is a change in the information space which is analogous to the desktop PC impact over the previous mini and mainframe computer era.  The change is going to be large.

It's going to require:

  • Joined-up thinking between applications, transactions, processes, analytics and information
  • Thinking in the local context while enabling global governance
  • New governance models that better match the business
  • All speeds of analytics, and the ability to change the speed as business demand evolves
This is a big wave and you cannot stand, Canute-like, on the beach wishing for it to go away.  Much better to embrace the change and start planning for it; that way you won't be washed away.

IT changes... it's why it's useful and why it's an interesting business.  Right now BI is at the centre of one of the most exciting changes in 20 years.  We should celebrate that and get on with making it happen, because otherwise we will wake up in 5 years and find our company has either struggled to compete or failed, or has succeeded despite us.

Monday, March 03, 2014

The next big wave of IT is Software Development

I can smell a change coming.  The last few years have seen cloud and SaaS on the rise, a fragmentation in application development (thanks in large part to the appalling stewardship of Java) and a real focus of budgets on BI and 'vanilla' package approaches.  Now this is a good thing, both because I jumped out of the Java boat onto the BI boat a few years ago, but also because it's really helped shift the investment focus away from 'Big iT' towards 'Big It', by which I mean the focus has shifted very firmly towards Information over technology.

Now, however, companies are waking up to the fact that as good as Salesforce.com or SAP are, it really doesn't matter if everyone in your industry is using them: these are not your differentiators.  Your differentiators are what matter in terms of accelerating growth and outperforming your competitors.

This is where software development comes back, and I predict the following four waves.
Big Data will drive the new Software Development explosion
Big Data is the hype today, and it will be the foundation of this new era, as information is the key; fast data will, however, rapidly become just as, if not more, important than 'Big', which means that the ability to integrate into transactions and deliver differentiation will be critical.  This is why we'll see a resurgence in software development based on new approaches and new platforms, but we'll see the same sort of consolidation phases that we saw around Java.  Maybe this wave will last 10 years, maybe 5, maybe 20, but what appears certain is that the wave is coming.

This isn't the old wave though, it never is in IT; it's the same sort of thing but now applied to the next generation of problems.  This is about information, collaboration and the digitization of organisations; it's about taking all these vast amounts of internal and external information and driving better outcomes, and crucially about collaborating between organisations.

Let's be clear: this is going to be hard.

Back with Java we had a database, we had SQL, we had (eventually) an app server... that, my friends, was a walk in the park compared with what is coming.  I'll write a new post shortly on what the new world could look like, but suffice to say you need the following:

1) An inherent understanding of what makes information work - Master Data, Big Data, Analytics
2) An understanding of how to drive impact - how to engage users
3) An understanding of how to work with people outside your organisation

You thought doing a Java project with offshore delivery and a bunch of legacy integration points was hard?  Hang on to your hats... 

Software Development Wave 4: back to the package

The end of the next software development wave will come when software development again 'eats itself', as it did with technologies like Hadoop showing a new value in information, with platforms like SFDC showing new pre-built services, and where people like GoodData have turned BI into SaaS.  So we will see the same evolution again: a new generation of commoditisation which drives consolidation and cost saving, replacing things that differentiate today but will be a commodity in 5-10 years' time.

This is what always happens.  SAP was created because people had written custom software for factories for years and they saw a market for a platform.  Siebel was born because people had built software to manage client contacts for years and Salesforce.com because it wasn't really that important that the software be unique to your business.

Software Development always leads to packages in the enterprise space because there are lots of companies doing similar things.  Master Data Management is a great example of an old problem in the information space which has been heavily commoditised by vendors in the last 10 years.  Hadoop has turned the challenge of handling massive data volumes into a file system.

Software development waves lead to a new wave of package solutions, the "buy v build" question is always asked and software development wins when the answer is 'there is nothing we can buy' and package wins when the answer is 'this does 90%+ of what we want for a fraction of the risk'. In between there are grey areas, which is why the transition is never clean, but the reality is that after this next wave of software development we should expect to see the next generation of package solutions.  I think however there is a good chance that these next ones will be markedly different to the current generation.

The next generation will be based around information and insight and less around repeatable processes.  The old packages will exist, like a geological epoch, but there will be less and less reason to change what is established in those areas, the new value will be built on their foundations, first by software development and then by the commoditisation and packaging of the new generation of solutions.

This is part of a series of posts on Why Software Development is the next big Wave and is preceded by Wave 1: The Personal Developer, Wave 2: The Team Developer and Wave 3: The Enterprise Developer

Software Development Wave 3: the enterprise developer

This is the stage at which software development begins to commoditise itself; it's no surprise that underneath all that Salesforce.com scripting lurked rather a lot of Java code.  This wave sees the rise of the libraries, the utilities and, above all, the commoditisation of software in a way that enables the majority of developers to be useful in the enterprise.  This was the goal of Spring, JEE and numerous other frameworks, but also the aim of BPEL and other visual process approaches which leveraged the Java platform.  Now some may argue this wave hasn't crashed, but I'm sorry, it has: the innovation in the core enterprise space in the last few years has been practically zero; the brains have been elsewhere even if the money hasn't.

The new wave is going to do much the same as the old wave, including mistaking SOA for being about endpoints rather than architecture.  This time, however, the orchestrations and work are going to be much more about straddling organisations and collaboration than just delivering internal solutions.  Having multiple companies working together on something is going to need some pretty fancy tooling, and my money says this space sees the next generation of enterprise software solutions that kick off the next wave of packaged software/SaaS-focused investment in IT.

This is part of a series of posts on Why Software Development is the next big Wave and is preceded by Wave 1: The Personal Developer and Wave 2: The Team Developer, and followed by Wave 4: Back to the Package

Software Development Wave 2 - the team developer

The problem with Wave 1 was that it didn't scale.  Sure, lots of the personal developers claimed it did, often laughing at large-scale developments and going 'me and four mates could do that in a couple of weeks'.  Often they attempted to do just that, and suddenly realised that when you get a few people together it gets a bit more complicated, and when that few gets over 20 it begins to get really, really complicated.  Back in the Java days this era saw the rise in the importance of design, but design focused on development rather than pictures, as with TogetherJ against the older-school Rational products.

At the start of the 21st century this battle between the personal and the team developer was fought out on the floor of the JavaOne conference: '.com' champions of the personal style battled it out with a new breed of Java developers who were working for dull and boring corporations.  The noise and the hype in 2000, and indeed into 2001 given that not everything had popped, was strongly on the side of the personal developers, but the actual cash, rather than paper money, was on the side of the enterprise folks.

When the bubble had finally popped a new wave was about.  Sure, Agile was kicking off, but even in the most extreme of the XP debates the talk was about quality, about testing, about making things maintainable.  Suddenly the battle wasn't between the personal and the team; it was about what type of team dynamic was best.

Some organisations are beginning to hit this wave again, but I'd argue that it won't be the mainstream mentality for another couple of years.  What we'll see this time is a focus back on design, but also a focus on eliminating many of the things that made team dynamics so poor: things like deployment times, infrastructure management, etc.  Cloud, and now PaaS offerings like CloudFoundry, are changing the game here already, but it's still a jump for companies to move towards a next-generation team development infrastructure.

This next generation will also see teams work between organisations, giving a whole new team-dynamic challenge for us to overcome.  This means that the new generation is going to be less about processes and transactions (as the Java wave was) and more about information, collaboration and digitization.  So while traditional design approaches such as UML might make a comeback, I'd suggest that new approaches that concentrate on information-centric collaboration are going to be created and become more popular.

The challenge of the team developer era, however, was, and is, that it's still for a minority.  The Java community between 2001 and 2003 was absolutely exploding, but arguably the tools and techniques were still a bit beyond the majority.  The big question was how to roll out software development to everyone...

This is part of a series of posts on Why Software Development is the next big Wave and is preceded by Wave 1: The Personal Developer and followed by Wave 3: the Enterprise Developer and Wave 4: Back to the package

Software Development Wave 1: The Personal Developer

This is the wave we are in at the moment, and it's the wave we last saw in the late 90s: technologies enabling single people to build small, specific things really quickly.  Java and its applets really were the peak of that first wave back then, but now we are seeing people use technologies such as R, Python and others to create small solutions that offer really good point value.

Right now I see lots of this bubbling on the edges, just as I did in the late 90s.  The core of enterprises back then was about SAP and other ERP projects; my work was in 'true' custom development (there isn't a package for Air Traffic Control), but as I moved into the enterprise space I saw a real shift away from rigour in software development towards more personal solutions where maintenance was considered less of an issue.  The Perl script that did something useful but only Dave could change.  The C program that worked... but no-one had the source anymore.  The stored procedure that did everything you could ever want... as long as you wanted what was required two years ago.

The challenge with the personal developer was maintenance and evolution: people coded for themselves, supporting and creating on their own or in very small teams.  What we've seen already is people moving up the stack, using technologies like Hive and HAWQ to put older-school SQL-style interfaces on this data, because that means more developers can work together... which brings its own challenges.

This is part of a series of posts on Why Software Development is the next big Wave and is followed by Wave 2: The Team Developer, Wave 3: the Enterprise Developer and Wave 4: Back to the package

Musketeers Day - All Four One and One Four All

Okay, in the spirit of brotherly love, helping people out and of course International Talk Like a Pirate Day, I think we should declare April the 1st 2014 as Musketeers Day.  Why? Well, written month-first the date will be 4/1/14, or All "four one and one four" All.

In honour of that day, and as it falls on April Fool's day to boot I declare the following:

  1. All people should declare fealty to the French King
  2. Each office should have an appointed D'Artagnan who must do most of the work while others take the credit
  3. Wearing of swords is mandatory
  4. As are floppy hats
Musketeers Day will not happen for another 100 years!

Monday, February 17, 2014

How British Airways failed to use the information they have

Having worked a lot with companies in the past few years on how to create a better customer experience, and in particular on using MDM to effectively identify the customer, I know just what is possible even in very challenging environments.  The airline industry isn't even one of the challenging ones, yet today British Airways gave me another example of how a company can have all the information required to deliver good customer service but choose not to provide that information to where it counts.

I was flying into Heathrow on a transatlantic flight I'd booked recently.  I'd originally planned to head into London and then on to Paris by train, but checking the prices it turned out flying was cheaper, and as I was already at Heathrow it seemed the smart and good-corporate-citizen thing to do, so I made another booking.

So let's see what BA knew about me at this stage....

  • All of my passport information - the goldest of gold standards
  • My frequent flier number, with its shiny gold card indicating I need to get out less
  • All my flights with them in the last 10+ years
  • How many times I've missed a flight with them due to my own fault (zero)
  • How many times weather or their issues have caused me to miss a flight (was 3, now 4)

So in the textbook of consumer MDM they have the perfect set of source data: a way to identify the individual (the passport) and a unique number the customer wants to give out (the frequent flier ID).  It really couldn't be easier in a world where the ticket (which has the ID) and the gate (which scans the passport) combine to identify the individual.
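In consumer-MDM terms the matching step here is trivial; a minimal hypothetical sketch in Python (the field names, references and passport numbers are illustrative, not BA's actual schema):

```python
from collections import defaultdict

# Hypothetical sketch of the MDM match: with a 'golden' key such as the
# passport number, bookings can be grouped to a single customer with a
# plain lookup -- no fuzzy matching needed.  All data here is invented.
bookings = [
    {"ref": "ABC123", "passport": "P9912", "route": "PHX-LHR"},
    {"ref": "XYZ789", "passport": "P9912", "route": "LHR-CDG"},
    {"ref": "QQQ111", "passport": "P4477", "route": "LHR-JFK"},
]

customer_bookings = defaultdict(list)
for booking in bookings:
    customer_bookings[booking["passport"]].append(booking["ref"])

# One customer, two bookings -- exactly the link the gate staff never saw
print(customer_bookings["P9912"])  # ['ABC123', 'XYZ789']
```

The design point is that nothing clever is required: a single reliable identifier shared across bookings is enough to join them up.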

They also had two bookings:

  • One that flew out on a Sunday and was due to land at 2pm on Monday, with a return leg on Wednesday (seriously, I need to get out less).
  • One that flew from Heathrow to Paris at 15:15 on Monday and returned on Wednesday.

Shared across these bookings are my details; indeed, when my iPhone downloaded the two boarding passes it automatically paired them together as a single journey.

So what did BA do?  Well, the flight into Heathrow was delayed, but I still had 45 minutes to make the connection.  There were a couple of others on the flight racing for the same connection... I reached the connections checkpoint first, to be told:
You've been taken off the flight
The couple behind me (same inbound, same connection) were let through... this irritated me somewhat, but it was only when I got to the rebooking desk that irritation turned into disbelief and outright annoyance.
BA: You missed the conformance check. 
Me: I was on the flight from Phoenix, you knew that
BA: We can't see that, it's on a separate booking
Me: With your airline, made on your website
BA: Yes, but it's a separate booking
Let's be clear: I like BA, I like the service from the individuals, but this is a great example of how a company fails to leverage the information it has to deliver decent customer service.  It's not giving its people the right information, information it has, to make the right decisions.

My iPhone automagically recognised that a flight taking off from an airport less than two hours after I landed there from a previous one was part of a single journey, independent of how many booking IDs there were.  BA's systems have this information, I can see it in my Exec Club profile, but at the front line they see only the booking.

Now the smart thing would be for BA to have a very simple check on bookings that says 'if one booking lands at and another takes off from the same airport within a set period, say 4 hours, we should consider them as one journey so our airport staff can see what is going on'.  At the very least the people at the airports should have access to all my flight bookings, so that before they kick a customer off a flight, and one they've marked out as a priority customer at that, the agent can check 'I wonder if there is an inbound they are on that we know about?'.
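That rule is simple enough to write down; a hypothetical Python sketch of it (the field names and the four-hour window are assumptions for illustration, not anything from BA's systems):

```python
from datetime import datetime, timedelta

# Hypothetical version of the rule suggested above: treat two bookings
# as one journey if one lands at the airport the other departs from
# within a set window.  Field names are illustrative, not a real schema.
CONNECTION_WINDOW = timedelta(hours=4)

def same_journey(arrival, departure):
    """arrival/departure are dicts with 'airport' and 'time' keys."""
    if arrival["airport"] != departure["airport"]:
        return False
    gap = departure["time"] - arrival["time"]
    return timedelta(0) <= gap <= CONNECTION_WINDOW

inbound = {"airport": "LHR", "time": datetime(2014, 2, 17, 14, 0)}   # lands 2pm
onward  = {"airport": "LHR", "time": datetime(2014, 2, 17, 15, 15)}  # departs 15:15

print(same_journey(inbound, onward))  # True -- the agent should see both bookings
```

A check like this could run at booking time or be evaluated on demand by front-line systems; either way the point is that the data needed already exists.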

This is truly a great example of a company undermining its excellent customer service by not providing its staff with the information to deliver it.  This isn't about a lack of information; it's not even about a lack of good-quality information; it's about the inability to integrate that information into the business processes where it is needed.

Disclaimer: I actually did a bunch of work for BA several years ago around BA.com and other customer-facing parts of the business.

Thursday, February 06, 2014

NoSQL? No Thanks

There continues to be a disproportionate amount of hype around 'NoSQL' data stores.  By disproportionate I mean 'completely and utterly out of scale with the actual problems of the vast majority of companies'.  I wrote before about 'how NoSQL became more SQL'.  The point I made there is now more apparent the more I work with companies on Big Data challenges.

There are three worlds of data interaction developing:

  1. Traditional Reporting - it's SQL, deal with it
  2. Complex Analytics - it's about the tools and languages: R, SAS, MADlib, etc.
  3. Embedding in applications

The point here is that getting all those reports, and more importantly all the people who write reports, re-written using a NoSQL approach makes no sense.  Sure, statistical languages and tools aren't SQL, but is it right to claim they are NoSQL approaches?  I'd argue not.  The use of a NoSQL store such as Hadoop or MongoDB is a question of the infrastructure behind it; it's hidden from the users, so while it may make good technical sense to use such a data store it really doesn't change the way the users are working.
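The 'hidden infrastructure' argument can be sketched in a few lines: below, a report function is written against a generic DB-API connection (Python's built-in sqlite3 is used purely as a stand-in; in practice the driver could front Hive, or a SQL layer over a NoSQL store), so the report writer never sees what sits underneath.  The table and data are hypothetical.

```python
import sqlite3

# The report is written against a generic DB-API connection: swap the
# store underneath and this function is untouched.  Data is invented.
def top_customers(conn, limit=2):
    return conn.execute(
        "SELECT name, total FROM orders ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()

conn = sqlite3.connect(":memory:")  # stand-in backend; could be anything SQL-speaking
conn.execute("CREATE TABLE orders (name TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("acme", 120.0), ("globex", 340.0), ("initech", 75.0)])

print(top_customers(conn))  # [('globex', 340.0), ('acme', 120.0)]
```

This is exactly the 802.11g-versus-802.11n point made below: the interface the user touches is stable, and the infrastructure choice is invisible to them.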

The point in these two areas is that it's about the tools people use to interact with information, and supporting the languages they use to interact with that information.  The infrastructural question is simply one of abstraction and efficiency.  It's like caring about whether your laptop is connecting over 802.11g or 802.11n: yes, I know you care, but that is because you are a techy.  The person using their iPad doesn't care as long as the videos stream from YouTube successfully.  It's the end-user experience that counts, not the infrastructure.

The final case is the world of developers, and here is another shock: business users couldn't care less what developers use as long as they deliver.  If you can deliver better using SQL then use that, if you can deliver better using NoSQL then use that, and if you can deliver better using a random number generator and the Force then go for it.  Again, however, the business doesn't care whether you use NoSQL or not, and nor should they.  What they care about is that it works, meets the business requirements and non-functionals, and can be changed when they need it to be.

Stop trying to force a technical approach onto the business; start hiding your technical infrastructure while giving them the tools and languages that they want.

Friday, January 31, 2014

Java and Analytics the next frontier

I've been pretty verbal about Java going down the wrong path, and my view is that Java should have a 'core' which is just the real basics of the VM and the language, then a few profiles which specify what needs to be loaded, with the rest coming in on demand based on the requirements of a given project.  The old 'it needs to have everything so the browser/desktop/etc works' argument is just rubbish these days with mobility, HTML5 and generally lots of other ways to get that work done.

There is one area where Java really needs to step up to the mark, and that is analytics.  I remember when Hibernate became the de facto Java standard for database access.  Brilliant thing it was, too.

Now, however, plain old data access is just one of the problems.  Languages like R and technologies like MADlib are beginning to move us way beyond database access and into complex analytics.  Real-time analytics is another area.  Almost all of these approaches rely on Java in some way, but it's hard to see how the current approach of Java SE 8 actually supports this new generation of challenges.

Java has a real opportunity to leverage its massive install base to seize the new opportunities that the growth of analytics offers.

Only if its leadership can start leading.

Wednesday, January 29, 2014

There is no Big Data without Fast Data

Which came first, Big Data or Fast Data?  If you go from a hype perspective you'd think Hadoop and Big Data came first, with in-memory and fast coming after.  The reality, though, is the other way around, and it comes from a simple question:
Where do you think all that Big Data came from?
When you look at the massive Big Data sources out there, Facebook, Twitter, sensor data, clickstream analysis etc, they don't create the data in massive systolic thumps.  They instead create the data in lots and lots of little bits, a constant stream of information.  In other words, it's fast data that creates big data.  The point is that historically, with these fast data sources, there was only one way to go: process it in real-time and just drop most of the data.  Take a RADAR for instance: it has a constant stream of analogue information coming in (unstructured data) which is processed, in real-time, and converted into plots and tracks.  These are then passed on to a RADAR display which shows them.
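That 'process in real-time and drop most of the data' approach is easy to sketch; a minimal hypothetical example in Python, where a stream of sensor readings is reduced to a rolling window the way a RADAR keeps only plots and tracks (the readings and window size are invented):

```python
from collections import deque

# Hypothetical fast-data sketch: events arrive one by one, we process
# each immediately and keep only a small rolling window -- the rest of
# the stream is simply dropped, never stored 'at rest'.
class RollingAverage:
    def __init__(self, window=3):
        self.events = deque(maxlen=window)  # old readings fall off the back

    def push(self, reading):
        self.events.append(reading)
        return sum(self.events) / len(self.events)

sensor = RollingAverage(window=3)
for reading in [10, 12, 11, 50, 49]:
    latest = sensor.push(reading)

print(latest)  # average of the last three readings only
```

The Big Data shift described in the rest of the post is precisely the removal of that constraint: cheap storage means the whole stream can now be kept as well as processed.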

At no stage is this data 'at rest'; it's a continuous stream of information with the processing being done in real-time.  Very complex analytics, in fact, being done in real time.

So why so much more hype around Big Data?  Well, I've got a theory.  It's the sort of theory that explains why Oracle has TimesTen (an in-memory database type of thing) and Coherence (an in-memory database type of thing) and talks about them in two very different ways.  On the middleware side it's Coherence, and it talks about distributed fast access to information, processing those events and making decisions.  TimesTen sits in the data camp, so it's about really fast analytics... what you did on disk before, you now do in memory.

The point is that these two worlds are collapsing into each other, and the historical difference between a middleware-centric in-memory 'fast' processing solution and a new-generation in-memory analytical solution is going to go away.  This means that data folks have to get used to really fast.  What do I mean by that?

Well, I used to work in 'real real-time', which doesn't always mean 'super fast'; it just means 'within a defined time... but that defined time is normally pretty damned fast'.  I've also worked in what standard business considers real-time these days - sub-microsecond response times.  But that isn't the same for data folks: sometimes real-time means 'in an hour', 'in 15 minutes' or 'in 30 seconds', but rarely does it mean 'before the transaction completes'.

Fast Data is what gives us Big Data, and in the end it's going to be the ability to handle both the new statistical analytics of Big and the real-time adaptation of Fast that differentiates businesses.  This presents a new challenge to us in IT, as it means we need to break down the barriers between the data folks and the middleware folks, and we need new approaches to architecture that do not force a separation between the analytical 'Big' and the reactive 'Fast' worlds.

Tuesday, January 28, 2014

EDW in the Library with Single Canonical Form - get a clue about killing the business

The game Cluedo (or just plain Clue in North America) is about discovering which person committed the murder, in which room, using what weapon.  What is amazing is that in IT we have the easiest game of Cluedo going, and yet over and over again we murder the poor unfortunate business in the same way, then stand back and gasp 'I didn't know that would kill them'.

I talk about the EDW, the IT department's hammer to which every 'I don't have the information I need' question looks like a nail.  The EDW is the murderer of information agility, the constrainer of local requirements and the heavyweight bully of the data landscape.  But its weapon of choice is more blunt than the lead pipe: the Single Canonical Form, the creation of which requires compromise, limitation and above all a bloody indifference to the actual local needs of business users.

An EDW is normally only trying to answer questions at a high level of corporate consistency: financial roll-up, a bit of a horizontal view around customer and maybe some views around procurement... although the latter is normally better done on its own.  The point is that it really isn't 'Enterprise', beyond the fact that 'Enterprise' here is a lie.  It can be really good at providing that top-level view, at creating a corporate data mart, but the effort that requires often stifles agility in local business units and chokes local information initiatives.

The good news is that IT didn't do this for completely bloody-minded reasons; it did it because IT had constraints, data storage costs first amongst them, and because IT had a hard wall between the world of the operational transaction and the world of post-transactional analytics.  So the EDW worked in that limited space and with those restrictions.

The challenge now is that those restrictions have gone: storage costs are amazingly low when looking at Hadoop, and the wall between operations and analytics has fallen, with operations being the primary place where analytics is now able to deliver insight at the point of action.  This was the thinking that I put into the Business Data Lake, an approach that matches the business environment and leverages the thinking behind Business SOA applied to data.

So let's put down the EDW, let's walk away from the single canonical form and get a clue.  It's going to take time for IT departments, as well as analysts, vendors and consultants, to be weaned off the EDW drug, but I firmly believe that in five years' time we will be looking at a world where the IT department no longer says:

"You need an EDW, lets design the schema, should be ready early next year"

and instead says

"Sure, I'll knock up a solution in the BDL for you, be ready on Tuesday"

The customer is always right, and our customer in IT is telling us that it's the local view that counts... so can we stop battering them with a global view that doesn't fit their local problem?