
Thursday, September 17, 2020

Service Accounts suck - why data futures require end to end authentication.

Can we all agree that "service" accounts suck from a security perspective? Those are the accounts you set up so that one system or service can talk to another. Often this will be a database connection, so the application uses one account (and thus one connection pool) to access the database. These service accounts are sometimes unique to a service or application, but often it's a standard service account shared by anything that needs to connect to a system.

The problem with that is that you've got security defined at the service account level, not based on the users actually using it. So if that database contains the personal information of every customer, you are relying on the application to ensure it only displays the information for a given customer; the security isn't with the data, it's with the application.

Back in 2003 a group called the "Jericho Forum" was set up under the Open Group to look at the infrastructural challenges of de-perimeterisation, and they created a set of commandments, the first of which is:
The scope and level of protection should be specific and appropriate to the asset at risk. 

Service accounts break this commandment: they take the most valuable asset (the data) and effectively move the security scope out of the data and into the application. What needs to happen is that the original requestor of the information is authenticated at every level, as with OAuth, so that if I'm only allowed to see my data, then even if someone makes an error in the application code, or I run a Bobby Drop Tables attack, my "SELECT *" only returns my records.
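As a hedged sketch of what that can look like in practice (assuming PostgreSQL row-level security and the psycopg2 driver; the table and setting names are illustrative, not something from this post), the application binds the authenticated end user to the database session instead of hiding behind the service account:

```python
# Sketch only: assumes a PostgreSQL row-level security policy on "orders"
# that filters rows using current_setting('app.user_id'). Table, setting and
# function names are illustrative.
import psycopg2

def fetch_my_orders(conn, end_user_id):
    """Return only the rows the authenticated end user is allowed to see."""
    with conn:                       # one transaction per request
        with conn.cursor() as cur:
            # Bind the original requestor's identity to this transaction,
            # rather than relying on the shared service-account identity.
            cur.execute("SELECT set_config('app.user_id', %s, true)",
                        (end_user_id,))
            # Even a careless "SELECT *" now returns only this user's records,
            # because the database, not the application, enforces the scope.
            cur.execute("SELECT * FROM orders")
            return cur.fetchall()
```

The connection can still come from a pool; the point is that the identity that scopes the query is the end user's, not the pool's.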

This changes a lot of things, connection pooling for starters, but when you are looking at reporting in particular we have to get away from technologies that force system accounts and therefore require multiple security models to be implemented in the consumption layer.

The appropriate level at which to protect data is the data level, and the scope is the data itself. Only by shifting our perception of data away from service accounts and databases, and towards data being the asset, can we start building security models that actually secure data as an asset.

Today most data technologies assume service accounts, which means most data technologies don't treat data as an asset. That has to change.

Thursday, May 22, 2014

Lipstick on the iceberg - why the local view matters for IT evolution

There is a massive amount of IT hype focused on what people see: it's about the agile delivery of interfaces, about reporting, visualisation and interaction models. If you could weigh hype, it would be quite clear that 95% of it is about this area. It's why we need development teams working hand-in-hand with the business, and it's why animations and visualisation are massively important.

But here is the thing. SAP, IBM and Oracle have huge businesses built around the opposite of that, around large transactional elements, the things that sit at the back end and actually run the business. Is procurement something that needs a fancy UI? I've written before about why procurement is meant to be hated, so no, that isn't an area where the hype matters.

What about running a power grid? Controlling an aeroplane? Traffic management? Sure, these things have some level of user interaction, and often it's important that it's slick and effective. But what percentage of the effort is the user interface? Less than 5%. The statistics out there show that over 80% of spend is on legacy, and even the new spend is mainly on transactional elements.

This is where taking a Business SOA view can help: it starts putting boundaries and value around those legacy areas to help you build new, more dynamic solutions. But here is a bit of a dirty secret.

The business doesn't care that it's a mess behind the scenes... if you make it look pretty

It's a fact that regularly shocks people in IT. But again this is the SOA Christmas point: business users care about what they interact with, about their view for their purposes. They don't care if it's a mess for IT, as long as you can deliver that view.

So in other words the hype has got it right: by putting lipstick on the iceberg, and by hyping the lipstick, you are able to justify the wrapping and evolution of everything else. Applying SOA approaches to data is part of the way to enable that evolution and start delivering the local view.

The business doesn't care about the iceberg... as long as you make it look pretty for them. 

Wednesday, March 12, 2014

What is real-time? Depends on who you ask

"Real-time" its a word that gets thrown about a lot in IT and its worth documenting a few of the different ways it gets used

Hard Real-time
This is what Real-time Java was created to address (along with soft real-time). What is it? The easiest way to put it is that in hard real-time environments the following statement is often true:
If it doesn't finish in X milliseconds then people might die
So if you miss a deadline it's a system failure. Deadlines don't have to be super small, they could be '120 seconds', but the point is that you cannot miss them.

Soft Real-time
This was another use case for Real-time Java. Here there are still deadlines, but missing one isn't fatal; it degrades the value of the result and the performance of the system. Again though, we are talking about deadlines, not raw performance.
If it doesn't finish in X milliseconds the system risks failing and will be performing sub-optimally
Machine real-time
This is when two machines are communicating on a task, or processes are communicating within a machine. Here the answer is 'as fast as possible': there aren't any deadlines, but the times are measured in microseconds rather than seconds. These are calculations and communications that get done millions and billions of times, so a shift from 0.1ms to 1ms over a billion attempts means the end-to-end work takes around eleven days instead of just over a day. This is the world of HPC and low-latency computing, where the communications and processes need to be slimmed for speed.
Every microsecond counts, slim-it, trim-it because we're going to do it a billion times
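A quick sanity check on that arithmetic (a minimal sketch, nothing more):

```python
# Rough arithmetic behind the "just over a day versus eleven days" claim.
attempts = 1_000_000_000

for per_op_ms in (0.1, 1.0):
    total_seconds = attempts * per_op_ms / 1000
    print(f"{per_op_ms} ms per operation -> {total_seconds / 86_400:.1f} days")

# 0.1 ms per operation -> 1.2 days
# 1.0 ms per operation -> 11.6 days
```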
Transactional real-time
Transactional real-time is about what it says: the time to complete a transaction, something that hits the database and returns. Here we are in the millisecond to tenth-of-a-second range, and it's this number that determines how internally responsive an application is. This is the end-to-end time from initiation to response, at which point the state of the system has been changed.
Don't make me wait for you
User transactional real-time
User transactional real-time is what looks fast to a user of a system. This varies from interactive systems, where it means sub-second, to internet solutions and websites, where it might mean five seconds or so. Again this is the end-to-end time, and it includes something actually having happened and being able to check that it has happened.
Be fast enough that the user thinks it's magic
BI reporting real-time
Next up are the views on recent reality: the time it takes for a report to be generated from 'landed data'. Here BI people tend to think of real-time in the same way as user transactional real-time, so five or so seconds is acceptable. This isn't an end-to-end time, however, as the data has already landed and all that is happening is that the reports are being produced quickly. Crucially, this is about reporting; it's not about transactional systems hitting the reporting system.
Let the user do all the reports and variations they want, we can handle it and not annoy them
BI real-time
The next definition covers the end-to-end of BI: extracting data from a source system, loading and transforming that data, and finally producing a report that includes the new information. Here the real-time definition can get longer and longer. I've had clients say that 5 minutes, 15 minutes or even 2 hours are considered acceptable 'real-time' responses. The point is that when the current response time is 'next day', something that is only minutes or even a few hours delayed is a significant increase in performance and is considered responsive.
Tell me shortly what went wrong right now
Chuck Norris Real-Time
In Chuck Norris real-time it's happened before you even knew you wanted it to happen.
Problem solved 

Tuesday, June 28, 2011

Social Relationships don't count until they count

There is a game called "Six Degrees of Kevin Bacon" which tries to link Kevin Bacon to any other actor in six steps or fewer. It is a popular version of the "small world" thesis put forward by Stanley Milgram. In these days of social media and "relationships" there is massive hype around farming these relationships, with an implicit assumption that someone with lots of relationships is more valuable than someone without them.

The problem is that in reality this is all a version of the Travelling Salesman problem, with everyone assuming that every link has the same value. In reality links have different values based on their strengths, so understanding how individuals are actually related is significantly more complex than many social media "experts" would have you believe.

What do I mean by this? Well, my "Obama Number" is 4: via my wife, I can trace to Obama in four steps, with each individual step being reasonably strong. By reasonably strong I mean that each link has met the previous link several times and could probably put a name to the face. Now the variability of strength on these links is huge, from my wife (hopefully a strong link) to people who move in similar social circles, and then into the political sphere where the connection to Obama is made.

I have a Myra Hindley number of 2, as a friend of mine met her more than once (before her conviction).

So for Republicans and Tea Party nut-jobs this means that it's six steps max from Obama to a child killer. Does this mean there is a relationship worth knowing or caring about? Nope.

So how do we weight relationships, and how do we weight each step within the graph? Well, this is actually pretty simple. Let's say A has a relationship to B via a social network; call that a score of 0.0001. Let's say that B (the actual person) has a score of 1.0. Then for each interaction between two individuals you look at the strength from A to B:

  1. How many times does A post to B? If > 10, add 0.0001
  2. How many times does B post to A? If > 10, add 0.001 for each multiple of 10 (B reaching back to A suggests the link is mutual)
  3. How many times does B indicate that they are at the same place as A? If > 10, add 0.001 per 10
  4. How many times does a voucher provided to A get used by B? If > 10, add 0.1 per 10
  5. Are they directly related or married? If cousin or closer, add 0.5
  6. Do they work closely together? If within one reporting hop, add 0.2
  7. How many times have they met? If > 10, add 0.05 per 10
What I'm saying is that it's actually the interactions that matter, backing up the social link with real-world evidence, rather than the mere existence of a social connection.

So while Obama to me is four steps, overall it's pretty weak (0.8 * 0.2 * 0.2 * 0.2 = 0.0064), a 0.64% link, which really means I'm not worth lobbying to get influence over the US president.
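For what it's worth, here is a minimal sketch of that weighting scheme; the increments follow the list above, while the function names and the cap at 1.0 are my own illustration:

```python
# Sketch of the interaction-based weighting described above.

def link_strength(posts_a_to_b, posts_b_to_a, same_place_count,
                  vouchers_redeemed, related, works_closely, times_met):
    score = 0.0001                            # baseline for a bare social-network link
    if posts_a_to_b > 10:
        score += 0.0001
    score += 0.001 * (posts_b_to_a // 10)     # reciprocation counts for more
    score += 0.001 * (same_place_count // 10)
    score += 0.1 * (vouchers_redeemed // 10)  # acting on a referral counts most
    if related:                               # cousin or closer, or married
        score += 0.5
    if works_closely:                         # within one reporting hop
        score += 0.2
    score += 0.05 * (times_met // 10)
    return min(score, 1.0)                    # 1.0 is "this is the person themselves"

def path_strength(link_scores):
    """A chain of links is only as strong as the product of its hops."""
    result = 1.0
    for s in link_scores:
        result *= s
    return result

print(path_strength([0.8, 0.2, 0.2, 0.2]))    # ~0.0064, i.e. a 0.64% link
```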

This is where the combination of Big Data and analytics could really deliver value: by understanding the true weightings of individual relationships and, from that, determining the real, genuine paths to the maximum possible market for the minimum effort.




Monday, January 31, 2011

Data Services are bogus, Information services are real

One of the ideas that used to be proposed as fact in old-school SOA was "Data Services": effectively CRUD wrappers on database tables, with the claim that they would be reusable across the enterprise. I've said many times this is a dumb idea, because what matters is actually the information, which means context and governance.

Now, the other day when I was talking about MDM, a bright spark pointed out that I hated data services, but wasn't MDM just about data services?

It's a good challenge, mainly because that is the problem with how many people view MDM. MDM, when done well, is about the M and the M, not the D; that is, it's more about Mastery and Management than it is simply about Data. What does that mean?

Well, let's take everybody's favourite MDM example, "Customer". A data-driven approach would give us a service of:
Service Customer
  • Capability: Create
  • Capability: Update
  • Capability: Delete
  • Capability: Read

Now this is the "D" approach to MDM and SOA, also known as the Dunce's approach; it's about Data Services and viewing the world as a set of data objects.

The smart approach is to view MDM as an information challenge and to deliver information services, so instead of the data-centric approach we get:
Service Customer
  • Capability: Establish Prospect
  • Capability: Establish Customer
  • Capability: Modify
  • Capability: Change Status
  • Capability: Find
  • Capability: Archive Customer
  • Capability: Validate Customer
  • Capability: Merge Customers
  • Capability: Split Customer

Here we start exposing the customer service and make two things clear:
  1. We are talking about the customer in context
  2. We reserve the right to say "no" to a request

So this is where customer genuinely can be used as a single service across the enterprise. This service takes on the responsibility to authorise the customer, or at least ensure that authorisation is done; it sets the quality and governance standards and doesn't allow people to do whatever they want with it. It includes the MDM process elements around customer management and provides a standardised way of managing a customer through its lifecycle.
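To make the contrast concrete, here is a hedged sketch of the two service shapes as plain interfaces; the method names follow the capability lists above, and everything else is purely illustrative:

```python
# Sketch: the "D" approach versus the information approach, as interfaces.
from abc import ABC, abstractmethod

class CustomerDataService(ABC):
    """The Dunce's approach: a CRUD wrapper with no context or governance."""
    @abstractmethod
    def create(self, record): ...
    @abstractmethod
    def read(self, customer_id): ...
    @abstractmethod
    def update(self, customer_id, record): ...
    @abstractmethod
    def delete(self, customer_id): ...

class CustomerInformationService(ABC):
    """Business capabilities: the service owns the lifecycle and can say "no"."""
    @abstractmethod
    def establish_prospect(self, details): ...
    @abstractmethod
    def establish_customer(self, prospect_id, details): ...
    @abstractmethod
    def modify(self, customer_id, changes): ...      # may reject the change
    @abstractmethod
    def change_status(self, customer_id, new_status): ...
    @abstractmethod
    def find(self, criteria): ...
    @abstractmethod
    def validate_customer(self, customer_id): ...
    @abstractmethod
    def merge_customers(self, surviving_id, merged_id): ...
    @abstractmethod
    def split_customer(self, customer_id): ...
    @abstractmethod
    def archive_customer(self, customer_id): ...
```

The second interface is where the governance lives: every capability is a point at which the service can enforce its quality standards or refuse the request.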

This is fundamentally the difference between a business-driven SOA approach, which concentrates on services in their business context and with their business governance, and a technically driven approach, which simply looks to expose technical elements via a technical wrapper.


Friday, June 25, 2010

Location Centric packages - don't buy a 1990s package

One of the big problems with package solutions is that they are very database-centric. Changing the data model is basically suicide for a programme. Adding a few new tables is dodgy but sometimes required, and adding columns is reasonably okay, but modifying the core data model is always going to get you into hot water.

One area that I've consistently seen as a problem over the years comes down to how package vendors have thought about physical and electronic addresses. When the packages were created there was really only one set of important addresses: physical addresses. Phone numbers were pretty much fixed to those premises and email was a million miles from anyone's mind. This means the data models tend to treat electronic addresses as very much second-class citizens, normally as some form of child table rather than as a core entity.

The trouble is that as packages are being updated I'm seeing the same mistake being made again in some of the new technology models being used by vendors (AIA from Oracle appears to make the mistake). The reality is that the model is pretty simple.

That really is it. There are two key points here
  1. Treat all actors as a single root type (Party) then hang other types off that one
  2. Do the same for Locations
The reason for doing this is pretty obvious. These days mobile phone numbers and email addresses are much better communication tools than physical addresses. As you want to send e-statements, e-invoices and things like SMS delivery notices to customers, you want to be able to channel-shift customers much more simply. If a customer switches their delivery address for a book to an email address, that is fine, as long as you can ship them an e-book.
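A minimal sketch of that shape, with names that are purely illustrative rather than taken from any particular package:

```python
# Sketch of the Party / Location split described above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Location:
    """Single root type for anywhere you can reach a party."""
    kind: str     # e.g. "postal", "email", "mobile"
    value: str    # the address, number, etc.

@dataclass
class Party:
    """Single root type for any actor: person, organisation, supplier..."""
    name: str
    locations: List[Location] = field(default_factory=list)

    def preferred(self, kind: str) -> Optional[Location]:
        """Channel shifting: pick the first location of the requested kind."""
        return next((loc for loc in self.locations if loc.kind == kind), None)

customer = Party("Jane Doe", [
    Location("postal", "1 High Street, Anytown"),
    Location("email", "jane@example.com"),
])
# Shipping an e-book? Ask for the email location instead of the postal one.
print(customer.preferred("email").value)
```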

Now I know that anyone from an OO background is going to say "well, duh!", but it does amaze me how, in package land, the database-centric mindset still dominates and people just don't seem to want to revisit the assumptions they made in the 1990s, when their hacks to squeeze in electronic addresses seemed like a safe bet; after all, email and the internet weren't considered future strategies back then.

It's now well into the 21st century, and I'd really advise people buying packages to look long and hard at the data model and ask "is this a 1990s view of business or a 21st-century view?" If it's the former, be aware that you will have pain.


Tuesday, October 20, 2009

Data Accuracy isn't always important

Now, while Amex really should have understood the term "minimum", there are examples where it really isn't an issue if someone gets it wrong when displaying information to you. Sometimes this indicates that a prediction was incorrect, or that "approximately" is good enough for the scenario.
The current temperature on the Sydney Morning Herald site, listed as one degree higher than the maximum for the day, is a good example of this. Does this matter? Well, no, and for two reasons. Firstly, a weather forecast is accepted as approximate information; the weather is a chaotic system and so by definition can't be predicted exactly. Secondly, the max number is only a prediction, and the current temperature indicates that the prediction was incorrect. So by having an incorrect piece of information we actually have more information, as it reinforces the point that weather forecasts cannot be 100% accurate.
Now when the next day rolls around and you look back, you should clearly be recording the actual maximum achieved rather than the prediction, because the information has gone from being a record of a prediction to a record of fact. The only question is at what point you should update the maximum: do you change it dynamically, or on a daily basis when reporting historical information? For the Sydney Morning Herald site the answer is simple: changing the daily maximum as it rises during the day would defeat the purpose of the "max" figure, which is what the paper predicted at the start of the day. It's a free news story if the day goes well beyond the prediction: "Sydney weather was bonza today, with max temperatures 5 degrees higher than expected".
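As a small sketch of that policy (field names are mine, purely for illustration): keep the prediction as the displayed maximum while the day is running, and only swap in the observed maximum once the day has closed.

```python
# Sketch: a record of a prediction becomes a record of fact only after the day ends.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DailyTemperature:
    predicted_max: float                   # what the forecast said that morning
    observed_max: Optional[float] = None   # filled in once the day is over

    def displayed_max(self, day_closed: bool) -> float:
        # During the day the prediction is the story; afterwards the
        # record of fact replaces the record of the prediction.
        if day_closed and self.observed_max is not None:
            return self.observed_max
        return self.predicted_max
```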

So the point is that when you look at data, do think about what level of accuracy is important. If you are reporting a bank balance then spot-on is the only option; if you are reporting the number of customers who bought cheese with wine as a percentage of overall cheese buyers, you can probably get away with one decimal place or less. This applies even more when looking at forecasting and other predictive data sets, where the effort of increasing accuracy by 1% might be pointless given the extra time it takes.

Data Accuracy isn't an absolute, be clear about what matters to you.



Thursday, September 25, 2008

DOA your SOA?

I like a lot of what Dave Linthicum writes, but I don't agree with his latest post on why people should start with the data when doing SOA.

First off, let's be clear: data is important, it's one of a few really important things in SOA
  1. Services
  2. Capabilities
  3. Real World Effects
  4. Data
Now I've deliberately put them in that order because that is the order I think is important. I've written before about SOA v POA and how I think you know that you are doing SOA. The key point here is that a stack-based view of
But the fact is you need to start with the data first, than work up to services, than the agile layer (process, orchestration, or composite). If you follow those basic steps you'll find that the solution is much easier to drive.
just doesn't work for me. First of all it implies a technology-centric view of SOA that I don't agree with, and secondly it places services as just an intermediary between process and data, with the goal presumably being to do more and more in the "agile" layer (although those of us who have done large BPM and composite projects probably aren't so sure about the agile bit). This stack-based view isn't going to progress IT; it might help with a single project, but surely the goal of SOA is more than that?

To me this plays into the technology-centric view of change in much of the SOA advice out there. Data is important, but it is only important where it works, and that means understanding the services first.

Now, if Dave is talking about how, having done the business services, he'd look to realise them with technology, with a single business governance piece over the service covering the data, process and other elements, then I'm with him. But starting with data for your architecture?

To me that means you have a Data Oriented Architecture, and I can't see anyone selling the business on the idea that the next project is going to be DOA, with the goal of the whole IT estate being DOA within five years.

So start with the business services, understand the business capabilities, understand the real-world effects, and then understand what data is being exchanged and managed to deliver that. Data is important, very important, but if you are taking a business view of modelling SOA then it's just one of the technical artefacts, not the major purpose for existence.




Saturday, April 05, 2008

Information Oblivion - data only counts where it works

Reading a few email lists recently I've noticed a worrying trend around SOA that can be compared to the "struct" problem of OO: people are trying to look at information independently of its business context. Often this is the single canonical form mentality, but it's part of a broader problem where people (often called information architects) push the idea that the information is the important bit and that all systems do is move information around and update it.

A lot of these efforts try to treat the data as an independent entity and aim to create data sources which act independently and can therefore be "used" across the enterprise.

The problem is that this works okay for after-the-fact data. Having a single customer data source could work, if it's just about the basic customer information, and having a historical record of orders is also okay. The point about these bits of data is that they record what has happened. Where the approach falls down is when you try to apply it to what is happening, because that temporal information only makes sense in the context of the current execution.

Disconnecting data from the business service that manipulates it just means that the interpretation logic (the bit in the service that understands what the fields mean at different stages) has to shift either into the data service (so it's no longer a data service) or into every single service that uses the data (which is bonkers).

The basic philosophy that has served me well is that data only counts where it is used. After-the-fact reporting is a data-centric element and suits data stores, because the use really is just about the information being shifted around; it's structs. But where the data is related to activities, keep the data close to the action: dump it into the store later, but don't do that until you've finished doing what you need to with it.

Information is power, but only if you act on it correctly.


Tuesday, November 07, 2006

What Geo ripping means to the enterprise

The other reason for geo ripping Wikipedia was to explore what can be done with the unstructured information that is created inside organisations and how easy it would be to
  1. Re-purpose the information
  2. Give credence to quality
  3. Turn human focused information into systems focused information
The first critical point is that the Wikipedia information isn't truly unstructured. I was really taking templated information out, which made it much easier than truly unstructured information would have been. But this is a pretty standard case when you think about information stored in Access databases or Excel sheets, where templated or semi-structured information is the norm. That makes it a reasonable use case for thinking about how current information in things like Excel can be turned into information that can be used directly elsewhere in the enterprise.
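As a hedged illustration of how little is needed when the information is templated rather than truly unstructured, here is a sketch that pulls coordinates out of a simplified, Wikipedia-style coord template (the real templates have far more variants than this regular expression handles):

```python
# Sketch: semi-structured "geo ripping" with a regular expression.
import re

COORD = re.compile(
    r"\{\{coord\|(?P<lat>-?\d+(?:\.\d+)?)\|(?P<lon>-?\d+(?:\.\d+)?)[^}]*\}\}"
)

def extract_coords(wikitext):
    """Return (lat, lon) pairs found in templated markup."""
    return [(float(m.group("lat")), float(m.group("lon")))
            for m in COORD.finditer(wikitext)]

print(extract_coords("Sydney {{coord|-33.87|151.21|display=title}}"))
# [(-33.87, 151.21)]
```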

So that is stage one, which leads directly to stage two, namely the question of data quality and provenance. If I release information that is manually created in a spreadsheet (but on which critical decisions are currently based) and allow it to be directly integrated elsewhere without human judgement and oversight, how do consumers know the quality or provenance of the data? How do I state on my web service "this service shouldn't be used for anything serious like nuclear power or making actual decisions" without it becoming the standard shrink-wrapped licence that all software vendors tag on and everyone ignores?

The threat here is that a Line of Business (LOB) will use this sort of approach to create a web service like "Current Sales Budget" which contains not only out-of-date information but information built on incorrect assumptions. This will then be consumed by others who think it is the "real" current sales budget. That is a big risk in businesses, especially if it is used for modelling and the like, as small errors in one place can lead to massive errors at the end. Data provenance is going to be a big issue in this world of "easy to develop" web services.

The final element is about going the other way from the traditional goal of IT, which has tended to turn systems information into human-focused information. The goal here is to take all of the information created in these collaborative and participative systems and turn it back into something that the enterprise can use; hence the reason I wanted to take a wiki and put the information into a database.

So my little experiment proved that it can be done, and that it is liable to be an issue in terms of data quality and provenance. I'm not sure of the solution yet, but at least it gives me something to think about.



Monday, August 28, 2006

Single Canonical form - not for SOA

A very short while ago I made a post about the starting point for SOA, and in the follow-up I commented that I wasn't a big fan of canonical data models. I was asked to clarify why, so here goes.

First off, let's agree what we mean by this, and I'll take this:
Therefore, design a Canonical Data Model that is independent from any specific application. Require each application to produce and consume messages in this common format.
as the definition. I've seen, and led, quite a few projects that have used a canonical data model; for some things it has worked and for lots it has failed (of course the ones I led were the ones that worked :) ). So it's not that canonical forms are a really bad thing, it's just that they aren't the ultimate solution.

Taking the manufacturing service Level 0 as our start, and thinking about product and customer as the two data elements to worry about, there are three approaches here, all of which could be called canonical to some degree but which represent very different approaches to the solution.

Just the facts

The first is the one I've used most often to success in this area, and its focus is all about the interactions between multiple services. The rule here is basically lowest common denominator: the objective is to find the minimum set of data that can be used to communicate effectively between areas on a consistent basis. The goal isn't that this should be used on 100% of occasions, but that it covers 70-80% of the interactions.

In this model we might even get to the stage where it's just ProductID and CustomerID that are shared, and we have a standard provisioning approach for the two to ensure that IDs are unique. But most often it's a small subset that enables each service to understand what the other is talking about and then translate it into its own version. So in this model the "canonical" form is very small, really just a minimal reference set. This does mean that sometimes conversations have to take place outside of the minimal reference set, and that is fine, but it's more costly, so the people making that call need to be aware that they are now completely responsible for managing change in that interaction.

So in this model we might say that all product elements are governed by the ProductID and that customer consists of Name and Address, but when sales talk to finance to bill a customer for an order they also include the product description from their marketing literature to help it make sense on the invoice. Here we would model this extra bit of information either as an extension to the previous data model, or just consider it bespoke for that transaction. This means the sales service team would now be responsible for the evolution of that data description, rather than using the global model, which would be owned and maintained... well, globally. The objective when communicating is to use the minimal reference set as much as possible, as this reduces effort, and the goal of the team that maintains it is to keep it small, so it's easier for them.
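A minimal sketch of that idea, with all names invented for illustration: a small, globally governed core, plus an extension owned by the team that chose to step outside it.

```python
# Sketch of the "minimal reference set" approach.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CustomerRef:
    """Globally governed: keep it small."""
    customer_id: str
    name: str
    address: str

@dataclass(frozen=True)
class ProductRef:
    """Globally governed: the ID is the contract."""
    product_id: str

@dataclass(frozen=True)
class BillingRequest:
    """Sales -> Finance. The marketing description is a sales-owned extension,
    not part of the global model, so sales carries the cost of changing it."""
    customer: CustomerRef
    product: ProductRef
    marketing_description: Optional[str] = None
```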

This model is particularly effective in data exchange projects like reporting, or on base transactional elements, and it's the one I've seen used most effectively, especially when combined with strong governance and enforcement around the minimal reference set. A great advantage of this model is that it reduces the risk of work being done in the wrong place: if all you have is CustomerID, you are unlikely to undertake a massive fraud-profiling project, something better left to finance.

When you talk, we listen

The next approach, which is already getting into dangerous territory IMO, is to create a superset of all interactions between services. In this world the goal is to capture a canonical form that represents 100% of the possible interactions between services; thus if a service might need 25 fields of product information, the canonical form has those 25 fields. The problem with this model is that there is a lot of crap flying about that is only there for edge cases. It can be made to work, but it makes day-to-day operations harder, tends to blur the boundaries between areas and services, and increases the risk of duplicate functionality. It's also a real issue of information overload. What I've tended to see happen in this model is that people start adding fields "in case" and start consuming and operating on fields "because they can". This isn't sensible.

I'd like my project to fail please

The final approach is the mythical "single canonical form". This beast is the one that knows everything; it's like an enterprise data model, only worse. It creates a single data model that represents not only the superset of interactions but the superset of internals as well. So it models both how finance and manufacturing view products and lobs them together, then considers how sales and distribution view customers and lobs those together too. Once this behemoth is created, it is mandated as the interaction between the areas, with (in my experience) disastrous consequences. It's too complex, it removes all boundaries and controls, and it ends up with ridiculous information exchanges where both parties know how the two areas operate internally. When external parties are brought into the mix it gets ever more complicated and fragile.

Summary

So I'm not 100% against a canonical form as an intermediary for data exchange, but I do think that a canonical form needs to be kept very small and compact, and that exchanges outside of that canonical form must be allowed for and managed. In the same way that there isn't an enterprise service bus that does everything, there is equally no single canonical form that works everywhere. Flexibility comes from mandating certain elements and accepting increased costs for stepping outside them, but stepping outside has to be allowed, to enable systems, people and organisations to operate effectively.

