Friday, January 09, 2015

Securing Big Data - Part 4 - Not crying Wolf.

In the first three parts of this I talked about how Securing Big Data is about layers, and then about how you need to use the power of Big Data to secure Big Data, then how maths and machine learning helps to identify what is reasonable and was is anomalous.

The Target Credit Card hack highlights this problem.  Alerts were made, lights did flash.  The problem was that so many lights flashed and so many alarms normally went off that people didn't know how to separate the important from the noise.  This is where many complex analytics approaches have historically failed: they've not shown people what to do.

If you want a great example of IT's normal approach to this problem then the ethernet port is a good example.
What does the colour yellow mean normally? Its a warning colour, so something that flashes yellow would be bad right?  Nope it just means that a packet has been detected... err but doesn't the green light already mean that its connected?  Well yes but that isn't the point, if you are looking at a specific problem then the yellow NOT flashing is really an issue... so yellow flashing is good, yellow NOT flashing is bad...

Doesn't really make sense does it?  Its not a natural way to alert.  There are good technical reasons to do it that way (its easier technically) but that doesn't actually help people.

With security this problem becomes amplified and is often made worse through centralising reactions to a security team which knows security but doesn't know the business context.  The challenge therefore is to categorize the type of issue and have different mechanisms for each one.  Broadly these risks split into 4 groups
Its important when looking at risks around Big Data to understand what group a risk falls into which then indicates the right way to alert.  Its also important to recognize that as information becomes available an incident may escalate between groups.

So lets take an example.  A router indicates that its receiving strange external traffic.  This is an IT operations problem and it needs to be handled by the group in IT ops which deals with router traffic.  Then the Big Data security detection algorithms link that router issue to the access of sales information from the CRM system.  This escalates the problem to the LoB level, its now a business challenge and the question becomes a business decision on how to cut or limit access.  The Sales Director may choose to cut all access to the CRM system rather than risk losing the information, or may consider it to be a minor business risk when lined up against closing the current quarter.  The point is that the information is presented in a business context, highlighting the information at risk so a business decision can be taken.

Now lets suppose that the Big Data algorithms link the router traffic to a broader set of attacks on the internal network, a snooping hack, this is where the Chief Information Security Officer comes in, that person needs to decide how to handle this broad ranging IT attack, do they shut down the routers and cut the company off from the world?  Do they start dropping and patching, and do they alert law enforcement.

Finally the Big Data algorithms find that credit card data is at risk, suddenly this becomes a corporate reputation risk issue and needs to go to the Chief Risk Officer (or the CFO if they have that role) to take the pretty dramatic decisions that need to be made when a major cyber attack is underway.

The point here though is that it needs to be systematic how its highlighted and escalated, it can't all go through a central team.  The CRO needs to be automatically informed when the risk is sufficient, but only be informed then.  If its a significant IT risk then its the job of the CISO to inform the CRO, not for every single risk to be highlighted to the CRO as if they need to deal with them.

The basic rule is simple: "Does the person seeing this alert care about this issue? Does the person seeing this alert have the authority to do something about this issue? and finally: does the person seeing this alert have someone lower in their reporting chain who answers 'yes' to those questions?"

If you answer "Yes, Yes, No" then you've found the right level and then need to concentrate on the mechanism.  If its "Yes, Yes, Yes" then you are in fact cluttering if you show them everything that every person in their reporting tree handles as part of their job.

In terms of the mechanism its important to think on that "flashing yellow light" on the Ethernet port.  If something is ok then "Green is good", if its an administrative issue (patch level on a router) then it needs to be flagged into the tasks to be done.  If its an active and live issue it needs to come front and center.

In terms of your effort when securing Big Data you should be putting more effort into how you react than on almost any other stage in the chain.  If you get the last part wrong then you lose all the value of the former stages.  This means you need to look at how people work, look at what mechanisms they use, so should the CRO be alerted via a website they have to go to or via an SMS to the mobile they carry around all the time and that take them to a mobile application on that same device? (hint: its not the former).

This is the area where I see the least effort made and the most mistakes being made, mistakes that are normally "Crying Wolf" so you show every single thing and expect people to filter out thousands of minor issues and magically find the things that matter.

Target showed that this doesn't work.



No comments: