Tuesday, March 11, 2014

What are the types of Data Scientist?

There are various views going around on what a Data Scientist is, what their value is to an organisation and the salaries they command. To me, however, asking 'What is a Data Scientist?' is like asking 'What is a Physicist?' Sure, 'someone who studies Physics' might be a factually accurate definition, but it is a pointless one. How does that separate someone who did Physics in High School from Albert Einstein? How does it separate the folks at CERN from someone using implicit Newtonian mechanics to play pool? So it is with Data Science, but with an added twist.
Data Science is spectacularly badly defined
So yes, you have courses cropping up at universities claiming to teach Data Science, and you have consultants with some mildly fancy Excel spreadsheets claiming they are Data Scientists. In my career I've had the pleasure of working with some real Data Scientists. Quite a lot of the time they didn't call themselves that, but it's what they were. They used data and applied some really fancy maths to deliver insight that just couldn't be attained without it. So I'll pick up the challenge laid down by Giga on whether Data Science is real or not and say 'yes... but'. Here are the four groups of Data Scientists that I've worked with.

Data Magicians or Professor Data Science
Arthur C. Clarke once said that any sufficiently advanced technology is indistinguishable from magic. This is how I feel when I work with people in this group. They normally have mathematical or physics-centric PhDs (often several), often focused on specific areas such as fluid dynamics or economics, or on something super specific such as wind turbines. Why is what they do Magic? Because these are the folks who work on 'next'. It is not a big group, but these are the CERN folks of Data Science. The reason it's science is that it's testable and provable. They can show that their algorithm would have produced a 5% improvement in performance over the past 5 years and, as it moves forward, show how their approach has made a difference to the performance of a business.
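
To make 'testable and provable' a bit more concrete, here is a minimal sketch of a backtest in Python. Everything in it is an assumption for illustration: the weekly_sales numbers are made up, and the two forecasters (a naive 'same as last week' baseline and a simple moving average) are toy stand-ins, nothing like the algorithms a real Magician would be arguing over. The point is only the shape of the test: replay history, score both approaches on data they had not yet seen, and compare.

```python
from statistics import mean


def naive_forecast(history):
    """Baseline: predict the next value as the most recent observation."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Candidate: predict the next value as the mean of the last `window` values."""
    return mean(history[-window:])


def backtest(series, forecaster, warmup=3):
    """Walk forward through the series, forecasting each point using only
    the data that came before it, and return the mean absolute error."""
    errors = []
    for t in range(warmup, len(series)):
        prediction = forecaster(series[:t])
        errors.append(abs(series[t] - prediction))
    return mean(errors)


def improvement_pct(baseline_error, candidate_error):
    """Percentage reduction in error of the candidate relative to the baseline."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error


# Made-up illustrative data, not real sales figures.
weekly_sales = [100, 120, 95, 125, 98, 122, 101, 119, 97, 123]

baseline_error = backtest(weekly_sales, naive_forecast)
candidate_error = backtest(weekly_sales, moving_average_forecast)

print("baseline error:", baseline_error)
print("candidate error:", candidate_error)
print("improvement %:", improvement_pct(baseline_error, candidate_error))
```

A claim like 'this would have improved performance over the past 5 years' is essentially this loop run over real history with a real business metric in place of the error measure.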

How do you know they are doing real Data Science? Well, the first hint is the mathematics. It's the stuff where you remember the symbols but the combination of them all together now looks like gibberish, and yet these folks are arguing over specific parts of the formula and how it can be improved. It's like watching developers arguing over the right way to handle machine-to-machine communication. You know who is smart by the outcome and by the focus on the specific, not the general. Being blunt, however, very few companies need these folks, and if they do they need very few. Working with external organisations who have a good eco-system or Data Science structure is going to be better than having a lone Data Magician wandering around getting bored. New algorithm development is not a regular thing.

Data Operators or Resident Data Scientist
The next group is what lots of companies will see value in. These aren't the Magicians or Professors, but they are crucial to making Data Science have value. These are the people who take predefined algorithms, statistical or machine learning, apply them to a specific company scenario and, most crucially, keep the parameters up to date so the algorithm continues to perform. These are the operational side of Data Science, the people for whom it's a regular day job. They can't design a new algorithm, but they can deliver specific value with an existing one, which is what really counts for a business. These folks are adept at picking the right approach and choosing between algorithms to deliver the most value.

These are the people most companies need to be thinking about: people who can take libraries like MADlib, languages like R or tools like SAS, apply them to your local challenge, deliver the value and provide the on-going support to keep it effective. Companies need access to these people either internally or as part of an external service, but in a more outsourcing/regular way than with the project-driven Magicians. These people will have a formal mathematical or physics background, often up to PhD level, but their abilities are focused more on application than invention. These guys understand the formulas, however, and that is how you know they are real.

Data Hackers or Odd Job Data Scientist
The next group are people with a bit of skill, maybe a bit of training, but they aren't at the level of sophistication of an Operator and are miles away from being a Magician. Sadly these people often don't know this. These are folks who've taken the tools, learnt a bit about how they work and, most often, have fixated on a specific way of solving a specific problem with specific tools. These folks are out there today, aiming to get work by being 'the one-eyed man in the kingdom of the blind'. These folks can cost only $30 an hour, apparently. They add limited value but can improve the current situation in the same way as a decent report writer could. In fact there is much overlap between the report writer and these people. That doesn't mean they have zero value, but you need to know what they are doing.

Using these folks under the guidance of a Resident Data Scientist can add value, but don't mistake knowing how to apply one Machine Learning technique for actual knowledge.  The folks in this group know how the tool works but don't understand the mathematics behind it.

Data Science Bluffers or MS Office Data Scientists
This is the last group, and I'd say the one most responsible for the idea that Data Science might not be a real thing. These are people who put a 'predict future stock price movement' box on their diagrams and mutter 'it's Data Science'. They are the people who get a spreadsheet with a bunch of data, apply a very basic statistical function and claim 'Hey, it's Data Science'. These are the bluffers of Data Science, the people who are trying to play the one-eyed man and hoping that no-one notices they have a blindfold on.

This is a big group right now, populated in large part by consultancies, often those that specialise in Excel spreadsheet type work and are now claiming that this is the Data Science insight that the company needs. These guys understand neither the Data Science tools nor the mathematics behind them, but they do know how to create a good deck and Excel sheet.


Still doubt it's real?
Still wondering whether this sort of thing is real? Well, I'll give you a problem to solve: Medium Term Conflict Analysis in Air Traffic Management. That is a 'hard' problem, and one that requires really complex mathematics to understand all of the pieces at play in the sky if you want to solve it effectively. Here is another one: you have 50 suppliers, 500 stores, 20 distribution centres, 100,000 SKUs and information streaming in about sales, social insight and stock levels. How do you efficiently procure, ship and stock to maximise profits? Don't forget to include cannibalisation between brands and the costs of wasted stock or stock-room space.

There are lots of other challenges where Data Science adds value, but the point is that you need to understand what sort of Data Science you are trying to achieve. For most people this isn't about getting Data Magicians, it's about getting Data Operations folks, the Resident Data Scientists.

So it's real, but don't forget to challenge people who say they are a Data Scientist and work out which bucket they fall into.

  
