Tuesday, November 07, 2006

What Geo ripping means to the enterprise

The other reason for Geo ripping Wikipedia was to explore what can be done with the unstructured information that is created inside organisations, and how easy it would be to
  1. Re-purpose the information
  2. Give credence to quality
  3. Turn human focused information into systems focused information
The first critical point is that the Wikipedia information isn't truly unstructured. I was really extracting templated information, which is much easier than working with truly unstructured information. But this is a pretty standard case: in Access databases and Excel sheets, templated or semi-structured information is the norm. That makes it a reasonable use case for thinking about how the information currently held in things like Excel can be turned into information that can be used directly elsewhere in the enterprise.
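To illustrate why templated text is so much easier to rip than free prose, here is a minimal Python sketch. The infobox layout, the field names and the `extract_fields` and `to_record` helpers are all assumptions made up for this example; they are not the real Wikipedia markup or the code used in the experiment.

```python
import re

# Hypothetical templated text, loosely modelled on a wiki infobox.
SAMPLE = """{{Infobox place
| name = Exampleville
| latitude = 51.5
| longitude = -0.12
}}
Free-form prose that a template parser simply ignores."""

# Matches lines of the form "| key = value" and captures key and value.
FIELD = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def extract_fields(text):
    """Return the key/value pairs found in '| key = value' template lines."""
    return {key: value for key, value in FIELD.findall(text)}


def to_record(text):
    """Turn the human-focused template into a typed, systems-focused record."""
    fields = extract_fields(text)
    return {
        "name": fields["name"],
        "lat": float(fields["latitude"]),
        "lon": float(fields["longitude"]),
    }
```

The same pattern would apply to a spreadsheet with a fixed column layout: the template does most of the work, and the code only has to trust that people filled it in consistently.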

So that is stage one, which leads directly to stage two, namely the question of data quality and provenance. If I release information that is manually created in a spreadsheet (but on which critical decisions are currently based) and allow it to be directly integrated elsewhere without human judgement and oversight, how do consumers know the quality or provenance of the data? How do I state on my Web Service "this service shouldn't be used for anything serious like nuclear power or making actual decisions" without it becoming the standard shrink-wrapped license that all software vendors tag on, and everyone ignores?

The threat here is that the Line of Business (LOB) will use this sort of approach to create a web service like "Current Sales Budget" which contains not only out-of-date information, but information built on incorrect assumptions. This will then be consumed by others who think it is the "real" current sales budget. This is a big risk in businesses, especially if the data is used for modelling and the like, as small errors in one place can lead to massive errors at the end. Data provenance is going to be a big issue in this world of "easy to develop" Web Services.
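One possible way to make provenance harder to ignore than a license footnote is to ship it as data alongside the value, so that consumers can check it in code. The sketch below is only an illustration under that assumption: the `Provenance` and `PublishedFigure` types, their fields, the 30-day staleness threshold and the sample values are all hypothetical, not an existing standard or the solution to the problem.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Provenance:
    source: str                 # where the figure came from, e.g. a spreadsheet
    owner: str                  # who is responsible for keeping it current
    last_updated: date
    manually_maintained: bool
    assumptions: List[str] = field(default_factory=list)


@dataclass
class PublishedFigure:
    name: str
    value: float
    provenance: Provenance


def provenance_warnings(figure, today, max_age_days=30):
    """Return human-readable reasons to distrust the figure (empty if none)."""
    warnings = []
    age_days = (today - figure.provenance.last_updated).days
    if age_days > max_age_days:
        warnings.append(f"{figure.name} is {age_days} days old")
    if figure.provenance.manually_maintained:
        warnings.append(
            f"{figure.name} is manually maintained in {figure.provenance.source}"
        )
    return warnings


# Hypothetical example: a budget figure published straight from a spreadsheet.
budget = PublishedFigure(
    name="Current Sales Budget",
    value=1_000_000.0,
    provenance=Provenance(
        source="sales_budget.xls",
        owner="Sales LOB",
        last_updated=date(2006, 9, 1),
        manually_maintained=True,
        assumptions=["growth stays flat"],
    ),
)
```

A modelling tool could call `provenance_warnings` before using the figure and refuse, or at least complain, when the list is not empty. That does not fix bad data, but it moves the caveat from the small print into something a consumer has to handle.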

The final element is about going the other way from the previous goal of IT, which has tended to be turning systems information into human-focused information. The goal here is to take all of the information created in these collaborative and participative systems and turn it back into something that the enterprise can use, hence the reason I wanted to take a Wiki and put the information into a database.
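As a rough sketch of that last step, ripped records can be loaded into a database and then queried like any other systems data. The example uses Python's built-in `sqlite3` module; the `place` table, its columns and the `store_places` and `places_within` helpers are assumptions for illustration, not the schema used in the experiment.

```python
import sqlite3


def store_places(conn, places):
    """Insert (name, lat, lon, source_page) tuples, replacing older copies."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS place ("
        "name TEXT PRIMARY KEY, lat REAL NOT NULL, "
        "lon REAL NOT NULL, source_page TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO place (name, lat, lon, source_page) "
        "VALUES (?, ?, ?, ?)",
        places,
    )
    conn.commit()


def places_within(conn, south, north, west, east):
    """Return the names of places inside a simple bounding box."""
    rows = conn.execute(
        "SELECT name FROM place "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY name",
        (south, north, west, east),
    )
    return [name for (name,) in rows]
```

Keeping a `source_page` column next to each row is a small nod to the provenance problem: the database at least remembers which wiki page each record came from.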

So my little experiment proved that it can be done, and that it's liable to be an issue in terms of data quality and provenance. I'm not sure of the solution yet, but at least it gives me something to think about.



Neil Ward-Dutton said...


Data quality & provenance are the two key issues that will make the simplistic "enterprise mashups" story being told by many turn out to be a crock, as I've mentioned previously...
