Monday, May 14, 2007

Is there a problem for the Semantic Web to solve?

Now I've blogged before about the "semantic" web not really being semantic for web services, but I've been thinking even more about the problem that the semantic web, with its descriptions of information, tries to solve, and I'm really not convinced that from a business perspective this is something anyone needs to be worrying about today. Sure, the concept of "automatic" consumption and transformation sounds beguiling at first, but isn't this pretty much the same vision that was promoted around UDDI in the early days of Web Services?

What I mean here is that the thing this tries to solve is people's understanding of information, by automating that process. From some reviews I've done recently, and a conference or two I've attended, the accuracy of these transformations is still pretty ropey, and they are more about helping people at design time than being something you would rely on in a production runtime.

So really what we have here is a way of giving people "hints" about what a given field means, so it can help them understand what it maps to and maybe make a suggestion that might, or might not, be accepted. But are RDF/OWL and the like really the way to go about this? Or should we think more in terms of the sort of free-form association that Google gives us? What I mean here is: think about the way Google Maps handles "Hotels near London", where it looks for the term "hotel" and a geo-location that is around London (another inference). In effect they create a semantic tree for those terms based on the probability that this is what you meant.

Now I've never used an RDF file to help me describe "Hotel" or "London" to Google; I'm just relying on it having built up a contextual reference that means it takes a good guess at the answer.

So are RDF and OWL really required? Or is the solution to have a Google contextual search?



PetrolHead said...

I think it's mostly about ensuring that information has appropriate markup that can be used by humans to write code that can chew through all that information.

Right now the closest one gets to that is tagging as done with etc. Trouble is the tags are held separate from the information. One might prefer to have it embedded and thus the first step to semantic web.

I'm guessing that ontologies then follow on as a natural step for attempting to make sure everyone uses the same markup so as to give maximal possible opportunity for processing as broad a set of information as possible.

Is it useful? Dunno. We probably need sufficient information marked up this way that people can have a go at doing something interesting with it.

My two cents,


Steve Jones said...

Now I swear I didn't see this before I did the post but I've just done a quick follow up on exactly what I meant because Google language tools are doing translation without explicit semantics.

I'm with you that there is certainly more information to be described than exists in a standard XML document. I'm just not sure that the "semantic web" will be the revolution that people are proposing.

Ben said...

The idea of the semantic web is founded on the concept of an ontology, defined as an explicit and formal specification of a shared conceptualization (of a domain). The fact that you use OWL for that is in principle merely an implementation detail, although it's quite vital from a practical point of view that everyone uses the same language, especially for computer-to-computer communication.

Of course an ontology isn't strictly needed from a communications point of view, but from a scalability point of view it is: with an ontology you only need a mapping from each participant's internal model to the common model (n mappings), whereas without an ontology everyone has to map onto everyone else's model, resulting in n(n-1) mappings. The latter is quite a serious issue if you start thinking on a worldwide level.
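
To put numbers on that, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes each mapping is one-directional (A's model to B's model counts separately from B's to A's), which is what gives n(n-1) rather than n(n-1)/2.

```python
def hub_mappings(n: int) -> int:
    """Mappings needed when every participant maps to one common model."""
    return n


def pairwise_mappings(n: int) -> int:
    """Mappings needed when every participant maps to every other one.

    Assumes directed mappings, so each ordered pair counts once.
    """
    return n * (n - 1)


if __name__ == "__main__":
    # Compare the two approaches as the number of participants grows.
    for n in (2, 10, 100, 1000):
        print(f"n={n}: common model={hub_mappings(n)}, "
              f"everyone-to-everyone={pairwise_mappings(n)}")
```

With 10 participants that is 10 mappings against 90; with 1000 it is 1000 against 999000, which is where the scalability argument starts to bite.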

Anyway, to answer your question: something like OWL/RDFS is really needed, but we aren't there yet. There are still a bunch of issues to be solved before it will be practically useful on a worldwide level. Not only do we need to define these ontologies, something we have been struggling with since Aristotle, we will also need to cope with the temporal issues, which is something hardly anyone has looked at so far. The classical closed world assumption that makes traditional databases so useful is gone in the semantic web. As a result we will need mechanisms to cope with information only being valid for a certain period in time (when does a piece of information get updated or become invalid, how will my system know that, etc.). Constraints for a concept may change throughout its lifecycle, and that isn't being coped with at the moment either. Add to this fun issues such as ontology version management (especially for more drastic changes) or agreeing with each other on higher-level concepts such as God (better start dodging those bullets...) and it quickly becomes clear why hardly anyone is doing something with the semantic web in practice, at least not on the World Wide Web. Also, defining more detailed models isn't exactly easy (you have to stay very generic) and certainly not cheap (domain experts, conceptual modelling experts, lots of communication overhead due to the number of parties involved, we are not used to thinking like that, etc.).

As you already indicated, within a company or a business community RDFS/OWL is quite useful at design time, especially since you can easily extend your model during run time without breaking stuff. Another quite useful application is knowledge systems with complex models (much more than just a tree or two) for small user groups, etc.

To summarize, the semantic web isn't strictly needed to solve the problems it addresses, but from a scalability/efficiency point of view on the World Wide Web, it is. However, we still need to solve quite a few (often rather nasty) issues before the idea can really be put into practice on a worldwide level. In the meantime it's fun to toy around with if you happen to have a project that can make use of it.

rdf said...

Personally I've found it useful to think of the semantic web as having two components.

One component, centered around RDF, assures identity; the other, centered around OWL and ontologies, supports inference and context.

I think of identity as being crucial for communication/integration.

Inference is interesting and might prove useful, but I am unsure how to incorporate it into a live system since it seems hard to a priori scope out the consequences of inference.
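
A rough way to picture the two components, sketched in plain Python rather than with a real RDF library. Everything here is an assumption for illustration: the example.org URIs, the property names, and the single subclass rule, which is only a tiny slice of what RDFS/OWL reasoners do.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/"      # hypothetical namespace
TYPE = "rdf:type"               # shorthand labels, not resolved URIs
SUBCLASS = "rdfs:subClassOf"

# Identity: two independent sources talk about the same thing simply
# because they use the same URI, so merging is just a set union.
source_a: Set[Triple] = {
    (EX + "hotel42", TYPE, EX + "Hotel"),
    (EX + "Hotel", SUBCLASS, EX + "Accommodation"),
}
source_b: Set[Triple] = {
    (EX + "hotel42", EX + "locatedIn", EX + "London"),
}


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Apply one rule until nothing changes:
    if X is a C and C is a subclass of D, then X is a D."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for sub, sup in subclass_pairs:
                if o == sub and (s, TYPE, sup) not in inferred:
                    inferred.add((s, TYPE, sup))
                    changed = True
    return inferred


merged = source_a | source_b
closed = infer_types(merged)
new_facts = closed - merged   # what inference added beyond the raw data

if __name__ == "__main__":
    for fact in sorted(new_facts):
        print(fact)
```

The merge step is predictable. The `new_facts` set is the part that is hard to scope in advance: every extra rule or subclass statement can add facts nobody wrote down explicitly.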