Semantic web concepts carry quite a bit of enthusiasm and hope: the hope is that semantic web ways and means can help us make sense of the vast ocean of resources out on the Internet, or perhaps of the smaller seas of resources within our corporate data centers.
Topping the list of things the semantic web is supposed to provide is context-sensitive search. Now, I purposely did not say "semantic search," because I want to describe how to reason about semantics, and so I need to use other terms. I chose the term "context" to illustrate how semantics can be applied. Think of ontologies (which describe types of things, or resources, and their properties) as a way of establishing a context. If you adopt a semantic context, the things you find when you search, as well as the properties you uncover about them, belong to that context.
As such, you can think of the semantics used in searching as a lens through which you see matching resources. Without a semantic context, documents are just documents, undistinguished from each other in any contextual way. You might search for documents by file type, and perhaps limit the search to files containing certain words, but that is a lexical search, not a semantic one.
To understand the difference, consider that when I search for a dwelling in a certain price range, I want to see all of the things that mean dwelling (home, dwelling, house, residence, condo, cottage, townhouse, and so on). An ontology describing Real Estate might establish an equivalence between Dwelling and these other classes, so my search finds things of similar meaning, not just the string of characters making up a word.
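A minimal sketch of the Dwelling example, assuming a toy ontology held in a Python dictionary where each class lists the classes it treats as equivalent or more specific. The class names, listings, and helper functions are hypothetical; a real Real Estate ontology would more likely be written in RDFS/OWL (for instance with `owl:equivalentClass` or `rdfs:subClassOf`) and evaluated by a reasoner. The point is only the contrast between matching a word and matching what the word means.

```python
# Hypothetical toy ontology: each class maps to the classes it treats as
# equivalent to, or more specific than, itself.
REAL_ESTATE_ONTOLOGY = {
    "dwelling": {"home", "house", "residence", "condo"},
    "house": {"cottage", "townhouse"},
}


def expand_term(term, ontology):
    """Return the term plus every class reachable from it in the ontology."""
    seen = set()
    stack = [term.lower()]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, ()))
    return seen


def lexical_search(listings, term, max_price):
    """Match only listings whose type is literally the search word."""
    return [item["id"] for item in listings
            if item["type"] == term.lower() and item["price"] <= max_price]


def semantic_search(listings, term, max_price, ontology):
    """Match listings whose type falls anywhere under the expanded term."""
    wanted = expand_term(term, ontology)
    return [item["id"] for item in listings
            if item["type"] in wanted and item["price"] <= max_price]


# Hypothetical listings for illustration.
LISTINGS = [
    {"id": 1, "type": "condo", "price": 250_000},
    {"id": 2, "type": "cottage", "price": 180_000},
    {"id": 3, "type": "warehouse", "price": 200_000},
    {"id": 4, "type": "house", "price": 900_000},
]

print(lexical_search(LISTINGS, "Dwelling", 300_000))  # -> []
print(semantic_search(LISTINGS, "Dwelling", 300_000, REAL_ESTATE_ONTOLOGY))  # -> [1, 2]
```

The expansion walks the ontology transitively, so a cottage is found through house even though the Dwelling entry never mentions it directly; the lexical search finds nothing because no listing is literally typed "dwelling".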
Turning to a document example, I may want to search for all of the documents in my document repository that are legal documents, or even particular types of legal documents. One way to do this is to first set my search context to an ontology describing legal artifacts. Then, within this context, I can ask to see all of the Litigation documents.
It may sound magical, but in fact it is quite mechanical. Within my ontology for legal artifacts, a document is a litigation document if it is either explicitly tagged as such (a member of the class Litigation Document), or it matches criteria from which we can infer, with high likelihood, that its contents are those of a litigation document. Although the inference criteria may not be perfect, they can be refined over time. Tuning the criteria lets us find legal documents of a specific type much more easily than a brute-force search would. Also, when we work with a predetermined set of ontologies, automated processes can run ahead of time and apply the inference criteria expressed within those ontologies to the documents, pre-classifying them.
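The rule above (explicit membership, or else inference from the document's contents, plus pre-classification ahead of search time) can be sketched as follows. Weighted cue words with a score threshold stand in here for whatever inference criteria a real legal ontology would carry; the cue list, weights, threshold, and the `LitigationDocument` tag name are all hypothetical placeholders, and "tuning" in this sketch simply means adjusting them.

```python
from dataclasses import dataclass, field

# Hypothetical inference criteria for the Litigation Document class: cue words
# with weights, plus a score threshold. Tuning means adjusting these values.
LITIGATION_CUES = {"plaintiff": 2.0, "defendant": 2.0, "complaint": 1.5,
                   "court": 1.0, "motion": 1.0, "hereby": 0.5}
THRESHOLD = 4.0


@dataclass
class Document:
    name: str
    text: str
    tags: set = field(default_factory=set)  # explicit class memberships


def litigation_score(doc, cues=LITIGATION_CUES):
    """Sum the weights of cue words found in the document text."""
    words = (w.strip(".,;:()\"'") for w in doc.text.lower().split())
    return sum(cues.get(w, 0.0) for w in words)


def is_litigation(doc, cues=LITIGATION_CUES, threshold=THRESHOLD):
    """Explicit tagging wins; otherwise fall back to the inference criteria."""
    if "LitigationDocument" in doc.tags:
        return True
    return litigation_score(doc, cues) >= threshold


def preclassify(docs, cues=LITIGATION_CUES, threshold=THRESHOLD):
    """Run ahead of search time: index every document classified as litigation."""
    return {doc.name for doc in docs if is_litigation(doc, cues, threshold)}


docs = [
    Document("memo.txt", "Lunch order for the team meeting."),
    Document("filing.txt",
             "The plaintiff hereby files this complaint against the defendant."),
    Document("scan.pdf", "", tags={"LitigationDocument"}),
]
print(sorted(preclassify(docs)))  # -> ['filing.txt', 'scan.pdf']
```

Returning an index rather than writing inferred tags back onto the documents keeps explicit membership separate from inference, so the index can simply be rebuilt whenever the criteria are refined.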
Inference is most valuable when the authors of documents, or their authoring tools, do little to establish the context and meaning of those documents. Authors could support semantic search by supplying meaningful metadata along with the contents of a document. In the future, more sophisticated authoring tools may have authors doing exactly this, but for now, let's talk more about discovering meaning and applying classifications to existing documents.
I recently submitted a friend's resume to an online employee referral site. It scanned the uploaded text, pulled out the individual's educational history (among other things), and presented it for verification with very good accuracy. This sophisticated scan is an example of extracting facts that go beyond simple word matching and are useful in semantic searches. One could later apply a localized ontology of colleges and universities with this capability to express something like, "Show me all of the resumes of candidates who graduated from a Preferred School holding a Postgraduate Degree in an Engineering Discipline," assuming that ontology defines Postgraduate Degree, Engineering Discipline, and Preferred School.
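A rough sketch of how that resume query might be evaluated once the educational facts have been extracted. The extraction step itself is not shown; the `Degree` and `Resume` records, the member lists defining Preferred School, Postgraduate Degree, and Engineering Discipline, and the candidate data are all hypothetical. The sketch also assumes one reading of the query: all three conditions must hold for the same degree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Degree:
    level: str       # e.g. "BS", "MS", "PhD"
    discipline: str  # e.g. "Electrical Engineering"
    school: str


@dataclass
class Resume:
    candidate: str
    degrees: list    # Degree facts extracted from the uploaded text


# Hypothetical localized ontology: each class is defined by listing its members.
PREFERRED_SCHOOLS = {"State University", "Tech Institute"}
POSTGRADUATE_DEGREES = {"MS", "MEng", "PhD"}
ENGINEERING_DISCIPLINES = {"Civil Engineering", "Electrical Engineering",
                           "Mechanical Engineering", "Computer Engineering"}


def qualifies(degree):
    """Assumes all three conditions must hold for the same degree."""
    return (degree.school in PREFERRED_SCHOOLS
            and degree.level in POSTGRADUATE_DEGREES
            and degree.discipline in ENGINEERING_DISCIPLINES)


def find_candidates(resumes):
    """Return candidates with at least one qualifying degree."""
    return [r.candidate for r in resumes if any(qualifies(d) for d in r.degrees)]


resumes = [
    Resume("Candidate A", [Degree("BS", "Physics", "State University"),
                           Degree("PhD", "Electrical Engineering", "State University")]),
    Resume("Candidate B", [Degree("MS", "Civil Engineering", "Other College")]),
    Resume("Candidate C", [Degree("BS", "Mechanical Engineering", "Tech Institute")]),
]
print(find_candidates(resumes))  # -> ['Candidate A']
```

Because the class definitions live in the ontology rather than in the query, an organization could change what counts as a Preferred School without rewriting the search itself.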