SIOC-ing the Semantic Web
Jennifer Zaino
In December, John Breslin, research leader of the social software group at DERI, noted that the group's tutorial proposal on SIOC (Semantically Interlinked Online Communities) was accepted for the 17th International World Wide Web Conference, to be held in Beijing, China, in April. SIOC provides methods for interconnecting discussion methods such as blogs, forums and mailing lists to each other, and the tutorial is entitled "Interlinking Online Communities and Enriching Social Software with the Semantic Web."
SemanticWeb.com recently caught up with Breslin to learn more about SIOC. It consists of three parts: the SIOC ontology, an open-standard machine-readable format for expressing the information contained both explicitly and implicitly in Internet discussion methods; SIOC metadata producers for a number of popular blogging platforms and content management systems; and storage and browsing/searching systems for leveraging SIOC data.
SemanticWeb.com: Tell us a bit about the development of SIOC.
Breslin: SIOC [pronounced 'shock'] started off as an idea in my head three or three-and-a-half years ago. Because I had some experience in online communities (boards, etc.), I saw a need for providing methods to link these sites together. When you look for information on the Web to answer a question, you may get parts of your answer from different community sites. You have to trawl across a lot of these sites before you can get a complete answer. We wanted a method to express the information from these communities in a standard form, and then to allow this information to be linked together by adding ways for people to say, for example, that this information was written by the same person who wrote something else, or that it is related to something else on the same topic.
It started off with the development of the SIOC core ontology, which is used to describe the domain of online communities and what they consist of -- users and posts and descriptions of other simple terms that occur in online communities. There is a lot of structure in online communities and inherent connections, in that people tag content, make replies or create trackbacks between posts. This structure that is created in online communities is often hidden in some database behind the scenes, and SIOC is used to expose that structure via semantics.
First of all we just worked on SIOC internally and got feedback. Then we decided to get more feedback from the community through a W3C member submission process. We gathered partners in this space -- a combination of academic and industry partners -- and went through a year or so of getting this submission in place, which involved a lot of revisions. The vocabulary kind of evolved by community consensus. That was published at the end of July or beginning of August, and since then it has helped the initiative, as having a member submission makes it more visible and easier for us to get feedback.
We will also be presenting a tutorial on SIOC at the WWW2008 conference in Beijing. This is the biggest web conference, so having a tutorial at that is obviously brilliant for us. Combined with the W3C submission, we know that there is significant interest in SIOC, but many people don't know what it is exactly and what it can be used for. We'll be explaining in our tutorial what SIOC is, how you can use it, and where it is being used already.
SemanticWeb.com: In what ways is SIOC being used today?
Breslin: The initial approach was to provide the SIOC ontology and modules producing SIOC data [based on this ontology] for a lot of open source applications, as a lot of community sites are built on open source tools. So we wanted to provide SIOC functionality for these tools that people could then add to their own sites.
We started to do this with a couple of modules and applications developed at DERI, and then others began to produce SIOC data creators for their own systems. SIOC is also making its way into commercial applications from OpenLink, Talis and Seesmic. For example, OpenLink DataSpaces uses SIOC as a kind of intermediary layer between users making queries and a variety of underlying community systems. So if you have a lot of community applications, their system lets you access an aggregate view of them.
There are probably, in terms of open source modules and commercial applications, about 40 to 50 different systems using SIOC data at the moment.
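To make the idea of a "SIOC data creator" more concrete, here is a minimal sketch in Python of what such an export module might do: read rows from a community site's database and restate them as triples using SIOC terms. It is an illustration under stated assumptions, not the code of any actual SIOC exporter. The term names (`sioc:Post`, `sioc:has_creator`, `sioc:reply_of`) and the namespace URI follow the SIOC Core Ontology as published; the site address, the row layout and the helper functions are hypothetical.

```python
# Published namespace URIs for SIOC and for rdf:type.
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical community site and hypothetical database rows.
SITE = "http://forum.example.org"
POSTS = [
    {"id": 1, "author": "alice", "reply_to": None},
    {"id": 2, "author": "bob", "reply_to": 1},
]


def post_uri(post_id):
    """Build a URI for a post (the URI scheme is an assumption)."""
    return f"{SITE}/post/{post_id}"


def user_uri(name):
    """Build a URI for a user account (the URI scheme is an assumption)."""
    return f"{SITE}/user/{name}"


def export_triples(rows):
    """Restate database rows as (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        post = post_uri(row["id"])
        triples.append((post, RDF_TYPE, SIOC + "Post"))
        triples.append((post, SIOC + "has_creator", user_uri(row["author"])))
        # The reply link is the structure normally hidden in the database.
        if row["reply_to"] is not None:
            triples.append((post, SIOC + "reply_of", post_uri(row["reply_to"])))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(export_triples(POSTS)))
```

Plain tuples are used instead of an RDF library to keep the sketch self-contained; a real module would more likely build on an RDF toolkit and emit RDF/XML or another standard serialization.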
Breslin: The current hype around the idea of the social graph and "social network portability" is mainly about being able to bring your social network connections from one site to another. So if this were implemented and you were on Facebook, you could then move to LinkedIn and bring your profile and connections with you. The global social graph is composed of all the social network connections that are distributed across a multitude of sites. But it's not just your social connections that can be ported, but also the content you create on all these sites. SIOC can be used to provide a representation of all content items created by a person (via their user accounts) on various social media sites, and this can be nicely combined with the FOAF profile of the person who holds the associated user accounts. An eventual aim is the creation of social semantic information spaces, where all these different collaborative systems, like blogs and wikis, are connected together through the addition of semantics, allowing people to traverse across these different types of systems, reusing and porting their data between systems as required.
SemanticWeb.com: So, as you mentioned, there is already a lot of structure in online communities. Why do we need SIOC to take things up a level?
Breslin: There are two sides to what we can do by adding semantics to current Web 2.0 sites. First of all, we are allowing people to connect Web 2.0 items together in a stronger way. Let's say I type up a blog article talking about my travel to New York, and I want to put some information about New York into the article, like where it is. Right now you have to click onto Wikipedia and look that up. But imagine writing this blog article and having something like DBpedia [structured information from Wikipedia] acting as a data source for the kind of information you want to augment the article with.
You could write an article about New York and annotate it with certain information. All that information would sit hidden behind the blog post, and if someone wanted to reuse it they could just drag the information on New York into their own site or application.
The other thing is that there is already quite a lot of tagging and connections being made in Web 2.0 sites. But tagging isn't enough. You need more information on what someone is talking about. Tags give you a good way of classifying stuff, but with more semantic information in there you could ask more specific questions about what type of content you want. For example, find blog posts written by people who have been living in New York during the past two years and who like classical music. By adding all these connections about people and what they do and like, you can ask more complex questions than you can through a combination of tagging and current social network profiles.
SemanticWeb.com: Does SIOC potentially have applicability for the enterprise as well?
Breslin: There are definitely enterprise applications for this. Even here in DERI, we are working on how to use SIOC in enterprise scenarios. Those scenarios are really just starting now. There's interest from diverse groups, ranging from real estate people who want to use SIOC to represent real estate information, to the biology area, where it could represent scientific discourse, to government, where legal documents can be drafted in an argumentative discussion process. The basic ontology could be applied to different domains. This year I would say we will be looking more at specific application domains. We have general ontologies and so on, but we need to figure out what the requirements are in particular domains, whether real estate or biology, to target some use cases and see how SIOC needs to be augmented with some domain-specific descriptions.
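The cross-site question described in the interview (posts by people who live in New York and like classical music) can be sketched as a walk over linked data, from a person's profile to their accounts and then to their posts. The Python below is only an illustration under stated assumptions: the `sioc:account_of` and `sioc:has_creator` terms come from the SIOC ontology and `foaf:based_near` and `foaf:interest` from FOAF, but the people, accounts, posts and the `ex:`, `blogA:` and `forumB:` identifiers are invented, and the "during the past two years" condition is left out for brevity.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DBP = "http://dbpedia.org/resource/"  # used here only as illustrative topic URIs

# Hypothetical data merged from two community sites plus personal profiles.
TRIPLES = {
    ("ex:alice", FOAF + "based_near", DBP + "New_York_City"),
    ("ex:alice", FOAF + "interest", DBP + "Classical_music"),
    ("ex:bob", FOAF + "based_near", DBP + "New_York_City"),
    ("blogA:alice", SIOC + "account_of", "ex:alice"),
    ("forumB:alice99", SIOC + "account_of", "ex:alice"),
    ("blogA:bob", SIOC + "account_of", "ex:bob"),
    ("blogA:post1", SIOC + "has_creator", "blogA:alice"),
    ("forumB:post7", SIOC + "has_creator", "forumB:alice99"),
    ("blogA:post2", SIOC + "has_creator", "blogA:bob"),
}


def subjects(predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def posts_by_people(place, interest):
    """Find posts whose authors are based near `place` and interested in `interest`."""
    people = subjects(FOAF + "based_near", place) & subjects(FOAF + "interest", interest)
    posts = set()
    for person in people:
        # One person may hold accounts on several sites.
        for account in subjects(SIOC + "account_of", person):
            posts |= subjects(SIOC + "has_creator", account)
    return sorted(posts)


if __name__ == "__main__":
    print(posts_by_people(DBP + "New_York_City", DBP + "Classical_music"))
    # prints ['blogA:post1', 'forumB:post7']
```

The point of the sketch is that tags alone could not answer this question: the result depends on the links between a person, their accounts on different sites, and the content those accounts created. In practice a query language such as SPARQL over an RDF store would play the role of `posts_by_people`.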