Semantic Web - The Voice of Semantic Web Business

The Ellerdale Project: Real-Time Semantic Search Is Cool For Consumers But Has Business In Its Sights

The Ellerdale Project sounds like it could be the name of a new Matt Damon movie, but in fact it’s a real-time semantic search engine – with full access to the Twitter firehose – whose destination Trends site is designed to give a taste of what its technology can do for businesses.

Financial companies that want to mine social data for intelligence to drive investment decisions, market analysts who want insight into product sentiment from the social stream (a sentiment algorithm is in development), web sites that want to optimize content placement, and advertisers who want to optimize ad placement are how The Ellerdale Project plans to butter its bread. For example, advertisers may want to learn which sites their target demographics visit (from the short URLs referenced in their tweets and the top articles those links add up to) and place ads there, or which artists are trending among social users, so they can associate keywords with those performers. Twitter information such as location can further sharpen demographic targeting.

The intent, according to the company, is to construct a search query that identifies a certain demographic in the social sphere, discover within that data set the dominant topics people are discussing and the sentiment on each topic, and then use that data to optimize ad placement.
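The query-then-aggregate idea above can be sketched in a few lines. This is a minimal illustration, not The Ellerdale Project's implementation: the `Tweet` fields (`location`, `topics`, `sentiment`) and the `dominant_topics` function are hypothetical, and it assumes each tweet has already been assigned topic IDs and a sentiment score.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Tweet:
    text: str
    location: str        # hypothetical field: the author's declared location
    topics: List[str]    # topic IDs, assumed to be assigned already by disambiguation
    sentiment: float     # hypothetical score in [-1, 1]; sentiment was still in development

def dominant_topics(tweets: Iterable[Tweet],
                    in_demographic: Callable[[Tweet], bool],
                    k: int = 3) -> List[Tuple[str, int, float]]:
    """Return (topic, mention count, mean sentiment) for the k most-discussed topics."""
    counts = Counter()
    sentiment_sum = defaultdict(float)
    for tweet in tweets:
        if not in_demographic(tweet):   # step 1: the demographic "query"
            continue
        for topic in tweet.topics:      # step 2: tally topics within that subset
            counts[topic] += 1
            sentiment_sum[topic] += tweet.sentiment
    # step 3: rank topics and attach their average sentiment
    return [(topic, n, sentiment_sum[topic] / n) for topic, n in counts.most_common(k)]

sample = [
    Tweet("new album is great", "NYC", ["Justin_Bieber"], 0.8),
    Tweet("this detergent works", "NYC", ["Tide_(detergent)"], 0.5),
    Tweet("bieber again", "NYC", ["Justin_Bieber"], 0.4),
    Tweet("sunny out", "LA", ["Weather"], 0.5),
]
print(dominant_topics(sample, lambda t: t.location == "NYC", k=2))
```

Here the "demographic" is just a predicate over tweet fields; a production system would presumably express it as an indexed search query rather than a scan.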

“We can find out emerging topics, how data sets are trending,” says co-founder Arthur van Hoff. “Twitter is incredible in how it is really real-time. It is interesting that they can make short-term decisions about ad placement they couldn’t make in the past. The data they had before was a week or a month old. Now they can look at trends in the last hour, and that opens up a whole set of new possibilities.”
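Looking at "trends in the last hour" amounts to counting topic mentions over a sliding time window. The sketch below is one common way to do that, assuming events arrive roughly in timestamp order; the `SlidingWindowTrends` class is hypothetical and not taken from the company's system.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple

class SlidingWindowTrends:
    """Topic mention counts over the most recent window (default: one hour).

    Assumes events arrive in non-decreasing timestamp order, as in a live stream.
    """

    def __init__(self, window_seconds: float = 3600.0) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, topic), oldest first
        self.counts: Counter = Counter()

    def add(self, timestamp: float, topic: str) -> None:
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Evict events older than the window, keeping the counts in sync.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, now: float, k: int = 5) -> List[Tuple[str, int]]:
        """Most-mentioned topics within the window ending at `now`."""
        self._expire(now)
        return self.counts.most_common(k)
```

Each event is added and evicted once, so the cost per tweet stays constant regardless of how long the stream runs.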


Businesses can get free but limited access to The Ellerdale Project’s API, or commercial access with fees that depend on the amount of Twitter data and the application required. The company can also perform custom analysis for businesses on data that its licensing agreement does not allow it to redistribute. Most of its data right now comes from Twitter, though some RSS feeds are included as well. “The plan is to monetize our work that way and continue to use the Trends site as a showcase for the technology,” van Hoff says.

The Technology
So what makes The Ellerdale Project tick? It semantically processes and indexes every tweet as it’s posted, then crawls the pages the tweet references and puts them in its database. Right now it’s handling about 2.3 million messages an hour, or several hundred a second. Once processing is done, it has identified not only which keywords are in a message but also which topics the message references. That’s no mean trick, since a single topic can have lots of aliases. Take the ever-popular Justin Bieber, who may be referred to as Justin or Bieber or @justinbieber. “So if someone talks about Justin Bieber we try to use disambiguation to identify the topic that person is talking about so we can then collect information about that topic in an aggregate way. We find all the tweets no matter how people refer to him. That has a big impact on the usefulness of the data – there is so much ambiguity, especially in short messages – for analysis,” says van Hoff.
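Collecting every tweet "no matter how people refer to him" starts with mapping surface aliases to one canonical topic ID. A minimal sketch, assuming a hand-built alias table; the `ALIASES` entries, the `Justin_Bieber` ID, and `find_topics` are hypothetical, and real disambiguation would weigh context before accepting an ambiguous alias such as "justin".

```python
import re
from typing import Dict, Set

# Hypothetical alias table mapping surface forms to one canonical topic ID.
ALIASES: Dict[str, str] = {
    "justin bieber": "Justin_Bieber",
    "bieber": "Justin_Bieber",
    "@justinbieber": "Justin_Bieber",
    "justin": "Justin_Bieber",  # ambiguous on its own; a real system must disambiguate
}

def find_topics(message: str, aliases: Dict[str, str] = ALIASES) -> Set[str]:
    """Return the canonical topic IDs of every alias found in a message."""
    text = message.lower()
    found: Set[str] = set()
    for alias, topic in aliases.items():
        # Whole-token match only: "bieberfever" is not "bieber",
        # and "justin" inside "@justinbieber" is not counted separately.
        if re.search(r"(?<![\w@])" + re.escape(alias) + r"(?!\w)", text):
            found.add(topic)
    return found

print(find_topics("Can't wait for the new @JustinBieber video"))
```

Because every alias resolves to the same ID, counts and trends can be aggregated per topic rather than per keyword.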

It has so far stored about 3.5 billion tweets and about 8 million topics in its database. Disambiguating topics is especially important to businesses. For example, if a marketer wants to understand how Tide, the laundry detergent, is being referenced on the social web, chances are that marketer doesn’t care about the movie Tideland or about tide pools, and The Ellerdale Project lets them choose to look only at the topic page specifically about the cleaning agent. “It turns out that the world is extremely ambiguous and it’s hard to disambiguate topics this way,” he says. Alongside the real-time stream of tweets about the detergent, the searcher would also see a topic profile drawn from Wikipedia; the topic’s many aliases; related topics, computed from what people say in combination with the topic in question; the aforementioned top articles (most recent, last week, etc.); the most retweeted messages on the topic; and messages per hour on the topic. Alternatively, anyone interested in tide, regardless of what it specifically refers to, can search on the term without narrowing it down through the topic-specific menu options.
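One simple way to separate Tide the detergent from tide pools is to compare a message's words against context words associated with each candidate sense, in the spirit of the classic Lesk overlap method. This is a swapped-in textbook technique, not a description of how The Ellerdale Project disambiguates; the `SENSES` profiles and topic IDs are hypothetical.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical context profiles for candidate senses of "tide".
SENSES: Dict[str, Set[str]] = {
    "Tide_(detergent)": {"laundry", "detergent", "wash", "clothes", "stain"},
    "Tide_pool": {"ocean", "beach", "rocks", "starfish", "marine", "shore"},
    "Tideland_(film)": {"film", "movie", "dvd", "watch"},
}

def disambiguate(message: str, senses: Dict[str, Set[str]] = SENSES) -> Optional[str]:
    """Pick the sense whose context words overlap most with the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    best, best_score = None, 0
    for sense, context in senses.items():
        score = len(words & context)
        if score > best_score:
            best, best_score = sense, score
    return best  # None when no context word points to any sense

print(disambiguate("Thank goodness for Tide laundry detergent"))
```

Short tweets often carry too few context words for this to work alone, which is why linked pages and timing (both mentioned later in the article) matter as extra evidence.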

The Trends site shows only a few of the categories the technology can handle; in fact there are 200,000 of them. It is possible to search on, say, [digital cameras] and get menu options to explore all tweets in that category, including those that don’t even mention the term. Because The Ellerdale Project has categorized each tweet, it knows the tweet refers to digital cameras, which the company says can be very compelling for marketers and financial analysts using the API.
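Finding tweets in a category "that don't even mention the term" follows naturally once tweets are tagged with topics and topics are mapped to categories. A minimal sketch under that assumption; the `TOPIC_CATEGORIES` mapping and `tweets_in_category` helper are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical topic-to-category mapping (the article cites about 200,000 categories).
TOPIC_CATEGORIES: Dict[str, Set[str]] = {
    "Canon_PowerShot": {"Digital_cameras", "Canon_products"},
    "Nikon_Coolpix": {"Digital_cameras"},
    "Justin_Bieber": {"Canadian_singers"},
}

def tweets_in_category(tagged: Iterable[Tuple[str, List[str]]],
                       category: str,
                       topic_categories: Dict[str, Set[str]] = TOPIC_CATEGORIES) -> Iterator[str]:
    """Yield tweets whose resolved topics fall under `category`,
    even if the tweet text never mentions the category name."""
    for text, topics in tagged:
        if any(category in topic_categories.get(t, set()) for t in topics):
            yield text

tagged = [
    ("Loving my new PowerShot", ["Canon_PowerShot"]),
    ("Bieber tickets!", ["Justin_Bieber"]),
    ("Coolpix battery died again", ["Nikon_Coolpix"]),
]
print(list(tweets_in_category(tagged, "Digital_cameras")))
```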

The Ellerdale Project wound up building its own semantic database because it couldn’t find anything that could sustain the volume it’s handling, van Hoff says: that’s 2 million new rows inserted every hour. It has also built its own real-time reverse index for complex searches (and, or, not). And since it has licensed the full Twitter firehose, it isn’t dependent on Twitter’s API for access. “There are no API calls that we have to make, so most of the time it’s much faster,” van Hoff says.
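A reverse (inverted) index maps each term to the set of messages containing it, so boolean searches become set operations. The toy, in-memory sketch below shows the idea; it is not The Ellerdale Project's design, and the `InvertedIndex` class and its method names are hypothetical.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Set

class InvertedIndex:
    """Toy in-memory inverted index supporting AND / OR / NOT as set operations."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[int]] = defaultdict(set)
        self.all_ids: Set[int] = set()

    def add(self, doc_id: int, text: str) -> None:
        # Index each message on arrival so it is searchable immediately.
        self.all_ids.add(doc_id)
        for term in re.findall(r"[a-z0-9@#]+", text.lower()):
            self.postings[term].add(doc_id)

    def ids(self, term: str) -> Set[int]:
        # .get avoids creating empty entries for unseen terms.
        return self.postings.get(term.lower(), set())

    def and_(self, *terms: str) -> Set[int]:
        result = set(self.ids(terms[0]))
        for t in terms[1:]:
            result &= self.ids(t)
        return result

    def or_(self, *terms: str) -> Set[int]:
        return set().union(*(self.ids(t) for t in terms))

    def not_(self, term: str) -> Set[int]:
        return self.all_ids - self.ids(term)

idx = InvertedIndex()
idx.add(1, "Tide detergent got the stain out")
idx.add(2, "Exploring a tide pool")
idx.add(3, "New detergent review")
print(idx.ids("tide") & idx.not_("pool"))  # "tide AND NOT pool"
```

At the volumes the article describes, postings would live on disk and be sharded; the set-algebra view of AND, OR and NOT stays the same.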

One area The Ellerdale Project is keeping an eye on is how Twitter Annotations could affect its technology. Annotations could be great, or they could be a free-for-all from which it is harder to derive value, he says. “But if there is some structured data in there clearly we would leverage that for disambiguation,” he says; the metadata would help its engine understand messages more clearly. “One challenging thing now is feeds have links but all of them are shortened. We get 600 messages a second so we have to unshorten all those URLs at a high rate and that is a problem. Metadata could actually contain the long version of the URL, so we don’t have to go back to shortened URL to figure out what links are pointing to,” van Hoff says. “Those things will be extremely helpful.”
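Unshortening at a high rate usually means resolving each short URL once and caching the result, since popular links recur across many tweets. A minimal sketch, assuming the shortener answers an HTTP HEAD request with a redirect; the `Unshortener` class and `resolve_via_http` helper are hypothetical, and a real pipeline at 600 messages a second would also need concurrency, timeouts, and error handling that this omits.

```python
import urllib.request
from typing import Callable, Dict

def resolve_via_http(short_url: str, timeout: float = 5.0) -> str:
    """Follow redirects with a HEAD request and return the final URL."""
    req = urllib.request.Request(short_url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()

class Unshortener:
    """Caches expansions so each short URL hits the network only once."""

    def __init__(self, resolve: Callable[[str], str] = resolve_via_http) -> None:
        self.resolve = resolve
        self.cache: Dict[str, str] = {}

    def expand(self, url: str) -> str:
        if url not in self.cache:
            self.cache[url] = self.resolve(url)
        return self.cache[url]
```

Injecting the resolver keeps the caching logic testable offline; a stub function can stand in for the network call.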

Interestingly, The Ellerdale Project began life with the idea that it could build a better semantic search engine, extracting entities across the web to enable more interesting searches: a merger of Google, Freebase and Wikipedia that would help people get real answers to questions, not just links. But as VC Don Butler has pointed out, not many startups can succeed against the big search guns. Van Hoff came to the same conclusion. “It took about a year to build the implementation of that. Then we realized what you end up with is a search engine that is statistically more powerful than Google, but who is going to use it? It’s hard to compete.” The Ellerdale Project is funded by angel investors including Ron Conway and Roger Sippl.

Hence the technology’s turn toward real-time search, which has its own challenges: shorter messages and less content. “But if you are clever there’s a lot of context,” van Hoff says. “A lot of people link to pages that provide a lot of context. Or when someone sends out a message usually they are commenting on an event as it happens, so the time factor is a good way to disambiguate, too.”
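Using "the time factor" to disambiguate can be approximated by preferring whichever candidate topic is currently active. The sketch below assumes a stream of mentions that were already resolved with confidence and picks the candidate with the most of them in the last hour; the function name, topic IDs, and one-hour window are hypothetical choices, not the company's method.

```python
from typing import Dict, Iterable, Optional, Tuple

def disambiguate_by_time(candidates: Iterable[str],
                         mentions: Iterable[Tuple[float, str]],
                         now: float,
                         window_seconds: float = 3600.0) -> Optional[str]:
    """Choose the candidate topic with the most confidently resolved mentions
    in the recent window; return None if none of them is active right now."""
    recent: Dict[str, int] = {c: 0 for c in candidates}
    for timestamp, topic in mentions:
        if topic in recent and now - window_seconds < timestamp <= now:
            recent[topic] += 1
    if not recent or max(recent.values()) == 0:
        return None
    return max(recent, key=lambda c: recent[c])

# An ambiguous "Justin" during a (hypothetical) burst of Bieber concert tweets.
history = [(100.0, "Justin_Bieber"), (200.0, "Justin_Bieber"), (300.0, "Justin_Timberlake")]
print(disambiguate_by_time(["Justin_Bieber", "Justin_Timberlake"], history, now=400.0))
```

In practice this signal would be combined with textual and link context rather than used alone.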


Copyright 2010 WebMediaBrands Inc. All rights reserved.