Death of the Relational Database?
Jennifer Zaino
McDonald is working on the one project at his company that is not related to airline software -- a data acquisition, modeling, exploration and analysis system with a backend data store that does not use a relational database. The company expects to show something publicly by the first half of next year. Semanticweb.com caught up with him to ask about that project, and how it ties into the idea of a world where the influence of relational databases may be on the wane.
Semanticweb.com: Tell us a little more about your project.
McDonald: We invented our own query language for asking questions of a graph of structured data sources where everything is interlinked. It will be a public web site sometime next year but I can't say much more on that. The important thing is that it is a collection of a lot of data in any kind of structure, including a structure defined by the user of the system, so we needed a flexible way to store any kind of data, whether it's many-to-one or one-to-one connections or anything else. Imagine the structure behind the Internet Movie Database (IMDB) and now imagine generalizing the system without being told in advance that a movie has an actor and director and that kind of information. So our query language lets you ask questions that proceed through the data like you would if you were browsing through the IMDB. For example, say you want to know who all the living directors are who once directed Cary Grant. You could go to IMDB, search for Cary Grant, get his movies, click on the directors and see if there is a date of death, and eventually you could find the answer, but it would be a lot of clicking. To answer queries like that in a relational database is hard and sometimes impractical because of self-joins. In a relational database your question has to include with it the way bits of data relate to each other -- you can't just say give me all directors of the movies Cary Grant was in.
It's cumbersome to ask complex questions and impossible to write them out in any simple way. We try to have the computer do what you would do manually, but faster and in bulk. So for questions that could take you an hour to figure out with notepaper, the computer has the information to answer for you in one third of a second, and in two seconds it should be able to answer that question for all actors at once -- build the list of living directors for every actor on the planet and then find out which actor has the most living directors.
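The browsing-style question McDonald describes can be sketched as a walk over an in-memory graph. The Python below is a minimal illustration only, not his query language: the edge list, the relation names (`acted_in`, `directed_by`, `died_in`), the helper functions, and every film and director name are hypothetical placeholders invented for the example.

```python
from collections import defaultdict

# Hypothetical toy data: placeholder films and directors, not a real filmography.
EDGES = [
    ("Cary Grant", "acted_in", "Film 1"),
    ("Cary Grant", "acted_in", "Film 2"),
    ("Other Actor", "acted_in", "Film 2"),
    ("Other Actor", "acted_in", "Film 3"),
    ("Film 1", "directed_by", "Director A"),
    ("Film 2", "directed_by", "Director B"),
    ("Film 3", "directed_by", "Director C"),
    ("Director A", "died_in", "1980"),
]


def build_index(edges):
    """Map (node, relation) to the set of nodes that relation leads to."""
    index = defaultdict(set)
    for subject, relation, obj in edges:
        index[(subject, relation)].add(obj)
    return index


def follow(index, starts, *relations):
    """Walk the graph from the start nodes along each relation in turn."""
    current = set(starts)
    for relation in relations:
        reached = set()
        for node in current:
            reached |= index.get((node, relation), set())
        current = reached
    return current


def living_directors(index, actor):
    """Directors of the actor's films that have no recorded date of death."""
    directors = follow(index, [actor], "acted_in", "directed_by")
    return {d for d in directors if not index.get((d, "died_in"))}


def actor_with_most_living_directors(index, edges):
    """Bulk version: run the same walk for every actor and keep the maximum."""
    actors = sorted({s for s, relation, _ in edges if relation == "acted_in"})
    return max(actors, key=lambda actor: len(living_directors(index, actor)))


INDEX = build_index(EDGES)
print(living_directors(INDEX, "Cary Grant"))
print(actor_with_most_living_directors(INDEX, EDGES))
```

The question is written as a path of relation names (`acted_in`, then `directed_by`) rather than as explicit join conditions, and the bulk question simply repeats the same walk for every actor. Keeping the whole index in memory is what makes the repeated walks cheap in this sketch.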
McDonald: The relational database was designed to handle the situation where you have a lot more data than computer memory -- that's oversimplified but basically true. But memory is cheap now and we have enough of it, and it's much better if you can keep all that stuff in memory at once. So systems like ours that rely on that can run very fast and deal with a lot of stuff. The relational database is not dead, but the things you can do with data get harder if you put them in a relational database. There are still huge data warehouses -- for example, for storing billions of historical records that you mostly don't look up again and where you don't care about the connections between them. Some kinds of data naturally fit into one long table -- for example, financial compliance records showing that this was done at this time, 1,000 times every second for months and months. If you are not analyzing or exploring that data, then you don't need to do anything fancier with it. Then you can take advantage of decades' worth of expertise, backup and reliability. But certainly that's not what the IMDB is trying to do, and it's not what most web sites are trying to do or what most enterprises that are trying to make sense of live data are doing. For example, it's one thing to have sales data that's a record of everything sold, but what if you want to know when and to whom, and which customers are similar to each other based on overlap in what they bought or the order in which they bought them? These are all analytical questions that are difficult in a relational database.
Semanticweb.com: But you did mention that at least, with relational databases, there are experts out there that understand them, there is reliability -- things that are important to enterprises.
McDonald: Database administrators understand relational databases.
Their expertise may be a practical benefit to you, but in the same way you could say there are lots of people who can shovel coal into coal-fired engines, so we can't afford the costs of installing fancy new electric engines. That's true, but it's a temporary advantage, and in the long term you are better off not needing people to shovel coal. I think enterprises do have to go this [new] way if they care about being able to explore their data, or about letting their users explore the users' own data or the enterprise's data, and if they need to answer the questions no one could answer before. The classic argument for this is, if your competitors can answer these questions about their data and do business better because now they can answer questions no one could answer before, the pressure is on you to answer them too.
Semanticweb.com: How important are semantic web standards to this evolution?
McDonald: They're important in the sense that a lot of the people who care about this topic work under the banner of the semantic web. But the problem is that of the two most relevant standards -- RDF and SPARQL -- neither is actually adequate for building systems. They are too low level. They could be the building blocks.... You have to basically build a paradigm on top of them that your tools can operate in. They don't provide a data model that people can use, and the query language doesn't lend itself to the questions people ask or the interactions they want as they try to explore data.