Why You Need to Share Metadata
Uche Ogbuji
My background as a consultant is in the intersection of enterprise management of structured data, unstructured data, applications, and services. These have all been separate concerns for too long, claimed respectively by enterprise DBAs, content managers, software architects, and business process managers. More and more, people are understanding how these disciplines need to come together, as evidenced by the emergence of service-oriented architecture -- which aligns applications more closely with services -- and the effort of enterprise database systems to accommodate unstructured data. I've watched this slow convergence, but since it comes too slowly to meet many practical needs in business, I've often had to find my own strategies for connecting these worlds. In my experience the sweet spot for such work depends on the nature of the organization, but most often lies in content management (CM).
The goal of CM is to establish a writing shop, publishing house, and library for an organization. The material produced might be for communication to the outside, Web publishing for example, or it might be for internal knowledge, as in the case of enterprise CM. The hardest problem in managing such a combination of concerns is maintaining agreements and policies, which is why most CM processes and technology focus on workflow. CM workflow is now a fairly mature science, and it fits comfortably into related business processes. From an information management point of view, however, CM tools are just beginning to seek differentiation through ready integration with structured databases and with general applications and services. The key to such integration is sharing content metadata with other systems. The richer the metadata shared, the more value is created in the integrated result. Richness is a matter of how well the metadata is placed in context. Content tagged with a simple string "release" might indicate that it's written as a press release, or perhaps that it relates to software releases. Rich metadata minimizes ambiguity in such details, and clarifies the relationship between content and each property.
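The "release" example can be sketched in a few lines of Python. This is only an illustration, not any CM product's data model: the content URIs are hypothetical, and the two property URIs are borrowed from Dublin Core (dc:type for the nature or genre of content, dc:subject for its topic) as one plausible way to supply context.

```python
# Dublin Core element URIs; the example.com content URIs are hypothetical.
DC_TYPE = "http://purl.org/dc/elements/1.1/type"        # nature or genre of the content
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"  # topic of the content

# A bare tag: nothing says how "release" relates to either item.
flat_tags = {
    "http://example.com/content/1234": ["release"],
    "http://example.com/content/5678": ["release"],
}

# The same string, now qualified by a property that places it in context.
statements = [
    ("http://example.com/content/1234", DC_TYPE, "release"),     # written as a press release
    ("http://example.com/content/5678", DC_SUBJECT, "release"),  # about software releases
]


def find_by(statements, prop, value):
    """Return the content URIs that carry the given value for the given property."""
    return [s for (s, p, v) in statements if p == prop and v == value]


written_as_release = find_by(statements, DC_TYPE, "release")
about_releases = find_by(statements, DC_SUBJECT, "release")
print(written_as_release)
print(about_releases)
```

With flat tags, both items look identical to another system. With property-qualified statements, a consumer can ask for content of a given type or content on a given subject and get different answers.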
The most effective tools use semantic technology, which is designed for sharing rich metadata across software domains. To help standardize such approaches, many leading CM and publishing companies came together to create a format specification for sharing rich metadata, Publishing Requirements for Industry Standard Metadata (PRISM). (For more information, go to prismstandard.org.) PRISM builds on semantic technology such as RDF and Dublin Core, but it focuses on needs throughout the content lifecycle, providing a metadata framework, industry standard controlled vocabularies for semantic sharing across organizations, and guidelines for extensibility in custom CM situations. PRISM 1.0 was completed in 2001, and the working group has recently completed a 2.0 specification, pending a public comments period, which just closed. The plan is to finalize PRISM 2.0 this month. PRISM 2.0 is a major update that expands the expressiveness available through built-in constructs. It also defines metadata formats in XMP as well as XML and RDF, which brings rich, sharable metadata directly into multimedia objects. There's no guarantee that your CM vendors will be moving quickly to PRISM 2.0, and for the most part PRISM 1.0 provides rich enough metadata for many CM needs. More important than the specific format is the general capability to reference unstructured data in its fullest context. If your present tools don't provide this, you might want to focus integration projects on lightweight layers that open up and expand content metadata, building on whatever crude export you have available. PRISM 2.0 is easy enough to use in such a home-grown scenario, and will probably buy you better alignment with the CM industry over time.
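As a rough sketch of such a lightweight layer, the Python below turns a flat CSV export into RDF/XML using only the standard library. Everything specific here is an assumption for illustration: the CSV columns, the example.com identifiers, and the sample row are hypothetical, and the PRISM namespace URI and the prism:publicationName element should be checked against the PRISM specification before use. The dc:title and dc:creator elements are from Dublin Core.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Assumed PRISM 2.0 basic namespace; confirm against the specification.
PRISM = "http://prismstandard.org/namespaces/basic/2.0/"

# Hypothetical crude export from a CM tool: one row per content item.
SAMPLE_EXPORT = """id,title,author,publication
http://example.com/content/1234,Spring Product Launch,J. Smith,Example Newsletter
"""


def export_to_rdf(csv_text):
    """Turn the rows of a flat CSV export into one RDF/XML metadata document."""
    # Register readable prefixes so the output uses rdf:, dc:, and prism:.
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dc", DC)
    ET.register_namespace("prism", PRISM)

    root = ET.Element(f"{{{RDF}}}RDF")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One rdf:Description per content item, identified by its URI.
        desc = ET.SubElement(
            root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": row["id"]}
        )
        ET.SubElement(desc, f"{{{DC}}}title").text = row["title"]
        ET.SubElement(desc, f"{{{DC}}}creator").text = row["author"]
        # Assumed PRISM element name for the publication a piece belongs to.
        ET.SubElement(desc, f"{{{PRISM}}}publicationName").text = row["publication"]
    return ET.tostring(root, encoding="unicode")


rdf_xml = export_to_rdf(SAMPLE_EXPORT)
print(rdf_xml)
```

The point of the design is that each column becomes a named property attached to an identified piece of content, so downstream systems receive the value together with its context. A real layer would also need to handle missing columns, dates, and controlled vocabulary terms.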
Semantic technology applications such as PRISM bring a reference section to the library function at the heart of the content lifecycle, and the right tools and processes allow you to develop the reference section in tandem with the library, improving accuracy and coverage. Content enhanced in this way is easier to integrate into other systems, more easily reused, and so more valuable in general.
Uche Ogbuji specializes in the integration of next-generation web systems. He is a partner at Zepheira, which provides solutions to integrate, navigate, and manage information across personal, group, and enterprise boundaries to save time and money.