Dublin Core in the Wild
by Dale Dougherty
October 25, 2000
The eighth meeting of the Dublin Core
Metadata Initiative (DCMI) was held October 4-6 at the National Library of
Canada in Ottawa. Bringing together about 150 participants from 20
countries, DC-8 was as much about focusing the future work of this group as it
was an opportunity to educate newcomers like us on the work that had already
been accomplished. We were there to explore the relationship between RSS and
Dublin Core.
The Director of DCMI, Stu Weibel of the Online Computer Library Center
(OCLC), got things started. Weibel was not the only one to describe DCMI
members as a passionate group, a passion that stems from their conviction that
metadata is the key to improving the state of the Web as an information
resource.
Weibel explained that DCMI started in 1994 at the 2nd World Wide
Web Conference, held in Chicago. Its name comes from Dublin, Ohio,
home of OCLC, a library computing consortium. The original mission of
DCMI was to improve resource discovery on the Web by establishing a
minimal set of metadata constructs, and Weibel reaffirmed that mission
in his opening talk. He said that DCMI has become an "open
consensus-building initiative" dedicated to improving the ways users
find things on the Web. While recognizing that DCMI is not the only
group working on metadata standardization, Weibel noted that DCMI's
approach has always been interdisciplinary, and its focus remains
fixed on the Web.
DCMI has produced two specifications. The Dublin Core Metadata Element
Set, Version 1.1, lists fifteen metadata elements that can be used
to properly describe anything on the Web. Among these are such
elements as Title, Creator, Description, and Subject. DCMI has also
produced the Dublin Core Qualifiers, which define two classes of qualifiers:
those that further refine or narrow the meaning of an element, and
those that reference a known encoding scheme, such as an existing
controlled vocabulary.
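To make the difference between an element and its two kinds of qualifiers concrete, here is a minimal Python sketch. The `DCStatement` class and its field names are our own hypothetical model, not part of any DCMI specification; the element list is the fifteen elements of version 1.1, and the refinement and scheme labels in the sample statements are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# The fifteen elements of the Dublin Core Metadata Element Set, Version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


@dataclass(frozen=True)
class DCStatement:
    """One Dublin Core statement about a resource (hypothetical model)."""
    element: str                      # one of DC_ELEMENTS
    value: str
    refinement: Optional[str] = None  # qualifier that narrows the element's meaning
    scheme: Optional[str] = None      # qualifier naming an encoding scheme for the value

    def __post_init__(self) -> None:
        # Reject anything outside the fifteen-element vocabulary.
        if self.element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {self.element!r}")


# Illustrative statements; the values match the O'Reilly Network item in this article.
issued = DCStatement("date", "2000-10-13", refinement="issued", scheme="W3C-DTF")
language = DCStatement("language", "en-us", scheme="RFC1766")
plain = DCStatement("creator", "Dave Phillips")
```

Because the element name is checked against the fixed list, a misspelled or invented element such as "author" raises an error instead of silently producing metadata that no other application will recognize.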
If resources on the Web provide structured metadata in accordance with the
Dublin Core, then this information can be catalogued, processed, and retrieved
in ways that help users locate what they want. Some sites already use the
HTML meta tag to supply Dublin Core elements in their documents.
This metadata can also be placed in a separate file and referenced. Some
search engines pay attention to the presence of metadata, but the impact of
supplying it is not easily discernible to the average web site
developer. The success of Dublin Core therefore depends on achieving
critical mass among site developers, who must see the benefits of
supplying metadata. It also depends on having tools and services that
make use of this metadata in innovative ways.
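One common way to embed Dublin Core in an HTML page, along the lines of RFC 2731, is a `schema.DC` link element plus meta tags whose names carry a "DC." prefix. The Python sketch below renders a dictionary of element values that way. The `dc_meta_tags` helper is hypothetical, and capitalization of the element names has varied between conventions, so treat the exact spelling as an assumption.

```python
from html import escape

# Namespace URI of the Dublin Core Metadata Element Set, Version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(record: dict[str, str]) -> str:
    """Render Dublin Core elements as <link>/<meta> markup for a page's <head>.

    Assumes the "schema.DC" link plus "DC.<element>" meta-name convention;
    record keys are expected to be DC element names already.
    """
    lines = [f'<link rel="schema.DC" href="{DC_NS}">']
    for element, value in record.items():
        # Escape &, <, >, and quotes so arbitrary values stay inside the attribute.
        lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)


print(dc_meta_tags({
    "title": "OpenAL Explained",
    "creator": "Dave Phillips",
    "date": "2000-10-13",
}))
```

Escaping matters more than it might seem: a publisher name such as "O'Reilly & Associates" would otherwise produce malformed markup.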
A talk by Eric Miller of OCLC (co-authored by Daniel Brickley
of the World Wide Web Consortium (W3C) and Rael Dornfest) gave an
overview of the current and future Dublin Core metadata landscape, the
latter firmly rooted in the Resource Description Framework (RDF). RDF,
which uses XML as its syntax, is rapidly gaining acceptance as an
effective way to express the relationships described by metadata.
Miller made the point that Dublin Core metadata can be expressed in RDF
syntax as well as in simple English. Statements such as "The author is
Alfred Jarry" and "The date of publication is March 1, 1999" may seem
obvious to humans, but they must be properly encoded (in RDF) to be
understood by computers. When processed by a computer, RDF statements
support fairly complex queries of a sort we cannot easily perform with a
search engine today: "Find me all articles by the person whose email
address is dale@oreilly.com."
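The email query is really a join: find the person with that address, then the articles that person created, then their titles. That chain is hard to express as a keyword search but straightforward over subject-predicate-object statements. The standard-library Python sketch below models statements as plain tuples; the identifiers such as `article:1` and the `ex:email` predicate are hypothetical, and a real application would more likely use an RDF toolkit and a query language.

```python
# An RDF-style statement: (subject, predicate, object).
Triple = tuple[str, str, str]

# Hypothetical store. "dc:" predicates are Dublin Core elements; "ex:email"
# is an invented predicate standing in for whatever vocabulary records email.
TRIPLES: list[Triple] = [
    ("article:1", "dc:title", "Dublin Core in the Wild"),
    ("article:1", "dc:creator", "person:dale"),
    ("article:2", "dc:title", "OpenAL Explained"),
    ("article:2", "dc:creator", "person:dave"),
    ("person:dale", "ex:email", "dale@oreilly.com"),
]


def subjects(triples: list[Triple], predicate: str, obj: str) -> list[str]:
    """All subjects s such that (s, predicate, obj) is in the store."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def titles_by_email(triples: list[Triple], email: str) -> list[str]:
    """Follow email -> person -> articles -> titles across separate statements."""
    titles = []
    for person in subjects(triples, "ex:email", email):
        for article in subjects(triples, "dc:creator", person):
            titles.extend(objects(triples, article, "dc:title"))
    return titles


print(titles_by_email(TRIPLES, "dale@oreilly.com"))  # ['Dublin Core in the Wild']
```

Modeling the creator as a resource (`person:dale`) rather than a plain name string is a design choice here; it is what lets other statements, such as the email address, attach to the same person.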
Miller also pointed out that Dublin Core establishes a standard framework
for metadata: a common foundation for all applications that can also be
extended to serve the specialized needs of disparate groups. The virtue of a
common framework is that a lot of useful metadata can be shared across very
different domains, with interoperability built in by design.
One only needs to look at Napster to see both good and bad examples of
metadata usage. Metadata is what allows one to search by Title, Artist, and so
on, and Napster could be viewed as a system for managing the metadata
associated with music files. However, Napster is also a bad example of metadata
because users who upload an MP3 may be very loose in specifying this
information, so the same song may be listed under different titles.
Nonetheless, Napster, like RSS, suggests that distributing metadata has
commercial value.
In his plenary talk, "A Grammar of Dublin Core," Thomas Baker of the German
National Research Center for Information Technology offered the view that the
Dublin Core metadata set was more than a card-catalog system for the Internet.
He called it a "pidgin for digital tourists." A pidgin is a simplified language
with a small vocabulary that can nonetheless be used to communicate in simple
but effective ways. Baker used sentence diagrams to demonstrate how Dublin Core
statements work: the fifteen DC elements are a limited list of nouns, while the
DC qualifiers provide a rich yet restricted, standardized set of adjectives.
In some ways, metadata seems like an esoteric subject with potentially
fractal qualities: "If data about data is metadata, what is data about
metadata?" Yet you can approach metadata from a very practical viewpoint. It's
how we organize information every day in our calendars, address books, and
organizational charts. An email message without To:, From:, and Subject:
headers is practically useless; it's a draft of a message not ready for
distribution. A specification like Dublin Core asks us to be more disciplined
in how we organize our everyday data. While most people title
their Web pages, few think to add such tidbits as author, subject, publication
date, and language. The Web page may live on your Web server, but it's the
metadata that's picked up by search engines, screen-scraped, and routed
everywhere people might be looking for it. When it comes to metadata, each
small addition brings a disproportionate increase in value; a little dab of
metadata goes a long way.
RSS and Dublin Core
Our reason for attending the Dublin Core conference was to
strengthen the connection between the Dublin Core community and
developers of RDF Site Summary (RSS). In many ways, RSS has already
proved useful as a metadata testbed and validates many of the
assumptions implicit in the Dublin Core efforts. RSS demonstrates that
site developers will provide metadata, and that the aggregation and
flow of metadata can increase a site's traffic. RSS originated at
Netscape, and it was meant to support Dublin Core, but Netscape dropped
Dublin Core from the specification at the last moment, to the dismay of
the DCMI community. Instead, RSS 0.91 established a very small set of
metadata constructs, essentially Title, Link, and Description. In
managing metadata through our Meerkat aggregator, and as a publisher,
we could see the limitations of the current RSS framework: we simply
didn't know enough about the individual items flowing in an RSS
channel. We believe that Dublin Core provides a much-needed metadata
framework.
The new RSS 1.0 proposal provides a way to use Dublin Core as a
common framework for sharing richer metadata. The goal is to bridge
these two efforts so that Dublin Core can benefit from the experience
of RSS developers and their tools, and RSS can benefit from the
expertise of the Dublin Core community. One can continue to use the
current RSS and provide only Title, Link, and Description, but for
those who already have richer metadata and want to make it available,
we wanted to create a standard way to do so. That's the rationale
behind RSS 1.0. Together, RSS and Dublin Core can provide a powerful
way of making this useful data available outside one's own content
management system.
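As a rough illustration of moving metadata out of a content management system, the Python sketch below builds a single RSS 1.0 item with the standard library's ElementTree. The `record` dictionary stands in for a database row and is hypothetical, as is the `rss_item` helper; the namespace URIs are the RDF, Dublin Core 1.1, and RSS 1.0 namespaces as we understand them, and the RSS 1.0 one in particular should be treated as an assumption while the proposal settles.

```python
import xml.etree.ElementTree as ET

# Namespace URIs (the RSS 1.0 one is an assumption while the proposal settles).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Prefixes used when serializing; RSS elements go in the default namespace.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("", RSS_NS)
ET.register_namespace("dc", DC_NS)


def rss_item(url: str, title: str, dc_fields: dict[str, str]) -> ET.Element:
    """Build one RSS 1.0 <item> carrying title and link plus Dublin Core elements."""
    item = ET.Element(f"{{{RSS_NS}}}item", {f"{{{RDF_NS}}}about": url})
    ET.SubElement(item, f"{{{RSS_NS}}}title").text = title
    ET.SubElement(item, f"{{{RSS_NS}}}link").text = url
    for name, value in dc_fields.items():
        # Each key becomes a dc:<name> child, e.g. dc:creator.
        ET.SubElement(item, f"{{{DC_NS}}}{name}").text = value
    return item


# Hypothetical row pulled from a content management system's database.
record = {"creator": "Dave Phillips", "date": "2000-10-13", "language": "en-us"}
url = "http://www.oreillynet.com/pub/a/linux/2000/10/13/oa_openal.html"
xml_text = ET.tostring(rss_item(url, "OpenAL Explained", record), encoding="unicode")
print(xml_text)
```

This covers only the item itself. In a complete RSS 1.0 file, items sit under the `rdf:RDF` root and are listed by the channel; building those wrappers follows the same pattern.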
Like many companies, we use a content management system, and our
database holds a lot of metadata about what we publish. We generated
an RSS file for the O'Reilly Network whose items are Dublin Core
compliant. Below is one item, as produced by our system.
<item rdf:about="http://www.oreillynet.com/pub/a/linux/2000/10/13/oa_openal.html">
  <title>OpenAL Explained</title>
  <link>http://www.oreillynet.com/pub/a/linux/2000/10/13/oa_openal.html</link>
  <dc:description>
  OpenAL is the Open Audio Library, a cross-platform, open source solution
  for programming 2D and 3D audio.
  </dc:description>
  <dc:creator>Dave Phillips</dc:creator>
  <dc:subject>Linux, APIs, Game Development, Gaming, Multimedia</dc:subject>
  <dc:type>Technical Article</dc:type>
  <dc:language>en-us</dc:language>
  <dc:date>2000-10-13</dc:date>
  <dc:format>text/html</dc:format>
  <dc:rights>Copyright 2000, O'Reilly Network</dc:rights>
  <dc:publisher>O'Reilly and Associates, Inc.</dc:publisher>
</item>
As you can see, in addition to Title, Link, and Description, we
have supplied the following fields: Creator, who in this case is the
author of the article; a list of Subject keywords; the Type of item,
in this case a technical article; the Language in which it is
written; the Date it was published; the file Format; and a statement
about who owns the Rights to this article, as well as the name of its
Publisher.
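An aggregator receiving this item has to pull those fields back out. The Python sketch below wraps a trimmed copy of the O'Reilly Network item in the namespace declarations that the enclosing RSS 1.0 file is assumed to provide, then collects the `dc:*` children into a dictionary. The helper names are hypothetical, and splitting Subject on commas reflects how this particular feed lists its keywords, not a Dublin Core rule.

```python
import xml.etree.ElementTree as ET

RSS = "{http://purl.org/rss/1.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# A trimmed copy of the item, wrapped in the namespace declarations that the
# enclosing RSS 1.0 document is assumed to supply.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://www.oreillynet.com/pub/a/linux/2000/10/13/oa_openal.html">
    <title>OpenAL Explained</title>
    <link>http://www.oreillynet.com/pub/a/linux/2000/10/13/oa_openal.html</link>
    <dc:creator>Dave Phillips</dc:creator>
    <dc:subject>Linux, APIs, Game Development, Gaming, Multimedia</dc:subject>
    <dc:type>Technical Article</dc:type>
    <dc:date>2000-10-13</dc:date>
  </item>
</rdf:RDF>"""


def dc_fields(item: ET.Element) -> dict[str, str]:
    """Collect every dc:* child of an item into a dict keyed by element name."""
    return {
        child.tag[len(DC):]: (child.text or "").strip()
        for child in item
        if child.tag.startswith(DC)
    }


def subject_keywords(fields: dict[str, str]) -> list[str]:
    """Split a comma-separated Subject into keywords (this feed's convention)."""
    return [k.strip() for k in fields.get("subject", "").split(",") if k.strip()]


root = ET.fromstring(FEED)
for item in root.iter(f"{RSS}item"):
    fields = dc_fields(item)
    print(item.findtext(f"{RSS}title"), "|", fields["creator"], "|", subject_keywords(fields))
```

Keying the dictionary by element name means a consumer can handle any of the fifteen elements without special-casing each one.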
As we've said, Dublin Core provides a common set of metadata
constructs. One can go further and supply even more detailed metadata
for specific applications. However, even this much metadata brings a
worthwhile benefit: it opens the possibility that a user could search
for a document by its author, date of publication, subject, and
publisher. Suppose, for example, that a Linux site publishes an article
on Apache. If the article's Subject field includes "Apache" as a
keyword, an Apache web site that is not interested in general Linux
information can locate that story and point to it. We can imagine
applications that let users keep track of the metadata for documents
they browse, which could be much more useful than bookmarks for
retrieving something that has interested them.
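The Apache scenario amounts to filtering aggregated items on their Subject keywords, and the same approach extends to author and date. Here is a minimal Python sketch over already-parsed items. The `Item` class and `matching` helper are hypothetical, the second sample item is invented purely for illustration, and keyword matching is a simple case-insensitive exact comparison rather than what a real search service would necessarily do.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A parsed RSS item with a few Dublin Core fields (hypothetical shape)."""
    title: str
    link: str
    creator: str
    date: str            # ISO 8601 "YYYY-MM-DD", as in dc:date
    subjects: list[str]


def matching(items, subject=None, creator=None, since=None):
    """Yield items meeting every given criterion; None means "don't care"."""
    for it in items:
        if subject is not None and subject.lower() not in {s.lower() for s in it.subjects}:
            continue
        if creator is not None and it.creator != creator:
            continue
        # Zero-padded ISO dates sort correctly as plain strings.
        if since is not None and it.date < since:
            continue
        yield it


ITEMS = [
    Item("OpenAL Explained",
         "http://www.oreillynet.com/pub/a/linux/2000/10/13/oa_openal.html",
         "Dave Phillips", "2000-10-13",
         ["Linux", "APIs", "Game Development", "Gaming", "Multimedia"]),
    # Invented example: a Linux site's article that also carries an "Apache" keyword.
    Item("Example Apache article", "http://linux.example.com/apache.html",
         "A. Writer", "2000-10-20", ["Linux", "Apache"]),
]

# What an Apache-only site might pull from a general Linux feed.
print([it.title for it in matching(ITEMS, subject="apache")])  # ['Example Apache article']
```

Because each criterion is optional, the same function answers "everything by this author," "everything on this subject since a given date," or any combination.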
Once the RSS 1.0 proposal is solidified, O'Reilly Network will begin
providing DC-compliant metadata via RSS. If you are interested in doing so as
well, please let us know.