DIDL: Packaging Digital Content
by Mark Walker, Todd Schwartz, Vaughn Iverson
May 30, 2001
Overview
In this article we detail the reasons for undertaking the
development of a digital packaging standard and describe in depth a
package manifest scheme that potentially addresses the enumerated
needs. In doing so, we show how such a scheme effectively
disassociates the notion of a content item from individual files. We
conclude by describing an XML vocabulary, the Digital Item Declaration
Language (DIDL), recently released by ISO/MPEG as a first working draft,
that will, when completed, provide a standard means for packaging
digital content.
The Need for a Raw Content Description Standard
Today's popular Internet applications generally fall short in their
ability to transfer raw resource content. The content of a web page,
for example, may be defined as the collection of discrete resources --
bitmaps, JPEG images, text blocks, and so on -- that are aggregated
within some predetermined format. The components of the web page may
possess attributes and relationships that, while not explicitly part
of the final, viewable form, may be critical in generating the
displayed result. Information accompanying a JPEG image, for example,
could be utilized in creating a photo caption. Information about the
relationships among a group of images could be utilized in locating
the images on the page. If the web page is generated from a script,
information on the sizes of the various images could be utilized to
decide which images to begin downloading first.
Describing raw content as a structured collection of resources in a
standard manner requires: (1) a standard and flexible metadata format;
(2) a standard way to aggregate multiple resources of various types;
and (3) a standard way to express structural relationships within the
resource collection. Associating standard-form metadata with a given
file allows semantic descriptions and application-specific behavior to
be directly associated with content contained in the file. Currently,
ad hoc metadata schemes are employed in several Internet
applications. In peer networks, for example, long file names are often
used as crude substitutes for semantic descriptions of file
contents. File headers are also utilized, but header formats are
largely designed to document only the technical, rather than the semantic,
contents of a particular file. And in spite of the widespread use of
headers, digital content in the form of a standalone file currently
cannot be delivered to any client or rendering platform without a
significant amount of user intervention. Intervention typically takes
the form of directing a browser to some web site, selecting some
resource URI for download or streaming, and then, if it is a file,
saving the downloaded material to a directory. Rendering or
viewing the content in many cases includes being informed by the
client system that a required plug-in or player is either not
installed or not updated, requiring the user to search the Web for the
right rendering engine or viewer.
The greatest limitation of multimedia header and file formatting
schemes is that they are inherently incapable of describing
multicomponent collections. XHTML, for example, while serving well as
an output format for multicomponent content, is not adequate for
describing the raw digital components and their
relationships. Standard ways of aggregating multiple digital
components in an output-agnostic way are required simply because
things like web pages and other display types are composed of many
items.
Finally, the ability to describe relationships (this goes with that
component, this component contains that component, etc.) in a formal
way is required to associate things like images with their
corresponding descriptive text. It could also be used to describe
component structures that would otherwise be difficult to describe
with textual metadata.
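
The three requirements above (metadata, aggregation, and relationships) can be sketched as a small data model. This is only an illustration in Python, not DIDL syntax; the class names `Resource`, `Relation`, and `Manifest`, the relation kind "describes", and the example.com URIs are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One digital component, held by reference rather than by value."""
    uri: str
    media_type: str
    metadata: dict = field(default_factory=dict)  # semantic description


@dataclass
class Relation:
    """A typed, directed relationship between two resources."""
    kind: str    # hypothetical vocabulary, e.g. "describes" or "contains"
    source: str  # URI of the resource the relation starts from
    target: str  # URI of the resource the relation points to


@dataclass
class Manifest:
    """Aggregates resources of any type plus the relations among them."""
    resources: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def related(self, uri, kind):
        """Return the resources that `uri` points to via relations of `kind`."""
        targets = {r.target for r in self.relations
                   if r.source == uri and r.kind == kind}
        return [res for res in self.resources if res.uri in targets]


manifest = Manifest(
    resources=[
        Resource("http://example.com/boat.jpg", "image/jpeg", {"subject": "Bob"}),
        Resource("http://example.com/boat.txt", "text/plain", {"role": "caption"}),
    ],
    relations=[
        Relation("describes",
                 "http://example.com/boat.txt",
                 "http://example.com/boat.jpg"),
    ],
)

# Which resources does the caption text describe?
described = manifest.related("http://example.com/boat.txt", "describes")
print([r.uri for r in described])
```

Keeping relations as separate records, rather than as fields on each resource, is one way to let relationships be added later without touching the resources themselves.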
Case In Point: The Family Album in Cyberspace
Consider the digital family scrapbook. The scrapbook may be
composed of digital photos, video, and text documents. The scrapbook
designer needs a straightforward way to represent the individual
digital components as a single entity, to annotate the components, and
to specify the relationships among the components ("this video and
these pictures were taken on Bob and Emily's last trip to
Florida"). Having a formal annotation scheme would allow other family
members to add new annotations without disturbing the original content
("caption this picture"). It would also permit the setting of
intermedia anchor points. This would be especially useful for long
videos containing sequences of special interest ("here's the part
where Bob fell off the boat"). All of the technical information
required by the viewing client, like the media format of each
component, sizes of the binary elements, and so on, would need to be
included as transparently as possible. Since the collection is likely
to be viewed by friends and family on all kinds of computing
platforms, a user-transparent way to package together multiple format
versions of the same content is also critical for minimizing user
intervention in obtaining the album ("I need the QuickTime version of
this video").
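
Packaging several format versions of one item implies that the client, not the user, picks the version it can render. A minimal sketch of that selection follows, assuming each version is recorded as a dictionary with hypothetical `uri` and `media_type` keys and that the client knows which media types it supports.

```python
def pick_variant(variants, supported_types):
    """Return the first variant whose media type the client can render, else None."""
    for variant in variants:
        if variant["media_type"] in supported_types:
            return variant
    return None


# Two hypothetical encodings of the same video, listed in the author's order.
video_variants = [
    {"uri": "http://example.com/trip.mov", "media_type": "video/quicktime"},
    {"uri": "http://example.com/trip.mpg", "media_type": "video/mpeg"},
]

# A client that can only play MPEG video gets the second variant.
choice = pick_variant(video_variants, {"video/mpeg"})
print(choice["uri"])
```

Returning the first match means the order of the list acts as the author's preference; a real client might rank by quality or size instead.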
Another scrapbook need that exposes additional packaging
requirements is the case of content that requires encryption,
identification, or formal rights declarations to be associated with
some specific source component. In the scrapbook example, one might
want to associate a specific picture or some other component with a
formal copyright statement. If one of the pictures was a derivative of
some other photo, identifying it as a copy and also identifying the
original source would be valuable. Noting what specifically
constituted the original content would be critical in order to
maintain the original material as inviolate and reconstructable under
long-term usage and storage.
Perhaps the strongest motivation for the use of digital packages
emerges from the distinction between the scrapbook package manifest
and the resources. While it would occasionally be necessary to
encapsulate small resources (like thumbnail images) in the
manifest itself, most resources would be included in the package by
reference. In the digital scrapbook, each component would ideally be
accompanied not only by a detailed description of its media type but
also by the URI for obtaining the platform-specific browser/player
plug-in capable of rendering the media type. This would be an
especially critical feature in the design of a scrapbook for an
extended family in which the various digital components of the
collection were located in different, fixed archives in geographically
far-flung locations.
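
To make the idea of a by-reference component concrete, here is a small sketch of how a client might use such an entry. The field names `uri`, `media_type`, and `player_uri`, and the URIs themselves, are hypothetical and chosen only for illustration.

```python
def rendering_plan(component, installed_types):
    """Say whether the client can render a component now or must fetch a player first."""
    if component["media_type"] in installed_types:
        return ("render", component["uri"])
    return ("fetch_player", component["player_uri"])


# The manifest entry carries only references: where the video lives and
# where a suitable player can be obtained.
video = {
    "uri": "http://archive.example.org/trip.mov",
    "media_type": "video/quicktime",
    "player_uri": "http://example.com/players/quicktime",
}

plan = rendering_plan(video, installed_types={"image/jpeg"})
print(plan)
```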
The highly compact nature of the manifest would allow it to be rapidly
transmitted and edited without dragging around the whole collection.
The content of the scrapbook would thus be defined by the scrapbook
package manifest description rather than the collection components
themselves.
Metadata associated with each component and component relationship
would also allow the viewer to execute searches on the package
manifest (perhaps employing regular expressions) for specific
components and, thus, to download or view only a subset of the
materials referenced by the package ("Retrieve only the pictures of
Bob and Emily when they lived in Ohio").
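
A search of this kind touches only the manifest, never the media files. The sketch below assumes, purely for illustration, that each manifest entry carries a free-text `description` and a `media_type`; the entries and URIs are invented.

```python
import re

# Hypothetical manifest entries: references plus descriptive metadata.
manifest = [
    {"uri": "http://example.com/ohio1.jpg", "media_type": "image/jpeg",
     "description": "Bob and Emily at home in Ohio"},
    {"uri": "http://example.com/trip.mpg", "media_type": "video/mpeg",
     "description": "Bob and Emily, Florida boat trip"},
    {"uri": "http://example.com/beach.jpg", "media_type": "image/jpeg",
     "description": "Emily in Florida"},
]


def search(entries, pattern, media_prefix=""):
    """Return URIs whose description matches `pattern` and whose type starts with `media_prefix`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [e["uri"] for e in entries
            if e["media_type"].startswith(media_prefix)
            and regex.search(e["description"])]


# "Retrieve only the pictures of Bob and Emily when they lived in Ohio"
hits = search(manifest, r"bob and emily.*ohio", media_prefix="image/")
print(hits)
```

Only the URIs in `hits` would then need to be downloaded.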
Finally, since a given package manifest would describe only the
structural and semantic relationships of the components in the
scrapbook collection in a completely output-agnostic way, formatting
for renderable output would be relegated to the application software,
or to a transformation or stylesheet. This would allow a multitude of
differently-formatted scrapbooks to be generated from the same package
manifest.
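
The separation of manifest from presentation can be sketched with two renderers that read the same data. The manifest layout and the function names `render_text` and `render_html` are assumptions for this example; in practice the transformation might equally be a stylesheet.

```python
from html import escape

# One hypothetical, output-agnostic manifest.
manifest = {
    "title": "Florida Trip",
    "items": [
        {"uri": "http://example.com/boat.jpg", "caption": "Bob & the boat"},
        {"uri": "http://example.com/beach.jpg", "caption": "Emily on the beach"},
    ],
}


def render_text(m):
    """Format the manifest as a plain-text listing."""
    lines = [m["title"]]
    lines += [f"- {i['caption']} ({i['uri']})" for i in m["items"]]
    return "\n".join(lines)


def render_html(m):
    """Format the same manifest as an HTML fragment, escaping text for markup."""
    rows = "".join(
        f'<li><a href="{escape(i["uri"])}">{escape(i["caption"])}</a></li>'
        for i in m["items"]
    )
    return f"<h1>{escape(m['title'])}</h1><ul>{rows}</ul>"


print(render_text(manifest))
print(render_html(manifest))
```

Neither renderer changes the manifest, so further output formats can be added without touching the package description.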