Electronic Publishing with XML
by John McKeown, Benjamin Jung
June 27, 2001
Introduction
In this article, we describe the process of creating electronic
publications using XML and related standards. This publishing
procedure has been used to generate conference proceedings for the XML Europe 2001
Conference. We will describe the most important steps in this
XML-based publishing process and highlight some of its advantages.
XML Europe 2001
Now in its seventeenth year, the XML Europe Conference was held
this year in Berlin (May 21-25, 2001). Formerly known as SGML Europe,
the conference was renamed SGML/XML Europe in 1998 and subsequently
became XML Europe.
In the past, the proceedings for XML Europe have been available in
both paper and electronic formats. For various reasons, the conference
organizers, the GCA, dropped the paper version this year and opted for
an electronic publication only. This was distributed on CD-ROM to each
of the conference delegates. Additionally, the GCA used this
publication as the basis for an online version on their web site. XML
technologies were used throughout the creation process.
An XML-based Publishing Process
Producing a publication using XML technologies involves a number of
distinct steps: content creation, validation, and publication. These
steps are discussed in the following sections and are applicable to
the production of any publication (electronic or print) with XML.
Step 1: XML Content Creation
The first step in an XML-based publishing process is the creation
or acquisition of content in an appropriate XML vocabulary. The
vocabulary should be flexible enough to represent all common features
(e.g. headings, sections, sub-sections, paragraphs, links) and
advanced features (e.g. tables, figures and bibliography) of a
publication. One possible vocabulary is DocBook XML, used to mark up
documents such as books, articles, and technical documentation in
logical sections.
For the XML Europe conference, an XML DTD was developed that
defines the structure of a generic conference paper. This is known as
the GCAPaper DTD. Each author whose presentation abstract was
accepted by the conference program committee was requested to submit
the final paper in XML according to the GCAPaper DTD. The use of this
DTD ensures a similar structure for each paper. Thus, all papers can
be processed in an identical manner by the publishing process. Here is
an example document.
<gcapaper id="s01-1" day="Tuesday" attendee="All">
  <front>
    <title>The power of XML</title>
    <author refid="s01-1auth4">
      <fname>John</fname>
      <surname>Smith</surname>
      <jobtitle>Senior Consultant</jobtitle>
      <address>
        <affil>Global Enterprises</affil>
        <city>Dublin</city>
        <cntry>Ireland</cntry>
        <email>john.smith4@globent.com</email>
      </address>
      <bio id="s01-1auth4">
        <para>
          <highlight>John Smith</highlight> - John is a senior
          consultant for Global Enterprises
        </para>
      </bio>
    </author>
    <abstract>
      <para>XML is a powerful language for defining markup languages
      for specific application domains. The XML Specification has
      been a W3C recommendation since February 1998.</para>
    </abstract>
  </front>
  <body>
    <para>Paper unavailable at press time.</para>
  </body>
</gcapaper>
To support authors and facilitate the creation of papers in XML, a
variety of tools were provided. These included dedicated XML editors
(Epic by Arbortext and XMetal by SoftQuad) and extensions to
Microsoft Word that allow content to be exported to XML (WorX by
HyperVision and S4/Text by i4i). Each of these tools was made
available under an evaluation license and customized to produce
XML content adhering to the GCAPaper DTD.
Step 2: Input Validation
Once the content for a publication is in XML, it needs to be
validated against the publication DTD. This type of structural
validation is a core feature of XML and can easily be performed using
any validating XML parser. In addition to structural validation, it is
also necessary to validate the contents of the publication
logically. This ensures that elements in the DTD have been used in a
consistent and correct manner (e.g. "Dublin" is marked as a city and
not as a country). The content validation step is particularly
important when the content originates from many sources.
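The kind of logical check described above could be sketched as follows. This is a minimal illustration in Python, not the tooling used for the conference. It relies only on the standard library's xml.etree.ElementTree, which checks well-formedness but does not validate against a DTD, so structural validation would still need a validating parser. The function name check_paper, the KNOWN_COUNTRIES set, and both rules are assumptions made for illustration; the element names come from the example document in Step 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference list; a real check would use a complete country list.
KNOWN_COUNTRIES = {"Ireland", "Germany", "France", "United Kingdom", "United States"}


def check_paper(xml_text):
    """Return a list of logical problems found in one paper (empty if none)."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []

    # Rule 1 (assumed): every <cntry> value must be a known country name.
    for cntry in root.iter("cntry"):
        value = (cntry.text or "").strip()
        if value not in KNOWN_COUNTRIES:
            problems.append(f"unknown country: {value!r}")

    # Rule 2 (assumed): every author refid must match the id of some <bio>.
    bio_ids = {bio.get("id") for bio in root.iter("bio")}
    for author in root.iter("author"):
        refid = author.get("refid")
        if refid not in bio_ids:
            problems.append(f"author refid {refid!r} has no matching bio")

    return problems


# A structurally plausible paper in which city and country are swapped.
SAMPLE = """<gcapaper id="s01-1">
  <front>
    <title>The power of XML</title>
    <author refid="s01-1auth4">
      <address><city>Ireland</city><cntry>Dublin</cntry></address>
      <bio id="s01-1auth4"><para>John is a senior consultant.</para></bio>
    </author>
  </front>
  <body><para>Paper unavailable at press time.</para></body>
</gcapaper>"""

for problem in check_paper(SAMPLE):
    print(problem)
```

A DTD cannot catch the swapped city and country in this sample, because both elements contain plain text; only a content-level rule such as the first one can flag it.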
Almost all papers submitted to XML Europe 2001 adhered to the
GCAPaper DTD. The exceptions included Microsoft PowerPoint
presentations, which had to be converted to the GCAPaper DTD structure
before they could be included in the conference proceedings
publication. Further validation of all papers was then required to
ensure they adhered to specific authoring guidelines for the DTD.
The authoring guidelines accompanying the GCAPaper DTD specify the
correct usage of elements in the DTD and also define naming
conventions for cross-references and images used within each
paper. Validation against the authoring guidelines is especially important for
conference proceedings, as a variety of authoring tools are used to
produce papers. Once all conference papers were received and
validated, they were imported into a master document representing the
conference proceedings publication.
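Checking a naming convention and importing papers into a master document could look roughly like the sketch below. It is an illustration under stated assumptions rather than the process actually used: the master element name proceedings, the function name build_master, and the rule that every id inside a paper starts with the paper's own id are all hypothetical, since the article does not reproduce the actual guidelines.

```python
import xml.etree.ElementTree as ET


def build_master(papers):
    """Check id naming and collect the papers under one master root element."""
    master = ET.Element("proceedings")  # hypothetical master element name
    seen = set()
    for xml_text in papers:
        paper = ET.fromstring(xml_text)
        paper_id = paper.get("id")
        if not paper_id or paper_id in seen:
            raise ValueError(f"missing or duplicate paper id: {paper_id!r}")
        seen.add(paper_id)
        # Assumed convention: every id inside a paper starts with the paper id.
        for elem in paper.iter():
            elem_id = elem.get("id")
            if elem_id and not elem_id.startswith(paper_id):
                raise ValueError(f"id {elem_id!r} does not start with {paper_id!r}")
        master.append(paper)
    return master


papers = [
    '<gcapaper id="s01-1"><front><title>The power of XML</title></front></gcapaper>',
    '<gcapaper id="s01-2"><front><title>Another paper</title></front></gcapaper>',
]
master = build_master(papers)
print(len(master), [p.get("id") for p in master])
```

Rejecting a paper before it is appended keeps the master document consistent, so later publishing steps can treat every paper in the same way.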