Moving Home: Portable Site Information
by Lynn C. Rees
March 22, 2000
One common use of XML is to provide data for template-based
web pages created with XSLT. However, XML can be used to model the actual
structure of a web site too. Portable Site Information is a
project to develop an XML abstraction for template-based
web sites, to allow their migration between site development
frameworks such as NetObjects or Cocoon.
Setting the Scene
Your first professional site. Objective: Hack two decades of untamed
document growth into inspiring corporate propaganda. Lines of
attack:
Hand coding: Get text editor. Change document by
hand. Verify changes didn't break anything. Make
corrections. Repeat. Site spontaneously emerges fully
upright from primordial document slime.
Site manager: Based on an old military maxim: he who wins
the high ground wins the battle. Hit documents from
above. Herd them into categories. Root out
individuality. Site is force-marched out of chaos by
leaving no other option open but surrender.
Verdict: Choose the site manager. From the commanding heights
you can do no wrong. Triumph is mere formality.
Achilles heel: An itch to tinker. There are better, more
ambitious site management frameworks out there.
Why not cap your success by
migrating your site over? Never fear. Everything else has gone
so smoothly.
Post-mortem: Death by reason of incompatibility. The site
manager formats explode on contact. Carefully crafted site
structure collapses in the blast. Nothing to do but collect the
pieces for reassembly. Hopefully, someone at least learned their
lesson.
I did. Years later, my friend Patrick Hayes was building
yetanother.com with a NetObjects frontend and a Midgard
backend. He wanted NetObjects templates to translate directly
into Midgard themes and back. I proposed an XML-based format to
ensure the templates passed cleanly between the two
applications. From this came Portable Site Information
(PSI).
Why Portable Site Information?
Ambitious sites require a god's eye view. They suffer from
epidemic link rot, content decay, navigation tangle, and
sitcom-caliber coherence. The grassroots school of web design
can't help. Site builders are an urgent necessity, not a
luxury.
But if they're so vital, why don't site management frameworks
share even lowest-common-denominator compatibility? Cliche: the
whole is greater than the sum of its parts. Site managers do the
reverse. They deal in parts and sacrifice the whole to
application whim.
Interoperability strategies are trotted out to mask this
gap, ranging from simple metadata exchange to site managers
chatting over HTTP. Yet nothing, in the end, can replace the
freedom to take your data and move on to greener pastures if you
so choose.
PSI extends this freedom to site structure. PSI maps a site not
as a collection of parts but as a complete whole. Site structure
is preserved in an open format, safe from application-specific
chains. Sites can then be moved freely between different site
builders without fear of structure loss. To ensure this, PSI's
design goals are:
- A flexible hierarchy of containers.
- Clearly separated shared and unique data.
- Exact definition of container position in time and space.
- Filters for application-specific processing.
- Metadata hooks.
Future PSI may include:
- Mapping security through access control lists.
- Version tracking.
- Scheduling for site work coordination.
Basic Site Portability
All sites have at least three common aspects:
- Data.
- Hierarchy, in how a site is stored, how it is organized, or
both.
- Unique instances.
To make a site portable, PSI must account for each of these. Representing data is easy. Mark it as data:
<data>Heart of Darkness</data>
Site hierarchy exists for only one reason: data must appear only when needed.
This makes unique instances inevitable. If only part of your data needs to
appear, there are at least two unique instances: the data that appears and the
data that doesn't. PSI must allow data to be sorted into the right unique
instances and stacked in the proper order.
Within sites, most unique instances appear as documents, usually web
pages. Within these, data can be subdivided again and presented as
paragraphs or tables. PSI maps these with sets.
<set id='conradTitle'>
  <data><h2>Heart of Darkness</h2></data>
</set>
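As a rough illustration of how a tool might read such a set, here is a minimal Python sketch using the standard library's ElementTree. The function names `data_markup` and `read_set` are hypothetical and not part of PSI or any PSI tool. The sketch only assumes that a set carries an `id` and wraps its payload in `data` elements, as in the fragment above.

```python
import xml.etree.ElementTree as ET

PSI_SET = """
<set id='conradTitle'>
  <data><h2>Heart of Darkness</h2></data>
</set>
"""

def data_markup(data):
    """Serialize the content of one <data> element back to markup."""
    parts = [data.text or ""]
    for child in data:
        # ET.tostring also emits any text trailing the child element.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()

def read_set(xml_text):
    """Return (set id, combined markup of its <data> children)."""
    node = ET.fromstring(xml_text)
    payload = "".join(data_markup(d) for d in node.findall("data"))
    return node.get("id"), payload

set_id, payload = read_set(PSI_SET)
print(set_id, payload)
```

The payload is kept as serialized markup rather than plain text so that whatever the set holds can be handed to another site builder unchanged.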
This is enough for most sites. They're nothing more than a series of static
instances. Other sites need more. Modern site design emphasizes taking the
shortest possible route to site completion. This requires minimizing redundant
data, usually by sharing common data between otherwise unique instances.
Sets model this with a global or local scope. If global, data
is shared. If local, data belongs to a unique instance (e.g., a
particular page on a site). In this example, the title "Disturbed
Works" is global, whereas the title "Heart of Darkness" applies to
a particular page.
<global>
  <set id='globalTitle'>
    <data><h1>Disturbed Works</h1></data>
  </set>
</global>
<local id='conrad'>
  <set id='conradTitle'>
    <data><h2>Heart of Darkness</h2></data>
  </set>
</local>
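The global/local split can be read mechanically. The following Python sketch labels each set with its scope. It is only an illustration: the `psi` wrapper element is a hypothetical addition so the two top-level containers parse as a single document, and `sets_by_scope` is an invented helper name, not part of PSI.

```python
import xml.etree.ElementTree as ET

# <psi> is a hypothetical wrapper, added only so the fragment has one root.
PSI = """
<psi>
  <global>
    <set id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
  </global>
  <local id='conrad'>
    <set id='conradTitle'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
</psi>
"""

def sets_by_scope(xml_text):
    """Map each set id to 'global' or to the id of the local that owns it."""
    root = ET.fromstring(xml_text)
    scopes = {}
    for container in root:
        if container.tag == "global":
            owner = "global"
        elif container.tag == "local":
            owner = container.get("id")
        else:
            continue  # ignore anything that is not a scope container
        for s in container.findall("set"):
            scopes[s.get("id")] = owner
    return scopes

scopes = sets_by_scope(PSI)
print(scopes)
```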
The other half of data sharing is inserts. Inserts mark where
sets can be reused in a PSI hierarchy outside of their original
location. Like sets, inserts have global or local scope. Global
inserts are shared by multiple sets; local inserts are
monopolized by a single set.
Global sets and inserts can be combined to represent a sort of
shared template. Sets map the shared data and inserts mark where
unique data goes. This allows PSI to map sites that use sitewide
content generation mechanisms like themes, shared borders, and
server-side includes (typically these sites use technologies
such as PHP, ASP, or Cold Fusion).
<global>
  <set id='globalTitle'>
    <data><h1>Disturbed Works</h1></data>
  </set>
  <insert global='title'/>
</global>
<local id='conrad'>
  <set id='conradTitle' insert='title'>
    <data><h2>Heart of Darkness</h2></data>
  </set>
</local>
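One way an adapter might resolve such a template is sketched below in Python. The resolution rule is an assumption, not something PSI specifies here: the sketch walks the global container in document order, emits shared set data as it goes, and at each `insert` substitutes the local sets whose `insert` attribute names the same slot. The `psi` wrapper and the `render` and `data_markup` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

# <psi> is a hypothetical wrapper, added only so the fragment has one root.
PSI = """
<psi>
  <global>
    <set id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
    <insert global='title'/>
  </global>
  <local id='conrad'>
    <set id='conradTitle' insert='title'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
</psi>
"""

def data_markup(data):
    """Serialize the content of one <data> element back to markup."""
    parts = [data.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in data)
    return "".join(parts).strip()

def render(root, local_id):
    """Fill the global template with the sets of one local container."""
    template = root.find("global")
    local = next(l for l in root.findall("local") if l.get("id") == local_id)
    out = []
    for node in template:
        if node.tag == "set":
            # Shared data: emitted for every page.
            out.extend(data_markup(d) for d in node.findall("data"))
        elif node.tag == "insert":
            # Slot: filled by local sets that name this insert (assumed rule).
            slot = node.get("global")
            for s in local.findall("set"):
                if s.get("insert") == slot:
                    out.extend(data_markup(d) for d in s.findall("data"))
    return "\n".join(out)

page = render(ET.fromstring(PSI), "conrad")
print(page)
```

Under this reading, the shared title appears on every page and each local contributes only the data that makes it unique.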
The range of a global "template" is constrained by the
group element. A group can contain one global container and any
number of local containers. A group is often used to model
containers like directories. To capture site hierarchy more
accurately, groups can also be nested inside other groups.
<group id='root'>
  <global>
    <set id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
    <insert global='title'/>
  </global>
  <local id='conrad'>
    <set id='conradTitle' insert='title'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
  <local id='theEnd'>
    <set id='next' insert='title'>
      <data><h2>The End</h2></data>
    </set>
  </local>
</group>
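Because groups nest, a tool can treat them much like directories. The Python sketch below derives a path for every local container. The nested `archive` group, the empty local elements, and the `local_paths` helper are hypothetical, chosen only to show the recursion. Reading "one global container" as "at most one per group" is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# The nested 'archive' group is a hypothetical example of group nesting.
PSI = """
<group id='root'>
  <local id='conrad'/>
  <group id='archive'>
    <local id='theEnd'/>
  </group>
</group>
"""

def local_paths(group, prefix=""):
    """Return a directory-style path for every local under this group."""
    # Assumption: a group holds at most one global container.
    if len(group.findall("global")) > 1:
        raise ValueError("group %r has more than one global" % group.get("id"))
    path = prefix + group.get("id")
    # findall matches direct children only, so nesting is handled below.
    paths = [path + "/" + local.get("id") for local in group.findall("local")]
    for child in group.findall("group"):
        paths.extend(local_paths(child, path + "/"))
    return paths

paths = local_paths(ET.fromstring(PSI))
print(paths)
```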
Since PSI maps the building blocks of other site frameworks, it needs a way to
decide how each one is transformed into other common site structures. PSI
provides internal filters that list the specific rules and conditions that must
be met for this. The class attribute sorts PSI data into groups, and filters
align those groups with specific rules. An application-specific adapter then
processes the PSI data according to the rules and conditions listed in the
filters and routes the site structure to the desired format.
<group class='container' id='root'>
  <global class='shared'>
    <filter role='in' resolve='pass'>
      <if classid='container'/>
      <then role='map' value='folder'/>
    </filter>
    <filter role='in' resolve='pass'>
      <if classid='shared'/>
      <then role='map' value='shared border'/>
    </filter>
    <filter role='in' resolve='pass'>
      <if classid='page'/>
      <then role='map' value='html'/>
    </filter>
    <filter role='in' resolve='end'>
      <if classid='page'/>
      <then role='map' value='html'/>
    </filter>
    <filter role='in' resolve='end'>
      <if classid='section'/>
      <then role='map' value='div'/>
    </filter>
    <set class='section' id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
    <insert global='title'/>
  </global>
  <local class='page' id='conrad'>
    <set class='section' id='conradTitle' insert='title'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
  <local class='section' id='theEnd'>
    <set class='div' id='next' insert='title'>
      <data><h2>The End</h2></data>
    </set>
  </local>
</group>
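To make the adapter idea concrete, here is a hedged Python sketch of the simplest possible one. It collects each filter's `if`/`then` pair into a class-to-target table and applies that table to every element carrying a `class` attribute. This is an assumed reading, not PSI's defined behavior: the sketch does not interpret the `role='in'` or `resolve` attributes, it keeps the first rule found for a class, and it uses a trimmed sample that writes `class='shared'` on the global element so the shared rule has something to match. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; class='shared' on <global> is an assumption of this sketch.
PSI = """
<group class='container' id='root'>
  <global class='shared'>
    <filter role='in' resolve='pass'>
      <if classid='container'/>
      <then role='map' value='folder'/>
    </filter>
    <filter role='in' resolve='pass'>
      <if classid='shared'/>
      <then role='map' value='shared border'/>
    </filter>
    <filter role='in' resolve='end'>
      <if classid='page'/>
      <then role='map' value='html'/>
    </filter>
    <filter role='in' resolve='end'>
      <if classid='section'/>
      <then role='map' value='div'/>
    </filter>
    <set class='section' id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
  </global>
  <local class='page' id='conrad'>
    <set class='section' id='conradTitle'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
</group>
"""

def read_rules(root):
    """Collect class -> target pairs from every 'map' filter."""
    rules = {}
    for flt in root.iter("filter"):
        cond, action = flt.find("if"), flt.find("then")
        if cond is None or action is None or action.get("role") != "map":
            continue
        # First rule for a class wins (an assumption of this sketch).
        rules.setdefault(cond.get("classid"), action.get("value"))
    return rules

def map_classes(root):
    """List (tag, id, target) for each element whose class has a rule."""
    rules = read_rules(root)
    mapped = []
    for node in root.iter():  # document order, root included
        cls = node.get("class")
        if cls is not None and cls in rules:
            mapped.append((node.tag, node.get("id"), rules[cls]))
    return mapped

targets = map_classes(ET.fromstring(PSI))
for row in targets:
    print(row)
```

A real adapter would also have to honor the filter attributes this sketch ignores and decide what to do with classes that no filter covers.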
In Conclusion
Work on PSI is ongoing. Currently, PSI uses a standard DTD to define
its syntax, but we plan on migrating it to an RDF schema. This
will allow us to exploit PSI with more tools, as well as use it
with other RDF formats (like Dublin Core and RSS) to create even
more powerful site models.
We're cleaning up our current code into an LGPL library called
psilib and will then release it through
psilib.sourceforge.net. It's turning into a useful tool for us
and may benefit others, which only makes our jobs as web
developers easier, especially if we get future site projects
already laid out in PSI.
XML is proven in modeling complex hierarchies for open
exchange. Sites are no exception. Developing PSI has helped us
glimpse the underlying patterns of the Web. More connects than
divides site structures. We hope to see a standard reflecting
this evolve so that any pain from future site evolution comes as
a side effect of creation, not transportation. PSI may
contribute to this. It may just dimly light the way. The end of
portable site information is more important than the means.