Document Modeling with Bricolage
by David Wheeler
November 23, 2005
Previous Perl.com articles have reviewed where Bricolage fits into the universe of content management systems and worked through Bricolage installation and Bricolage configuration. Now it's time to go through the steps required to model the structure of an existing web page in Bricolage. Part of the motivation for the redesign of the Bricolage website last summer was to create good examples of document types and templates for use in Bricolage itself. You can take advantage of that work by analyzing a page on the current Bricolage site to determine how to break it down into its basic elements.
First, here's a brief introduction to document types in Bricolage.
The Elements of Bricolage Stories
Bricolage features two types of documents: stories and media. Stories contain text content and metadata; media are just like stories, but can have a single media file associated with them (an image file, movie, sound file, PDF, etc.). Whether a document is a story document or a media document, elements define its structure.
Elements are the basic building blocks of all documents in Bricolage. There are two types of elements: container elements and field elements. Container elements can contain any number of fields and other container elements. Fields, on the other hand, contain text content. Bricolage presents fields as standard HTML fields such as text, textarea, pulldown, radio, etc.--even a date and time widget. In a document model, one container element is the top element and it, along with all of the fields and subelements it contains, constitutes the structural model for the content of documents based on it.
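As a rough illustration of the two element types, here is a minimal Python sketch. It is not Bricolage's API; the class names `Field` and `Container` and the example element names ("Story", "Title", "Section", "Text") are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Field:
    """A field element: holds text content and is shown as a form widget."""
    name: str
    widget: str = "text"  # e.g. text, textarea, pulldown, radio
    value: str = ""


@dataclass
class Container:
    """A container element: holds any number of fields and other containers."""
    name: str
    children: List[Union["Field", "Container"]] = field(default_factory=list)

    def add(self, child):
        """Append a child element and return it so calls can be chained."""
        self.children.append(child)
        return child


# The top element, plus everything nested under it, forms the document model.
top = Container("Story")                      # hypothetical top element
top.add(Field("Title"))                       # a field directly under the top
section = top.add(Container("Section"))       # a nested container element
section.add(Field("Text", widget="textarea"))
```

The only structural rule the sketch encodes is the one stated above: containers may nest, while fields are leaves that carry text.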
The important point here is that Bricolage encourages a highly structured model for your documents; documents based on the models are thus structurally consistent. That structure also makes it easy to write incredibly flexible templates to output content in a variety of formats. (The next article in this series will cover Bricolage templating.) Document models can also be deeply hierarchical, to whatever extent is necessary to accurately model the structure of the documents being managed. Be careful, though: if the model has too many levels of hierarchy, it will be more difficult for users to conceptualize when editing documents, and drilling down into deeply nested elements in the Bricolage user interface will take more work.
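One way to keep an eye on nesting while drafting a model is to count its container levels. The sketch below is only an illustration: it represents a model as nested Python dictionaries (a dict is a container, anything else is a field), and the element names in it are hypothetical. Bricolage itself imposes no such check.

```python
def depth(element):
    """Return the number of container levels at or below this element."""
    if not isinstance(element, dict):  # a field has no nesting of its own
        return 0
    return 1 + max((depth(child) for child in element.values()), default=0)


# Hypothetical draft model: the top element, a Sidebar, and a Quote inside it.
model = {
    "Headline": "text",
    "Sidebar": {
        "Title": "text",
        "Quote": {"Text": "textarea", "Attribution": "text"},
    },
}

print(depth(model))  # 3: the top element, Sidebar, and Quote
```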
Document Analysis

Figure 1. A screenshot of the article being analyzed
Document analysis is the process of analyzing the layout of a document and breaking it down into its basic elements. Examining a page on a website, you must determine how all of the basic parts fit together and in what hierarchy, so that you can model the element structure necessary to accurately represent the document in Bricolage. That is, determine the container elements and fields that would be necessary to accurately recreate the document in Bricolage.
For the sake of this article, consider the structure of a typical page on the Bricolage website, because it offers a representative sample of the elements that articles on the site will likely need (see Figure 1).
Identifying Content
The first thing to do is to determine what part of that page constitutes content and what does not. The term "content" here distinguishes those parts of the page that are important to the document itself, as opposed to the site overall or to a section of the site. For example, the banner at the top of the page appears on every page on the site; it is not specific to this document, nor is it significant to the document's contents. Likewise, the footer section is global to the site and contributes nothing to the document. The Recent News list in the right-hand column also has nothing to do with the contents of the article; it is simply a list of the five most recent articles published on the entire site.
Call these other components includes, because they're included on many pages--or even on every page of the site--and because web servers generally pull them into the layout via a server-side include technology (such as mod_include, HTML::Mason, PHP, or JSP). Because they're not significant to the content of the document, you can ignore them for the rest of this analysis. Figure 2 depicts everything that remains.
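The sorting step can be written down as a small Python sketch. The region names come from the page described above; the `scope` label ("site" or "document") is an assumption introduced here to record the judgment made for each region, not anything Bricolage stores.

```python
# Regions spotted on the page, each tagged with a hypothetical "scope" label.
page_regions = [
    {"name": "Banner", "scope": "site"},
    {"name": "Article content", "scope": "document"},
    {"name": "Recent News", "scope": "site"},
    {"name": "Footer", "scope": "site"},
]

# Includes are shared across pages; content belongs to this document alone.
includes = [r["name"] for r in page_regions if r["scope"] == "site"]
content = [r["name"] for r in page_regions if r["scope"] == "document"]

print("Ignore (includes):", includes)
print("Model (content):", content)
```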

Figure 2. The important part of the document for the purposes of analysis: the actual content
Defining the Top Element
Having isolated the content of the page, you can start breaking its content down into its component parts. First, give the document type a name; this name will also be the name of the top-level element. Because this is an article in the Bricolage website, this is simple: it's Article.
With that out of the way, it's time to pick out the field subelements of the document element. As fields are meaningful blocks of text, this is generally simple to do: they're headlines, paragraphs, subheads, and the like. Figure 2 indeed shows meaningful blocks of text:
- Headline: This is the title of the article. In this case, it's "David Wheeler Interviewed on Online Tonight."
- Dateline: This is the date for the article, here "2002.12.18."
- Paragraph: These blocks of text make up the bulk of the content of the article. The first paragraph starts with "Bricolage maintainer and lead developer David Wheeler appeared on the Online Tonight with David Lawrence radio show."
- Header: Section headers break up the content into sections, such as "How it All Started" and "Bricolage vs. Blogging Tools."
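The four fields identified so far can be jotted down as a small specification and checked against a draft document. This Python sketch is only an illustration: the `widget` and `repeatable` settings are assumptions (the analysis suggests one Headline and one Dateline but many Paragraphs and Headers), and `check` is a hypothetical helper, not part of Bricolage.

```python
# Fields of the Article top element; widget and repeatable values are assumed.
ARTICLE_FIELDS = {
    "Headline": {"widget": "text", "repeatable": False},
    "Dateline": {"widget": "date", "repeatable": False},
    "Paragraph": {"widget": "textarea", "repeatable": True},
    "Header": {"widget": "text", "repeatable": True},
}


def check(doc_fields):
    """Return a list of problems in an ordered list of (name, value) pairs."""
    problems = []
    seen = set()
    for name, _value in doc_fields:
        spec = ARTICLE_FIELDS.get(name)
        if spec is None:
            problems.append(f"unknown field: {name}")
        elif name in seen and not spec["repeatable"]:
            problems.append(f"{name} may appear only once")
        seen.add(name)
    return problems


# The start of the analyzed article, expressed as ordered fields.
article = [
    ("Headline", "David Wheeler Interviewed on Online Tonight"),
    ("Dateline", "2002.12.18"),
    ("Paragraph", "Bricolage maintainer and lead developer David Wheeler ..."),
    ("Header", "How it All Started"),
    ("Paragraph", "..."),
]

print(check(article))  # an empty list means the draft fits the field list
```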
Identifying Subelements

Figure 3. The components making up a "Related Image" element
Pretty simple, right? Well, the interesting part comes when you identify the container subelements of the layout. The way to do so is to look for areas where content has been logically grouped together as a unit; for instance, to combine an image and its caption. Such is exactly the case with the picture of David Lawrence. Call this container element a "Related Image," because it creates a link to an image document that's related to the current article content. Figure 3 illustrates the breakdown of the newly identified element: it has a link to a related media document, an "Alt Text" field, and a "Caption" field.

Figure 4. The pieces of the "Related Audio" element include a tooltip (not shown)
Another clearly grouped collection of content is the box entitled "David Wheeler Takes to the Airwaves." This one has a link to an audio document and a description of its contents. It also has a speaker icon, but this isn't really content; it's more a hint to the viewer as to what she's linking to. That is, it's not content, but presentation. It adds no semantic meaning to the article. To parallel the "Related Image" subelement, call this one "Related Audio." Figure 4 highlights its subelements.

Figure 5. The "Pull Quote" element has a Paragraph field and an Attribution field
Finally, there's one last subelement, the "Pull Quote." It simply groups together a quotation paragraph and an attribution. See Figure 5.
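Putting the analysis together, the whole Article model can be sketched as a nested outline. The Python below is a sketch under stated assumptions, not Bricolage's own representation: nested dictionaries stand for container elements, strings stand for fields or links, and the "Description" and "Tooltip" field names under "Related Audio" are guesses, since the analysis mentions a description and a tooltip without naming the fields.

```python
# Nested dicts are containers; string values mark fields and media links.
ARTICLE = {
    "Headline": "field",
    "Dateline": "field",
    "Paragraph": "field",
    "Header": "field",
    "Related Image": {
        "Image": "related media link",
        "Alt Text": "field",
        "Caption": "field",
    },
    "Related Audio": {
        "Audio": "related media link",
        "Description": "field",  # assumed field name
        "Tooltip": "field",      # assumed field name
    },
    "Pull Quote": {
        "Paragraph": "field",
        "Attribution": "field",
    },
}


def outline(name, element, indent=0):
    """Return the model as indented lines, one per element."""
    lines = ["  " * indent + name]
    if isinstance(element, dict):
        for child_name, child in element.items():
            lines.extend(outline(child_name, child, indent + 1))
    return lines


lines = outline("Article", ARTICLE)
print("\n".join(lines))
```

The outline stays two container levels deep (Article and its three subelements), which keeps the model easy to navigate when editing.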