Diagramming the XML Family

by Daniel Zambonini
October 08, 2003

In this article we'll introduce some of the XML family members and discuss how they relate to one another. We'll then use these technologies to create a diagram of their relationships in order to demonstrate how they work together in practice. Of the hundreds of XML technologies in use, we'll limit the scope of this article to the technologies used in the creation of the diagram.

XML

XML (eXtensible Markup Language) consists of a small set of rules which define a structured, text-based syntax for representing data. It isn't a language as such; rather it is a meta-language, a common syntax that can be shared across diverse standards and data models. But if XML doesn't really do anything by itself, why have so many languages adopted the XML syntax? At a basic level, XML is:

  • Simple -- easy to learn and use, especially for users familiar with HTML.
  • Flexible -- can be used in many situations, from graphics, to communication protocols, to raw data.
  • Open -- no licensing or pricing restrictions, and no vendor or platform lock-in.
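To make the "small set of rules" concrete, here is a minimal sketch in Python that checks whether a piece of text follows the core XML syntax (that is, whether it is well-formed). It uses the standard library's xml.etree.ElementTree; the helper name is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Sample inputs invented for this sketch.
good = "<note><to>Reader</to></note>"
bad = "<note><to>Reader</note>"  # <to> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that this only checks the shared syntax. It says nothing about whether <note> or <to> are meaningful names, which is the job of a vocabulary.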

W3C XML Schema

W3C XML Schema allows us to create vocabularies with XML by adding further restrictions to the core XML rules. These restrictions mainly consist of valid names for elements and attributes; which elements can be used inside another; valid repetition of elements; and the type of data for elements and attributes.

Schemas allow us to publish and share these rules for new vocabularies and check the validity of any files which claim adherence to a particular vocabulary. Unlike Document Type Definitions, XML Schema uses the XML syntax, allowing us to parse and query XML Schema files using standard XML tools. (See the XML.com article "Using W3C XML Schema" for more detail.)
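Because a schema is itself an XML document, ordinary XML tools can read it. The following Python sketch parses a small, hypothetical schema for a "book" vocabulary and lists the elements it declares. The schema content and the function name declared_elements are assumptions made for this illustration. The sketch only queries the schema; validating documents against it would need a schema-aware library, which is not shown here.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema for a tiny "book" vocabulary (illustration only).
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ISBN" type="xs:string"/>
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def declared_elements(schema_text):
    """Return (name, type) for every xs:element declaration, in document order."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type")) for el in root.iter(f"{{{XS}}}element")]

elements = declared_elements(SCHEMA)
for name, type_ in elements:
    print(name, type_)
```

The book element has no type attribute because its type is defined inline, so the sketch reports None for it.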

Namespaces

Because XML lets us create many different vocabularies, and also lets us combine elements from several vocabularies in a single file, we are presented with the name collision problem: two vocabularies may use the same name for different things.

For example, an XML schema for books could define a <table> element that defines a table of contents. A second schema for furniture could also define an element named <table>. If data from both were to occupy the same XML file, it would be impossible to differentiate which <table> was which.

By defining a unique identifier, a namespace, for each XML vocabulary, we can group the elements under each identifier, so that XML software can identify each vocabulary element being used. (See the XML.com article "XML Namespaces by Example" for more information.)
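As a rough sketch of how software tells the two <table> elements apart, the Python example below parses a file that mixes both vocabularies. The namespace URIs and the <catalog> wrapper element are invented for this example.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are made up for this example.
BOOKS = "http://example.org/books"
FURNITURE = "http://example.org/furniture"

DOC = f"""\
<catalog xmlns:bk="{BOOKS}" xmlns:fn="{FURNITURE}">
  <bk:table>Chapter 1, Chapter 2</bk:table>
  <fn:table>Oak dining table</fn:table>
</catalog>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-uri}local-name",
# so the two <table> elements end up with different tags.
tags = [child.tag for child in root]
print(tags)

contents = root.find(f"{{{BOOKS}}}table").text
furniture = root.find(f"{{{FURNITURE}}}table").text
print(contents)
print(furniture)
```

The prefixes bk and fn are only local shorthand. What the parser compares is the namespace identifier itself, so another file could use different prefixes for the same vocabularies.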

URIs

Namespaces create another problem. It's fine to suggest that each XML vocabulary should be assigned a unique identifier, but how can you ensure that the unique identifier you choose hasn't already been used? Theoretically, there are no assurances that a namespace identifier hasn't been used by another vocabulary. However, by using a URI (Uniform Resource Identifier) you can greatly reduce the chance of a namespace collision.

A URI can be one of two types: a URN (Uniform Resource Name) or a URL (Uniform Resource Locator). The distinction between the two is a little vague, and they overlap in some respects. URLs identify a resource by its location, that is, by an address for accessing the resource. URNs identify a resource by a name that doesn't necessarily give access to the resource but which must be unique and must always refer to the same resource, even if it moves or becomes obsolete. In this way, a URL could also serve as a URN if it were guaranteed to persist and always point to the same resource.

In practice, URLs are the most commonly used kind of URI, particularly for namespace identifiers. Once an organization has purchased a unique domain name, it can create namespace identifiers based on this name. By using URLs which it theoretically owns, the organization can control and manage namespace identifiers under this domain, ensuring no namespace conflicts.

RDF

The Resource Description Framework (RDF) is a model for representing resource metadata, that is, information about things. These "things" can be web pages, people, books, or anything else. The information could be file size, height, color, or any other property that something might have. RDF therefore consists of a number of statements about something:

  • Notes from a Small Island has an ISBN of 0552996009
  • Notes from a Small Island has an author of Bill Bryson
  • Bill Bryson has a birth place of Iowa

Note that each statement (or triple) is constructed from three parts: the resource (Notes from a Small Island), the property name (ISBN), and the property value (0552996009). These statements could be represented in any XML vocabulary:

<book name="Notes from a Small Island">
    <ISBN>0552996009</ISBN>
    <author birthplace="Iowa">Bill Bryson</author>
</book>

or

<document identifier="0552996009">
    <title>Notes from a Small Island</title>
    <creator name="Bill Bryson">
        <born location="Iowa" />
    </creator>
</document>

These two examples, by demonstrating the versatility of XML, show part of the problem that RDF solves. If we rely on XML alone, these statements can be represented in an unlimited number of vocabularies, each with its own schema and rules. An application that had to search a collection of 100 different XML files for books written by Bill Bryson would need to know the exact element or attribute to search for within each vocabulary; in other words, it would need prior knowledge of every vocabulary.
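The cost of that prior knowledge can be sketched in Python. The two documents are the examples from above; the function names and the EXTRACTORS lookup table are hypothetical, written only to show that each vocabulary needs its own hand-written query.

```python
import xml.etree.ElementTree as ET

BOOK_DOC = """\
<book name="Notes from a Small Island">
    <ISBN>0552996009</ISBN>
    <author birthplace="Iowa">Bill Bryson</author>
</book>
"""

DOCUMENT_DOC = """\
<document identifier="0552996009">
    <title>Notes from a Small Island</title>
    <creator name="Bill Bryson">
        <born location="Iowa" />
    </creator>
</document>
"""

def author_from_book(root):
    # First vocabulary: the author is the text of <author>.
    return root.findtext("author")

def author_from_document(root):
    # Second vocabulary: the author is the "name" attribute of <creator>.
    creator = root.find("creator")
    return creator.get("name") if creator is not None else None

# One hand-written extractor per vocabulary, keyed by root element name.
EXTRACTORS = {"book": author_from_book, "document": author_from_document}

def find_author(xml_text):
    root = ET.fromstring(xml_text)
    extractor = EXTRACTORS.get(root.tag)
    if extractor is None:
        raise ValueError(f"unknown vocabulary: <{root.tag}>")
    return extractor(root)

authors = [find_author(doc) for doc in (BOOK_DOC, DOCUMENT_DOC)]
print(authors)
```

Every new vocabulary in the collection means another entry in the table, and a file in an unknown vocabulary cannot be searched at all.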

By introducing an overarching model, RDF provides a superior solution. The RDF model is enforced within the XML syntax by restricting how XML may be used and by introducing a set of core elements and attributes. The real power of RDF comes from its use of namespaces and URIs. RDF vocabularies can be defined with RDF Schema. Within an RDF instance document, however, these vocabularies can be mixed more easily than in standard XML. Once a vocabulary has been created that defines an author property, as long as the vocabulary is assigned a unique namespace, the property can be used in any RDF file. RDF software, with knowledge of just a single RDF vocabulary, can search all statements in all files for particular authors.

URIs provide the icing on the RDF cake. When an RDF vocabulary is defined, each element within it can be referenced by a URI, uniquely identifying it. Each element can also define any part of a triple, i.e. RDF can be used to create lists of resources (things you want to describe), properties, and property values. A triple can consist of nothing but URI references.

Unlike standard XML, RDF files commonly contain data that can be decomposed into a set of URIs. For example, the previous XML example could be represented in RDF as:

<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.0/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

    <rdf:Description rdf:about="urn:isbn:0552996009">
        <dc:title>Notes from a Small Island</dc:title>
        <dc:creator rdf:resource="http://authors.com/b/bbryson.rdf" />
    </rdf:Description>

</rdf:RDF>

If we examine one of the triples that an RDF parser would give us, we find:

  • Item we are describing: urn:isbn:0552996009
  • Property of this item: http://purl.org/dc/elements/1.0/creator
  • Value of this property: http://authors.com/b/bbryson.rdf
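The decomposition into triples can be sketched with Python's standard XML parser. This is a simplified illustration, not a general RDF/XML parser: it assumes the flat shape used in the example above (an rdf:Description with rdf:about, whose children carry either an rdf:resource URI or literal text). The function name simple_triples is made up for this sketch.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_DOC = """\
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.0/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="urn:isbn:0552996009">
        <dc:title>Notes from a Small Island</dc:title>
        <dc:creator rdf:resource="http://authors.com/b/bbryson.rdf" />
    </rdf:Description>
</rdf:RDF>
"""

def simple_triples(rdf_text):
    """Yield (subject, property, value) for flat rdf:Description blocks.

    Handles only the shape used in this article; it is not a general
    RDF/XML parser.
    """
    root = ET.fromstring(rdf_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # The tag looks like "{namespace}local"; joining the two parts
            # gives the property URI.
            namespace, local = prop.tag[1:].split("}")
            value = prop.get(f"{{{RDF}}}resource")
            if value is None:
                # No rdf:resource attribute, so the value is literal text.
                value = (prop.text or "").strip()
            yield subject, namespace + local, value

triples = list(simple_triples(RDF_DOC))
for triple in triples:
    print(triple)
```

The second triple produced is the one listed above; the first pairs the same subject with the title property and the literal value "Notes from a Small Island".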

Being able to reduce every part of a statement to a URI has two immediate benefits:

  • All aspects of the data can be unambiguously identified. For example, in a standard XML document, the author could be stated as "Bill Bryson", "Mr Bill Bryson" or "Bryson, Bill." Software searching for this author would need to be aware of the potential differences in representation and could also run into trouble if a second author existed with the same name. In RDF, assuming that all files use the same URI to reference the author, the author can always be uniquely identified.
  • If these triple values are also RDF resources, the software can automatically find related information. For example, some other RDF document could contain information on the author's birthplace, which in turn may be represented by another URI. An RDF document concerning US cities could then be consulted, which might contain information such as longitude and latitude, average temperature, etc. So, by making a single statement in RDF, further and related information can potentially be automatically acquired.
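The second benefit, following URIs from one statement to the next, can be sketched as a walk over triples gathered from several sources. In the Python example below, the birthplace and latitude property URIs, the place URI, and the latitude value are all hypothetical placeholders; only the book URN, the creator property, and the author URI come from the example above. The sketch also assumes at most one value per resource and property.

```python
# Triples as they might come from three separate RDF documents.
CREATOR = "http://purl.org/dc/elements/1.0/creator"
BIRTHPLACE = "http://example.org/terms/birthPlace"  # hypothetical property
LATITUDE = "http://example.org/terms/latitude"      # hypothetical property

BOOK_TRIPLES = [
    ("urn:isbn:0552996009", CREATOR, "http://authors.com/b/bbryson.rdf"),
]
AUTHOR_TRIPLES = [
    ("http://authors.com/b/bbryson.rdf", BIRTHPLACE, "http://example.org/places/iowa"),
]
PLACE_TRIPLES = [
    # Made-up value, for illustration only.
    ("http://example.org/places/iowa", LATITUDE, "42.0"),
]

def follow(triples, start, properties):
    """Walk from `start` along `properties`; return the final value or None.

    Assumes each (resource, property) pair has at most one value.
    """
    index = {(s, p): o for s, p, o in triples}
    current = start
    for prop in properties:
        current = index.get((current, prop))
        if current is None:
            return None
    return current

all_triples = BOOK_TRIPLES + AUTHOR_TRIPLES + PLACE_TRIPLES
latitude = follow(all_triples, "urn:isbn:0552996009", [CREATOR, BIRTHPLACE, LATITUDE])
print(latitude)
```

The walk only works because all three sources use exactly the same URIs for the author and the place, which is the assumption stated in the first benefit.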

The potential benefit of these explicit, linked relationships between pieces of information (the Semantic Web) is enormous. The near future will hopefully bring us software agents that can query and use a whole range of RDF information to answer a given question (e.g. "Show me hardback books created by anyone who is related to a politician" or "Which servers can run Solaris and are available in the UK?").

If nothing else, if and when search engines such as Google start to make use of RDF and its power of relations, they will become extremely good at the Kevin Bacon Game. (See the XML.com article "RDF: Ready for Prime Time" for more information.)
