Hello, Voice World
by Didier Martin
September 06, 2000
In our last trip to Didier's Lab, we encountered the aural world of
XML made possible by the VoiceXML language. This week I'll explain
more about VoiceXML and create the classic "Hello World"
application. But this time instead of seeing the result, you'll listen
to it. People intrigued by the last article asked me if and how
VoiceXML documents are used to build voice applications. Answering
this question presents an opportunity to highlight VoiceXML's
features, and the way its basic concepts make it very different from
HTML or XHTML.
A VoiceXML application is a collection of dialogs. A dialog is the
basic unit of interaction between the VoiceXML interpreter and an
interlocutor. A dialog can be either a form or a
menu. A form consists of a collection of fields to be
filled in by the interlocutor. A menu presents a set of choices, from
which the interlocutor selects one. The figure below shows an example
VoiceXML application and the links between its dialogs.

Figure 1: VoiceXML dialog collection
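As a rough sketch of how the two dialog types fit together, the document below pairs a menu with two forms. It is written against the VoiceXML 1.0 specification rather than the Tellme dialect. The ids main, confirm, and bye and all of the prompt wording are invented for illustration, and a given interpreter may expect its own DOCTYPE declaration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- A menu dialog: the interlocutor picks one of the choices. -->
  <menu id="main">
    <prompt>Please say one of: <enumerate/></prompt>
    <choice next="#confirm">greeting</choice>
    <choice next="#bye">goodbye</choice>
  </menu>

  <!-- A form dialog: its single field is filled by a yes or no answer. -->
  <form id="confirm">
    <field name="wanted" type="boolean">
      <prompt>Do you want to hear the greeting?</prompt>
      <filled>
        <if cond="wanted">
          <prompt>Hello world</prompt>
        </if>
        <goto next="#bye"/>
      </filled>
    </field>
  </form>

  <!-- A form with no fields: it only speaks. -->
  <form id="bye">
    <block>
      <prompt>Goodbye</prompt>
    </block>
  </form>
</vxml>
```

The next attributes on the choice and goto elements are the links between dialogs that the figure depicts.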
Hello World
Here is the classic "Hello World" application in VoiceXML:
<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//Tellme Networks//Voice Markup Language 1.0//EN"
  "http://resources.tellme.com/toolbox/vxml-tellme.dtd">
<vxml version="1.0" base="" lang="en" application="">
  <meta name="Author" content="Didier PH Martin"/>
  <meta name="Document" content="The classical Hello World Sample"/>
  <form>
    <block>
      <audio src="http://talva.dyndns.org/vxml/helloWorld.wav">
        Hello world
      </audio>
    </block>
  </form>
</vxml>
Since we are dealing with a talking machine, our "Hello World"
application has nothing to show for itself, but it definitely has
something to say.
The first line is the XML declaration. The next construct should be
familiar: it's a DOCTYPE declaration indicating where the document type
definition file is located. Normally, if validation is unnecessary and
no external entities are required, the DOCTYPE declaration can be
omitted. But if you're testing this "Hello World" application within
the Tellme environment, you'll need to include the Tellme DOCTYPE
declaration, since Tellme's implementation differs slightly from the
one recommended by the VoiceXML consortium. In short, the DOCTYPE
declaration is mandatory for the Tellme environment but not necessarily
for other VoiceXML interpreters.
The root element (or the document type element),
<vxml>, contains version, base, lang, and
application attributes. The most important of these is the application
attribute. It represents a major point of difference between XHTML and
VoiceXML applications. In the XHTML world, the contents of the
<html> element are rendered, in most current
browsers, as an independent scrollable page. In the VoiceXML world,
the contents of the <vxml> element are integrated
into a larger whole: an application session. Session duration is
simply the duration of the user's connection; that is, the time the
interlocutor is connected to the VoiceXML interpreter. A session ends
when the interlocutor hangs up, or when a VoiceXML document asks the
interpreter to hang up.
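To illustrate the second way a session can end, here is a minimal sketch in which the document itself asks the interpreter to hang up. It assumes an interpreter that implements the disconnect element of the VoiceXML 1.0 specification. The farewell text is invented for the example, and the Tellme dialect may behave differently.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <!-- Queue a farewell prompt, then ask the interpreter to hang up. -->
      <prompt>Thank you for calling. Goodbye.</prompt>
      <disconnect/>
    </block>
  </form>
</vxml>
```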
A VoiceXML application is a set of documents sharing a common
application document. The application attribute of a VoiceXML document
tells the interpreter which application the document belongs to.
Our sample document is part of the Tellme
application, which defines such standard behaviors as what to do when
the interlocutor says "Tellme menu", when the
* key is pressed twice, or when the interlocutor says
"Goodbye". The following diagram shows the relationship between the
application and dialog documents.

Figure 2: Hierarchy of VoiceXML Documents
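The relationship can be sketched with two files. The file names root.vxml and hello.vxml, the variable siteName, and its value are all hypothetical. The sketch follows the VoiceXML 1.0 specification, in which a variable declared in the application document is visible to every dialog document in application scope; the Tellme environment supplies its own application document and may not accept this as written.

```xml
<?xml version="1.0"?>
<!-- root.vxml: the application document (hypothetical file name) -->
<vxml version="1.0">
  <!-- Declared at the top level here, so it lives in application scope. -->
  <var name="siteName" expr="'Voice World'"/>
</vxml>
```

```xml
<?xml version="1.0"?>
<!-- hello.vxml: a dialog document that names root.vxml as its application -->
<vxml version="1.0" application="root.vxml">
  <form>
    <block>
      <!-- Read the shared variable through the application scope. -->
      <prompt>Welcome to <value expr="application.siteName"/></prompt>
    </block>
  </form>
</vxml>
```

Any number of dialog documents can point at the same application document, which is what makes them one application.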
The <meta> elements in our VoiceXML document
mean basically the same thing as in HTML: they provide information
about this document for use by a classification engine. We could have
included <rdf> elements for the same purpose, but
only the <meta> element is accepted as a valid
element by the VoiceXML DTD.
Moving further into the document, note that even though we do not
require the user to fill in any fields, we still use the
<form> element to enclose the
<block> element. The
<form> element thus serves two purposes: it collects
user input into fields, and it holds blocks that make the interpreter
say something. My recent article, Adapting Content for VoiceXML,
contains a sample VoiceXML form for user input.
A <block> contains executable elements. Just
think of it as a "block" of instructions to be processed by the
VoiceXML interpreter. Within <block>, our sample uses the
<audio> element in the way the Tellme engine expects. A
fully compliant VoiceXML document would use the
<prompt>Hello World</prompt>
construct instead.
So if you test the "Hello World" application in the Tellme
environment, you must use the <audio> element. But
if you are using the IBM VoiceXML environment (available as a free
download), replace the <audio> element with the
<prompt> element as recommended by the VoiceXML
consortium.
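For an interpreter that follows the consortium's recommendation, the whole document might look like the sketch below. It assumes the interpreter does not require a DOCTYPE declaration, and it drops the empty base and application attributes. Check your interpreter's documentation before relying on either assumption.

```xml
<?xml version="1.0"?>
<vxml version="1.0" lang="en">
  <meta name="Author" content="Didier PH Martin"/>
  <meta name="Document" content="The classical Hello World Sample"/>
  <form>
    <block>
      <!-- Text inside prompt is rendered by the speech synthesizer. -->
      <prompt>Hello World</prompt>
    </block>
  </form>
</vxml>
```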
In fact, the <audio> element is a valid
element in the VoiceXML 1.0 specification, but there it's used to
refer to a pre-recorded audio stream. Thus, the rendering of a
pre-recorded "Hello World" under the VoiceXML 1.0 specification would
look like this:
<prompt>
  <audio src="http://talva.dyndns.org/vxml/helloWorld.wav"/>
</prompt>
For the Tellme engine, the same expression would be:
<audio src="http://talva.dyndns.org/vxml/helloWorld.wav">
  Hello world
</audio>
If the Tellme engine finds the relevant WAV file, it's downloaded,
cached, and played. If the engine doesn't find the audio file, the text
contained in the audio element is converted into voice instead.
A pre-recorded voice obviously offers better audio quality than a
synthesized one. For any static audio content it's therefore better to
refer to a pre-recorded audio file and to include the text as well: the
text serves as a fail-safe rendering if something goes wrong with
the audio file, and it documents what the recording says.
Homework
Download the alphaWorks VoiceXML interpreter, or use the Tellme
studio, and test your own version of the "Hello World" application.
Resources
IBM VoiceXML interpreter: This tool is freely available from the
IBM alphaWorks site.
You can also register with the Tellme studio, which is freely
available until October 31, 2000, at http://studio.tellme.com.
The VoiceXML version 1.0 specification is available from either
the VoiceXML Consortium or the W3C.