Adding SALT to HTML
by Simon Tang
May 14, 2003
Wireless applications are limited by their small
device screens and cumbersome input methods. Consequently, many users
are frustrated in their attempts to use these devices. Speech can help
overcome these problems. It is the most natural way for humans to
communicate. Speech technologies enable us to communicate with
applications by using our voice. However, listening is slower than
reading, and callers have to remember all the information presented to
them. Since our short-term memory can hold only about seven chunks of
information, speech applications must be carefully designed.
Both wireless and speech applications have their benefits but also
their limitations. Multimodal technologies attempt to leverage their
respective strengths while mitigating their weaknesses. Using
multimodal technologies, users can interact with applications in a
variety of ways. They can provide input through speech, keyboard,
keypad, touch-screen or mouse and receive output in the form of audio,
video, text, or graphics.
The SALT Forum
The SALT Forum is a group
of vendors that is creating multimodal specifications. It was formed
in 2001 by Cisco, Comverse, Intel, Microsoft, Philips and
SpeechWorks. They created the first version of the Speech Application
Language Tags (SALT) specification
as a standard for developing multimodal applications. In July 2002, the
SALT specification was contributed to the W3C's Multimodal Interaction
Activity (MMI). The W3C MMI has published a number of related drafts, which
are available for public
review.
Objectives of SALT
The main objective of SALT is to create a royalty-free,
platform-independent standard for creating multimodal applications. A
whitepaper published by the SALT Forum further defines six design
principles of SALT.
- Clean integration of speech with web pages
There is a lot of knowledge, skill, and investment in the existing
web-based infrastructure. SALT relies on this investment by specifying
a small set of XML elements to add speech capabilities to existing
markup languages.
- Separation of the speech interface from business logic
and data
SALT does not alter the processing logic of
the existing markup languages. It defines an independent set of
elements that can be used cohesively with the existing
technology.
- Power and flexibility of programming model
DOM events and scripting are used to integrate SALT with
existing pages. The scripting programming model provides the
flexibility to add speech processing logic.
- Reuse existing standards for grammar, speech output, and
semantic results
Instead of reinventing the
wheel, SALT reuses many of the existing standards in these
areas.
- Support a range of devices
One of the main
objectives of SALT is the ability to extend many of the existing
markup languages such as HTML, XHTML, cHTML, and WML. It is not
restricted to any particular type of device.
- Minimal cost of authoring across modes and
devices
The first five principles above result in
minimizing the cost of developing, deploying and executing SALT
applications.
A number of vendors, including HeyAnita, Intervoice, MayWeHelp.com,
Microsoft, Philips, SandCherry and Kirusa, SpeechWorks, and VoiceWeb
Solutions, have announced products, tools, and platforms that support
SALT. There is also an open source project, OpenSALT, in the works to
develop a SALT 1.0 compliant browser. Detailed information can be found
at the SALT
Forum's implementation page.
Microsoft .NET Speech SDK
Before diving into experimenting with HTML and SALT, we need to
set up the appropriate development environment. I am going to use
Microsoft's .NET Speech SDK 1.0. The SDK Beta 2 was released on October
30, 2002. It consists of the following components (a detailed
description can be found in the Microsoft .NET Speech SDK and Platform
Overview whitepaper):
- Developer tools (for Visual Studio .NET) - Grammar Editor, Prompt
Editor, ASP.NET Speech Control Editor and the Speech Debugging
Console.
- ASP.NET Speech Controls (for Visual Studio .NET)
- Sample SALT applications
- Documentation and tutorial on building SALT applications
- Client add-on for Internet Explorer and Pocket IE,
which can be used to run speech-enabled
web pages.
The SDK can be downloaded or ordered by mail from the Microsoft
Speech Technology site. You
should make sure that you meet the following requirements before
beginning the installation.
- Windows 2000 [Server] SP3, or Windows XP Pro SP1
- Internet Information Server (IIS)
- Internet Explorer 6.0 or later
- .NET Framework 1.0 SP2 (install the .NET Framework itself first, then the service pack)
- Visual Studio .NET (optional, needed only if you use the development tools)
Windows XP Home edition is not supported because IIS is not
available for it. You will also need to install .NET Framework 1.0 and
then its SP2 as two separate steps. They can be downloaded from the
Microsoft .NET Framework site. Make
sure you do not install .NET Framework 1.1 Beta, as the .NET Speech SDK
1.0 will not work with it.
If you do not have Visual Studio .NET installed, or if you are not
planning to use the developer tools, you will need to disable the Visual
Studio .NET Speech Tools through the Custom Setup option.
Figure 1. .NET Speech SDK Installation
Once the installation is completed, you will find Microsoft .NET Speech
SDK Beta 2 and Microsoft Internet Explorer Speech Add-in in your Programs
menu.
The installation was not without problems. After the installation
completed, I ran into an error with the text-to-speech (TTS) engine. It
returned an error code of "-3" and gave the reason "Internal SAPI/Prompt
Engine error". After plowing through the documentation, I came across a
resolution in the SDK's readme file. All I had to do was change the
default voice to one that comes from Microsoft. There are a number of other
"Known Issues" listed in the documentation, which you should familiarize
yourself with.
Adding Speech to HTML
I am going to show how we can SALT-enable a simple HTML application by
hand. The best place to start is by looking at some simple HTML code.
I created a directory called salt in the default document
root directory, c:\Inetpub\wwwroot\salt\ and placed the
following HTML file there:
<html>
  <head>
    <title>My First HTML Application</title>
  </head>
  <body>
    <h3>This is my first HTML application!</h3>
  </body>
</html>
Unsurprisingly, this yields the following page:
Figure 2. Simple HTML page
Now, let's add a SALT element to it. We want it to speak the sentence
back to us through text-to-speech (TTS). We will use
<prompt>, one of the top-level elements of SALT.
1. <html xmlns:salt="http://www.saltforum.org/2002/SALT">
2. <head>
3. <title>My First Multimodal Application</title>
4. </head>
5. <body onload="RunIt()">
6. <h3>This is my first Multimodal application!</h3>
7. <salt:prompt id="first">
8. This is my first Multimodal application!
9. </salt:prompt>
10. </body>
11. <script language="javascript">
12. function RunIt() {
13. first.Start();
14. }
15. </script>
16. </html>
In line 1, we added the SALT namespace. Lines 7-9 contain the
<prompt> element. It can be used for speech synthesis
or to play back a recorded audio file. The attribute
id="first" gives us a reference to the
<prompt> element, which we use in the JavaScript.
SALT relies on a scripting language to tie together events and logic
between its elements and HTML elements. In our case the function
RunIt() is invoked when the page is loaded. All it does is
start the prompt, which plays the sentence "This is my first Multimodal
application!" through the text-to-speech engine. So far, so good. When I
tried to run the page, however, I did not hear anything. Instead I got
the following:
Figure 3. Unexpected result from HTML + SALT page
Clicking on IE's warning icon was no help. It turned out that I needed to
explicitly enable the speech add-on for IE; otherwise, IE ignores all
the SALT elements. All I needed to do was add two lines (lines 2 and
3):
1. <html xmlns:salt="http://www.saltforum.org/2002/SALT">
2. <object id="k-tags"
   CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC"
   VIEWASTEXT></object>
3. <?import namespace="salt"
   implementation="#k-tags"/>
4. <head>
5. <title>My First Multimodal Application</title>
6. </head>
7. <body onload="RunIt()">
8. <h3>This is my first Multimodal application!</h3>
9. <salt:prompt id="first">
10. This is my first Multimodal application!
11. </salt:prompt>
12. </body>
13. <script language="javascript">
14. function RunIt() {
15. first.Start();
16. }
17. </script>
18. </html>
Now, running the application again, you should get the desired
behavior. The text is displayed and spoken.
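If the speech add-on is missing on a visitor's machine, or the two enabling lines are left out, the call to first.Start() has nothing to act on. A minimal defensive sketch of the startup script follows. The helper name startPrompt is hypothetical (it is not part of SALT or the SDK), and the check assumes that an unrecognized prompt element simply has no Start member.

```javascript
// Hypothetical helper, not part of SALT or the .NET Speech SDK:
// start a prompt only when the object really exposes Start().
function startPrompt(prompt) {
  // Assumption: without the speech add-on the element is either
  // missing entirely or a plain element with no Start member.
  if (!prompt || typeof prompt.Start === "undefined") {
    return false;
  }
  prompt.Start();
  return true;
}

// In the page, RunIt() could then be written as:
//   function RunIt() {
//     if (!startPrompt(window.first)) {
//       window.status = "Speech add-on not available";
//     }
//   }
```

With this guard the page still displays its text when speech is unavailable, instead of stopping on a script error.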
If you prefer to use a recorded audio file instead of the mechanical
TTS voice, you just need to replace lines 9-11 with:
<salt:prompt id="first">
<salt:content href="hello.wav"/>
</salt:prompt>
The <content> element specifies the URL of the audio
file.
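The two prompt variants differ only in what sits inside the <prompt> element. As a rough illustration, the sketch below builds either form as a string. The function name promptMarkup and its source argument are hypothetical, and the sketch assumes trusted input, so it does no escaping.

```javascript
// Hypothetical helper: return SALT prompt markup for either
// synthesized text ({ text: ... }) or a recorded file ({ href: ... }).
// Assumption: values are trusted, so no escaping is performed.
function promptMarkup(id, source) {
  var body = source.href
    ? '<salt:content href="' + source.href + '"/>' // recorded audio
    : source.text;                                 // text for the TTS engine
  return '<salt:prompt id="' + id + '">' + body + '</salt:prompt>';
}
```

Calling promptMarkup("first", { href: "hello.wav" }) returns the recorded-audio form shown above on a single line, while promptMarkup("first", { text: "This is my first Multimodal application!" }) returns the TTS form.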
Summary
In this article I introduced multimodal XML technology, and SALT
specifically. Using Microsoft's .NET Speech SDK, you should now be able to add
SALT elements to HTML web pages. Good luck with your further
investigations into SALT.