Richard Bloor
Monday, 26 January 2004
The Sendo X brings speaker-independent dialing to a Symbian OS smartphone using technology from Voice Signal. In this article SymbianOne profiles Voice Signal and looks at where its technology is taking smartphone voice control.
Voice Signal has achieved two milestones with the implementation of its speaker-independent voice dialing solution in the Sendo X: it is the first Symbian OS implementation of its technology and the first implementation in a phone for the European market. So what brought Voice Signal to the Symbian OS?
Based in Boston, Massachusetts, with around 50 staff, Voice Signal was formed in 1995 and initially supplied speaker-independent voice recognition solutions for all embedded system applications. In the late 1990s Voice Signal started adapting its solutions for the mobile market, and by the beginning of 2000 it was seriously focusing on mobile applications.
This focus has paid off: according to Chris Reiner, Voice Signal's Vice President of Business Development, Voice Signal is the only company to have commercial implementations of speaker-independent speech recognition in mobile devices on the market.
Speech recognition has been around in various forms in PC or server applications for some time, and in wireless devices since the late 1990s. However, current wireless voice recognition solutions are speaker dependent. The phone's owner needs to train the phone on each contact they want to call, and such systems are typically limited to around 20 names. "The added step of having to train voice tags almost certainly eliminates your average consumer," said Chris. "And when you have a phone book such as the one on a Symbian OS phone that is effectively unlimited in capacity, traditional speaker-dependent dialing really becomes an unusable feature."
Voice Signal's solution is a phonetic-based system: it requires no training and is able to match speech to any of the contacts in the phone's contact database, regardless of the number of contacts stored. The technology has already been successfully incorporated into a number of phones from Samsung and Motorola for the US and Asian markets.
Speech recognition technology generally uses one of three techniques: Hidden Markov Models (HMM), Dynamic Time Warping (DTW) or Neural Networks. The Voice Signal solution uses HMM, a statistical method that essentially tries to predict whether two sequences match. It is used in several disciplines, with applications as diverse as determining the likelihood of an international crisis resulting in conflict and recognizing facial expressions.
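To make the idea of "predicting whether two sequences match" a little more concrete, here is a minimal sketch of the standard HMM forward algorithm, which scores how likely it is that a given model produced an observed sequence. The two models ("rising" and "falling"), their probabilities and the "lo"/"hi" observation symbols are invented for illustration and have nothing to do with Voice Signal's actual engine.

```python
def forward_likelihood(observations, start, trans, emit):
    """Return the probability that the HMM generated `observations`.

    start[s]     : probability of starting in state s
    trans[r][s]  : probability of moving from state r to state s
    emit[s][o]   : probability of state s emitting observation o
    """
    states = list(start)
    # Initialise with the first observation.
    alpha = {s: start[s] * emit[s].get(observations[0], 0.0) for s in states}
    # Fold in each later observation, summing over all predecessor states.
    for obs in observations[1:]:
        alpha = {
            s: emit[s].get(obs, 0.0)
            * sum(alpha[r] * trans[r].get(s, 0.0) for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state, left-to-right models (all numbers are made up).
START = {"s1": 1.0, "s2": 0.0}
TRANS = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s2": 1.0}}
MODELS = {
    "rising": {"s1": {"lo": 0.9, "hi": 0.1}, "s2": {"lo": 0.1, "hi": 0.9}},
    "falling": {"s1": {"lo": 0.1, "hi": 0.9}, "s2": {"lo": 0.9, "hi": 0.1}},
}


def classify(observations):
    """Pick the model under which the observations are most probable."""
    return max(
        MODELS,
        key=lambda name: forward_likelihood(observations, START, TRANS, MODELS[name]),
    )


if __name__ == "__main__":
    print(classify(["lo", "lo", "hi", "hi"]))
```

A recognizer built this way would hold one such model per sound or word and choose whichever model explains the incoming audio best; production engines use far richer models and features than this toy.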
HMM works by sampling speech in blocks of approximately 10 milliseconds; each block is then characterized by the frequencies it contains. The pattern of frequencies is then matched with a database of phonemes, the basic sounds which make up speech, to determine which one the sound is most likely to be. Once the phonemes have been matched, the list of phonemes can be compared to the phoneme patterns for various words to determine which ones were spoken.
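The three steps described above (cut the audio into 10 ms blocks, describe each block by its frequencies, then match phonemes against word patterns) can be sketched in a few lines of Python. This is a toy under loud assumptions: the 8 kHz sample rate, the pure-tone "phonemes", the templates and the two-name contact list are all invented, and nearest-template matching plus edit distance stand in for the statistical HMM scoring a real engine would use.

```python
import math

SAMPLE_RATE = 8000   # Hz; assumed, the article gives no sample rate
FRAME_MS = 10        # block length mentioned in the article
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000

# Hypothetical "phonemes": each is a pure tone, which real speech sounds are not.
TONES = {"a": 500, "n": 1000, "o": 1500}
PROBES = [500, 1000, 1500]   # frequencies (Hz) at which each block is measured
TEMPLATES = {"a": [0.5, 0.0, 0.0], "n": [0.0, 0.5, 0.0], "o": [0.0, 0.0, 0.5]}

# Hypothetical contact list with toy phoneme patterns.
LEXICON = {"Anna": ["a", "n", "a"], "Nona": ["n", "o", "n", "a"]}


def split_into_frames(samples):
    """Cut the samples into consecutive, non-overlapping 10 ms blocks."""
    return [samples[i:i + FRAME_LEN]
            for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)]


def frequency_features(frame):
    """Measure the block's strength at each probe frequency (naive DFT)."""
    features = []
    for freq in PROBES:
        step = 2 * math.pi * freq / SAMPLE_RATE
        re = sum(s * math.cos(step * n) for n, s in enumerate(frame))
        im = sum(s * math.sin(step * n) for n, s in enumerate(frame))
        features.append(math.hypot(re, im) / len(frame))
    return features


def nearest_phoneme(features):
    """Return the phoneme whose template is closest to the features."""
    return min(TEMPLATES, key=lambda p: math.dist(features, TEMPLATES[p]))


def decode_phonemes(samples):
    """Label every block, then merge runs of the same label."""
    collapsed = []
    for frame in split_into_frames(samples):
        label = nearest_phoneme(frequency_features(frame))
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return collapsed


def edit_distance(a, b):
    """Levenshtein distance between two phoneme lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def best_contact(phonemes):
    """Pick the contact whose phoneme pattern is closest to what was heard."""
    return min(LEXICON, key=lambda name: edit_distance(phonemes, LEXICON[name]))


def synthesize(phonemes, frames_each=2):
    """Build fake audio: each toy phoneme is a short burst of its tone."""
    samples = []
    for p in phonemes:
        step = 2 * math.pi * TONES[p] / SAMPLE_RATE
        samples.extend(math.sin(step * n) for n in range(FRAME_LEN * frames_each))
    return samples


if __name__ == "__main__":
    heard = decode_phonemes(synthesize(["a", "n", "a"]))
    print(heard, "->", best_contact(heard))
```

Because the matching is done against phoneme patterns rather than recorded voice tags, adding a contact only means adding one more entry to the lexicon, which is the property that lets a phonetic system scale with the size of the phone book.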
While the Voice Signal solution is essentially similar to the HMM implementations you would find in a PC or server-based solution, Chris says, "it is algorithmically state-of-the-art, and what we at Voice Signal have been able to do is to work out how to make the engine run smaller and faster."
It was the ability to provide this speaker-independent recognition that Chris believes really excited Sendo when discussions started between the two companies. When the Sendo X project emerged, both companies agreed that it was the best vehicle for Sendo to bring Voice Signal's technology to market.
The solution implemented on Sendo X consists of three components: the UI, the phonetic speech recognition engine and a text-to-speech (TTS) engine, which repeats the recognized name back. "Since the user is no longer training the device we realized that they would need some form of audible feedback to confirm they had the right contact," commented Chris.
Voice Signal's recognition and TTS engines were both built in C and designed for portability, so the Symbian OS implementation for Sendo X did not mean Voice Signal was developing from scratch. In fact, Chris notes that Voice Signal's engineer found the Symbian OS very straightforward to work with. "It is a great environment, gave us a quick time to market, has good tools and a very robust development support network in place," said Chris. From a commercial perspective, Chris characterized the Symbian family as nurturing and very welcoming of third-party innovation.
"We are real excited about the Symbian platform and real excited about the Sendo X and believe both are a real opportunity for us," says Chris. While Chris would not reveal the nature of Voice Signal's future roadmap for Symbian-based products, he did comment that "As a company we have a growing support for the Symbian OS and I think you are going to see a number of product announcements from us over the coming months."
Third-party developers may be interested to know that part of Voice Signal's long-term vision is to open its technology so it can be utilized in third-party applications. However, Chris realizes that before this is a practical proposition Voice Signal has to establish a broad presence in the market, and he sees at least another 12 to 24 months before a critical mass of devices using Voice Signal technology is available. An equally practical issue is that Voice Signal will need a new structure to support third-party developers. "In the near term we are looking to extend the range of tasks that our technology can help the user with, tasks other than dialing," said Chris. "So for example we are investigating ways to make it easier to send, receive and create messages, to access data services, just about every task where the user interface is a constraint. Essentially we are looking to leverage our technology into a multimodal user experience and that is our main mission right now."
You can find out more about Voice Signal at its web site, www.voicesignal.com, and about the Sendo X at Sendo's web site, www.sendo.com.