The OSNN PDC '05 Blog

The Microsoft Professional Developers Conference.
September 13-16, 2005 :: Los Angeles Convention Center, CA.

Saturday, September 17, 2005

Speech recognition and synthesis in Windows Vista


One of the talks I attended yesterday was about speech recognition and synthesis in Windows Vista and how developers can leverage the System.Speech APIs to make their own applications "speech-enabled." The session was presented by Robert Brown (whom we were first introduced to in an old Channel9 video), Philipp Schmid and Steve Chang.

In Robert's first demo, he showed us that it's now possible to control the entire Windows UI using just speech. He changed his desktop wallpaper, opened applications and dictated text into them, and even navigated an MSN Virtual Earth map without touching his keyboard or mouse. The speech recognition engine in Vista is clearly already miles ahead of the one in XP, and Microsoft still has almost a year to improve it.

He then gave us a preview of the new speech synthesis engine in Vista. Microsoft Sam and Mary, the two robotic voices in XP, have been replaced by Anna, who sounds much more natural. Even better, Vista will ship with recognition and synthesis support for 8 different languages. Robert demonstrated this by having the system read Chinese text: Lili, the Chinese voice, sounds nothing like the English voice and is much better suited to reading Chinese.

Philipp Schmid then took the stage and showed us how ISVs can enable speech recognition and synthesis in their own applications. Microsoft's goal with Vista is to take speech mainstream, and to make that happen, the System.Speech APIs are designed to be both easy to use and very powerful. For example, to enable speech recognition in an existing application, all you have to do is create instances of SpeechRecognizer and Grammar, load the Grammar into the recognizer, and subscribe to the SpeechRecognized event. The grammar itself is simply a finite-state machine whose transitions build up a sentence or command one piece at a time.
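System.Speech itself is a .NET API, and the Python sketch below does not call it. It's a hypothetical toy that mirrors the steps Philipp described (build a grammar, create a recognizer, load the grammar into it, subscribe to a "recognized" event) and models the grammar as a finite-state machine over words. It takes typed text instead of audio, and every name in it (FiniteStateGrammar, ToyRecognizer, load_grammar, on_recognized, feed) is made up for illustration.

```python
from typing import Callable, Dict, List, Set


class FiniteStateGrammar:
    """Hypothetical grammar: a finite-state machine over words.

    transitions[state][word] gives the next state; an utterance is accepted
    if its words lead from the start state to one of the final states.
    """

    def __init__(self, start: str, transitions: Dict[str, Dict[str, str]], final: Set[str]):
        self.start = start
        self.transitions = transitions
        self.final = final

    def accepts(self, words: List[str]) -> bool:
        state = self.start
        for word in words:
            next_state = self.transitions.get(state, {}).get(word)
            if next_state is None:
                return False  # no transition for this word: reject
            state = next_state
        return state in self.final


class ToyRecognizer:
    """Hypothetical stand-in for a recognizer: matches typed text, not audio."""

    def __init__(self) -> None:
        self.grammars: List[FiniteStateGrammar] = []
        # Handlers called with the utterance text, loosely like a
        # "speech recognized" event.
        self.on_recognized: List[Callable[[str], None]] = []

    def load_grammar(self, grammar: FiniteStateGrammar) -> None:
        self.grammars.append(grammar)

    def feed(self, utterance: str) -> bool:
        words = utterance.lower().split()
        for grammar in self.grammars:
            if grammar.accepts(words):
                for handler in self.on_recognized:
                    handler(utterance)
                return True
        return False  # nothing matched, so no event is raised


# A tiny command grammar: ("open" | "close") ("notepad" | "calculator")
grammar = FiniteStateGrammar(
    start="start",
    transitions={
        "start": {"open": "verb", "close": "verb"},
        "verb": {"notepad": "end", "calculator": "end"},
    },
    final={"end"},
)

recognizer = ToyRecognizer()
recognizer.load_grammar(grammar)
heard: List[str] = []
recognizer.on_recognized.append(heard.append)
recognizer.feed("Open Notepad")
```

Real grammars are far richer (optional words, alternatives, semantic values attached to phrases), but the shape is the same: the event only fires for word sequences the state machine accepts, so something like "open the pod bay doors" is simply ignored.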

Last up, Steve Chang, who manages the Microsoft Speech Server team, demonstrated how applications can become even more ubiquitous by making them accessible from any telephone line. The first demo app was a simple one that let users dial in and book concert tickets. The second, developed by a team of SDETs at Microsoft, was more interesting. Users dial in and tell the system where they're leaving from, where they want to go, and at what time, and the system responds with the time the next shuttle is scheduled to arrive. It even goes one step further and calls the user back five minutes before the shuttle arrives. Speech Server, like the speech engine in Vista, is multilingual, and to demonstrate that, Steve interacted with the system in French. :)

He then explained the concept of "mixed initiative," which makes speech recognition systems feel more natural. In most current applications, the system prompts you for something, you respond, and the cycle repeats until the system has asked for every piece of information it needs. This gets tedious quickly. Wouldn't it be nice if you could tell the system everything it needs to know in one go, and have it intelligently break what you said into pieces and do its job? That's what "mixed initiative" is all about: the user and the system jointly control the dialog flow. As an example, Steve called the shuttle service app and, in a single sentence, told the system where he was leaving from, where he wanted to go and at what time. The system broke his request into pieces, recognized the source, destination and time separately, and replied with the time the next shuttle was going to arrive. :)
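To make the idea a bit more concrete, here's a rough Python sketch of mixed-initiative slot filling for a toy shuttle dialog with three slots (source, destination, time). The regex patterns, prompts, slot names and functions are all hypothetical and say nothing about how Speech Server actually does it: the system first pulls whatever slots it can out of the caller's opening sentence, then falls back to the old prompt-and-respond cycle only for the slots that are still missing.

```python
import re
from typing import Callable, Dict

# Order in which the system prompts for anything the caller didn't say.
SLOTS = ["source", "destination", "time"]

PROMPTS = {
    "source": "Where are you leaving from?",
    "destination": "Where do you want to go?",
    "time": "What time do you want to leave?",
}

# Hypothetical patterns; a real system would use a speech grammar, not regexes.
PATTERNS = {
    "source": re.compile(r"\bfrom (.+?)(?= to | at |$)"),
    "destination": re.compile(r"\bto (.+?)(?= from | at |$)"),
    "time": re.compile(r"\bat (\d{1,2}(?::\d{2})? ?(?:am|pm)?)"),
}


def extract_slots(utterance: str) -> Dict[str, str]:
    """Pull every slot we can find out of a single utterance."""
    text = utterance.lower().strip()
    found: Dict[str, str] = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[slot] = match.group(1).strip()
    return found


def run_dialog(first_utterance: str, ask: Callable[[str], str]) -> Dict[str, str]:
    """Mixed initiative: take what the caller volunteered, prompt only for the rest."""
    slots = extract_slots(first_utterance)
    for slot in SLOTS:
        if slot not in slots:
            # System-directed fallback for anything still missing.
            slots[slot] = ask(PROMPTS[slot]).strip().lower()
    return slots
```

With this sketch, run_dialog("I'm leaving from building 40 to building 9 at 5 pm", ask) fills all three slots without a single prompt, while run_dialog("from building 40", ask) asks only for the destination and the time.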

Perfecting a speech recognition engine is an incredibly difficult problem, and it was great to see how much progress has been made since XP was released a few years ago. The presentation was fascinating, and I'm now curious to see how Build 5219 responds to my voice!


