
August 09, 2006

Microsoft Speech Recognition and Unified Messaging


This is a longer post than usual: it’s about Microsoft’s latest speech recognition demo of Vista, Exchange 2007 Unified Messaging, and my experience as a surprise guest in Microsoft’s keynote address at SpeechTEK 2006 in New York this week.

I’ve been using Vista, Microsoft’s next operating system to be released in 2007, for about four months.  I immediately tried the built-in dictation software and was blown away by how well it worked.  Out of the box, with NO training, it performed better than anything I’d ever experienced and the editing capabilities for the first time (for me at least) made voice control of the PC intuitive and workable.

So . . . I was surprised and disappointed for my industry when I saw the video that circulated last week of the demo crashing and burning right before the eyes of the financial community.  If you haven’t seen it, I’ll spare you the pain by not linking to it, but it was clear that something was very wrong.  My Vista builds were much older and I had experienced for myself recognition that was very different from what I saw in the video.


It turns out there was a bug in the audio subsystem that was introduced at the last minute, and killed just as quickly, but it did its damage by once again making people think that speech recognition is never going to work.


Now . . . what a difference a week makes! I was at SpeechTEK 2006 in New York this week and witnessed for myself the very same demo, and it worked PERFECTLY!  Microsoft even had the guts to joke about the previous failure, “taunting the demo gods” as one journalist put it, and I expect they still made a bunch of people (albeit industry people) believe that we have entered a new era for a technology that has been a long time coming.


I was not an uninterested bystander.   Richard Bray, who gave the keynote address on behalf of Microsoft and who runs their Speech Server group, invited me to demo Exchange 2007 Unified Messaging for Microsoft during his keynote address.  I’m pretty comfortable speaking to groups of people, but this was practically my entire industry and we were going to use live systems to do a live demo in real time.  No recorded demos – no net, just a telephone and a chance to either make a good impression or look like an idiot if I screwed up.  I knew the technology worked, because I’ve personally been live on Microsoft’s Exchange 2007 Unified Messaging product for about four months, but I also knew from experience that it would be easy to misspeak or have an AV problem that could hose up everything.  I had also heard that others had not had the best experience in the same room earlier in the event.


The keynote started with Rich talking about Microsoft’s investment and long history in speech recognition.  He then introduced Rob Chambers to do the Vista demo.  I admire Rob – he looked really cool and confident as he walked up to the stage. The dictation recognition was perfect and it understood everything Rob said.  He showed how easy it is to change and edit a document and then moved on to controlling the PC with just his voice.  He received several rounds of applause, especially when he changed his wallpaper from the standard Vista wallpaper to a photo of his young son without ever touching the keyboard.  (I’ll bet you’d have to think about how to do it even with the keyboard, which is what was cool about that part of the demo.  He said something like “How do I change the wallpaper?” and Vista walked him through it, all with only his voice.  You’ve got to see it to believe it.  I hope when THIS video makes the rounds that it is half as popular.) When he finished his demo of Vista speech recognition, my first thought was to high-five him for doing such a great demo.


My second thought, as Rob walked off the stage and Rich began to introduce me, was “Holy crap, I’m up and if I screw up, I’M going to be the guy in next week’s video making the rounds.”  Rich put me at ease by surprising me: he started this part of the presentation with a photo of my FJ Cruiser project and asked if I really had installed an Xbox 360 in it.  I said something like “I’m not sure which is more embarrassing, that I have installed an Xbox in the FJ or that I’ve had to admit that I’m from Boulder, Colorado and I own an SUV.”


I then jumped into the demo and showed how, in addition to email, I can now access my voice mail and faxes via Outlook and Exchange Server 2007 with Unified Messaging.  For the demo, I used Outlook Web Access, which allows me to access email via a web browser.  We listened to a voice mail from Clint Patterson (another jokester) who suggested that I “needed more cow bell” in the demo.

By the way, this was a demo, but it was my real inbox and our live Exchange Server 2007 back in Boulder.  No one behind a curtain and nothing faked and it will be delivered as part of Exchange 2007.  The idea of Unified Messaging has been around for years but it has typically been an integration of a legacy voicemail system, an email server and an add-on to an email client.  With Microsoft's approach, Exchange 2007 IS your voice mail and while pricing has not been publicly announced, the phrase “Radical Economics” has been tossed around by the analysts.  It means that I now have only one place to go for my office voice mail, my cell phone voice mail, my email, my faxes (I do still get a few faxes and they are usually something I don’t want sitting out on a public fax machine) AND I have only one login.  My IT people love it because they don’t have to manage separate systems and separate directories.  I know some of our customers are justifying the upgrade to Exchange 12 with the savings in maintenance charges on their legacy voice mail systems.


Back to the demo - After listening to the voice mail, I walked across the stage to a plain old telephone and dialed my number at Gold Systems.  I logged into Outlook Voice Access and was given the options of listening to my voicemail, listening to my email (and I can respond by voice to anyone but Brad, who hates voicemail), calling anyone at Gold Systems or (I really like this) anyone in my personal contacts.  Finally, I can do some very interesting things with my calendar which is what I showed next.


I said “Calendar” into the receiver and the system replied with something like “You have a meeting in progress entitled SpeechTEK Keynote address with Richard Bray from 8:30 AM to 10:00 AM.”  It offered some options but I interrupted and said “Next.”  It said, “Your next meeting is entitled ‘Breakfast with Clint Patterson’” and again I interrupted and said “Cancel the meeting.”  I was asked to confirm that I really wanted to ditch Clint for breakfast, which I did, and I accepted the offer to send a voice message along with the cancellation notice.  I responded “Clint, I have no idea what More Cowbell means and I told you they wouldn’t get the joke.  I’m going to have to skip breakfast as I’m still on stage here with Rich.”  I pressed a button on the phone to indicate I was done talking (maybe I could have just stopped talking?  I don’t know, I haven’t tried it) and I said “Send it with Priority”.  The meeting was canceled, Clint was sent an email with my voice note and my calendar in Outlook was updated.


For the next meeting on my schedule, I interrupted the system again and said into the phone, “I’ll be 30 minutes late”.  I know a LOT of people who could use this feature!  (I’m fine with you being late occasionally but let me know, OK?  Now you have no excuse if you are on Exchange 12 with UM)  This time the system sent out meeting notifications saying that I was running 30 minutes late.  I hung up the phone, showed everyone how my calendar had updated automatically, made a few last points and I was done.  Whew!  I was the only non-Microsoftee on stage and I was grateful for the chance to be a part of the keynote.  The other demos went perfectly too and for the rest of the show I felt elated to have been a part of it all.


The big announcement at the keynote is that Microsoft is merging what has been known as Speech Server 2007 into what was formerly known as Live Communications Server to create the Microsoft Office Communications Server 2007.  I’ll write about THAT in another post, but it’s big news and is going to create a lot of opportunity in the industry.


I know I sound like I’m drunk on Microsoft Kool-Aid, and I am a little tipsy from it, but this really is big news and I think it will be good for the industry.

To everyone but my competitors, even most of Microsoft’s competitors, this is going to be good for business because it is going to extend speech recognition throughout the enterprise.  The world of communications in general is going to grow and change in fundamental ways, and a lot of people will benefit from Microsoft’s massive investment in this world. 

To Gold Systems’ competitors specifically:  Pay no attention to this and keep doing what you are doing. No one is ever going to trust their voice mail, phone calls or important business to a PC.  After all, when was the last time you rebooted your mainframe?  Just keep repeating that and maybe this will all go away.  But I doubt it.



August 9, 2006 in Car Computer, Speech Recognition | Permalink


Listed below are links to weblogs that reference Microsoft Speech Recognition and Unified Messaging:

» More speech demos from Richard Sprague WebLog
Terry Gold has more details about the Microsoft keynote. He did a live demo right after Rob's and... [Read More]

Tracked on Aug 10, 2006 9:22:10 AM

» Speech in Office Communications Server - the word on the street from Working the Spoken Word
Now that the dust has settled on the announcement to integrate speech services into Office Communications... [Read More]

Tracked on Aug 18, 2006 6:40:24 PM


Hi Terry,

I had heard how impressive the demos were but I didn't realize that you were doing one.

You aren't the only one drunk on the Kool-Aid. I think my blog Gotspeech was the first to break the news and things have been buzzing since.

I'm going to have to come work for you so maybe I can attend events like SpeechTek.

Posted by: Marshall Harrison "the gotspeech guy" | Aug 10, 2006 5:23:32 AM

Any idea how Microsoft's new speech recognition product compares to Dragon NaturallySpeaking version 9? I just installed version 9 this morning and I'm quite happy with it -- wondering how it stacks up against Microsoft's solution.

Posted by: Elliot | Aug 10, 2006 11:22:04 AM

Terry, this is fascinating, even to someone like me who is not voice-recognition-software savvy. I've worked in the telecom industry for many years, recently left the Cube to try life in what feels like real-time, but I still pay attention to what's happening in the techie world. I was told years ago that GoldSys was a company to watch and an excellent company to work for. This post just confirms that.

Your blogs are a pleasure to read, even when the info is technical. If your software is as user-friendly as your blog, your customers will adore you.

Posted by: Verna Wilder | Aug 11, 2006 3:04:07 PM

Marshall - congratulations on gotspeech.com - you are really helping the community. We should lobby to get you on the agenda for the next SpeechTEK as a speaker, and then you'll have to go!

Verna - thank you, you are very kind!

Elliot - I have not tried Dragon Version 9, but I'd love to hear what you think after you've used it awhile.

Posted by: Terry Gold | Aug 11, 2006 5:00:53 PM

I give demonstrations of Dragon NaturallySpeaking to individuals and groups here in the UK. To my mind, giving a demonstration is the best way to "sell" speech recognition and its benefits. They work best when you ask someone in the audience to choose a passage from a newspaper article or one of their memos rather than reading from your own script.

The bottom line is that you do take a chance when doing a live demonstration. I have found that I like to get to the venue early to check out all the sound equipment and I ideally aim to run the audio checks when the room is filled up with people.

Does anyone else have any other tips on how to make speech recognition demos run smoothly?


Posted by: Peter Maddern | Dec 23, 2006 5:05:35 AM