
February 02, 2005

Speech Recognition - who gets it?

I'm working on a white paper about speech recognition and I'd like to try out a few ideas.  The tagline for this blog is Entrepreneurship, Speech Recognition, VoIP, Wireless and . . . and so far I've been pretty much stuck on entrepreneurship.

Actually, when Jim and I started Gold Systems, speech recognition was something that you only saw working on an episode of Star Trek.  We build telephone self-service applications for large enterprises, and most of them are speech recognition enabled now.  Our very first paying customer, in 1991, was a bank in Fargo, North Dakota, and we built an automated banking system for them.  Yes - one of those touch-tone systems that you use when you can't talk to a real person.  Or at least that is how a lot of people view telephone self-service, whether it is touch-tone or one of the newer ones that use speech recognition (we did our first speech rec application in 1995).  And yet I remember this bank being pleasantly surprised that customer satisfaction actually rose after the system went into service.  In fact, job satisfaction for the real people answering the phone also rose.

How can that be?  This was a bank whose President had declared that they would never have voice mail.  First of all this was a small regional bank when we started working with them.  They didn't have a 24 hour follow-the-sun call center.  When the agents went home, the callers got a message telling them to call again tomorrow.  It was a big improvement just to allow customers to get their account information 24 hours a day.  But a strange thing happened - people who had been talking to the agents for years started using the automated system even during business hours.  For one thing they never had to wait for the automated system, but also a lot of people apparently didn't want to discuss their finances with a real person.

If you aren't in the business you may not realize that an awful lot of people manage their finances by calling to check their account balance every day.  You might also be surprised at how many people still use the phone as their primary communications device.  If you are reading this - you are not like most people - and you may wonder why they don't just check their balance on the web.  Back then the web didn't exist, but still MOST people - not necessarily those that read and write blogs - but most people still find their telephone to be a great way to get information.  For the people who really did need to talk to a real human being, they weren’t stuck in queue behind a bunch of people who were just checking their checking account balance for the third time in one day.

Everybody loves to hate touch-tone systems that make it difficult to get the information they need while making it impossible to speak to a real person.  Guess what – I hate them too, because they don’t have to be that way.  Today most of our business is in developing speech recognition applications, but some companies are still deploying touch-tone applications.  What really matters is whether the application is designed so that normal people, who are eating a burrito and driving their car and JUST WANT TO GET THEIR BANK BALANCE, can get it easily and quickly and get back to eating that burrito.  I was actually driving back from a meeting today, eating a burrito, and even I wouldn’t try to log onto the web with my PocketPC while driving and eating.  (One or the other, but not both.)

From the very beginning Jim and I stressed to the developers that a great application, one that people will love to use, is first and foremost designed to get the job done and get out of the way.  Developers, and I was one, love to think up new features.  Do you know why some bad touch-tone applications have ten options on the main menu?  Because there aren’t more buttons on the telephone.  Some developers would put twenty options on the menu if they could get away with it.  The first key to a great application, whether it is touch-tone, speech recognition or web, is to do a great job on the human factors design.  It may be massively complex behind the interface, but the part of the application that the user deals with must be simple, natural and easy to use.  And if the caller DOES want to talk to a real human, LET THEM!  Who doesn’t know how to ultimately get to a human?  The only question is how mad the caller is going to be by the time they get there, so our philosophy is to make the caller want to use the self-service option, but let them opt out if that is what they want.

With the coming of age of speech recognition we're being handed a double-edged sword.  Now the human factors work that we’ve always done with touch-tone is even more important.  One of the keys to getting good recognition performance is to ask questions that generate consistent responses.  If you confuse the caller with a complex question – and remember they are driving or whatever and not always paying close attention – they will answer something like “ah, uh oh let’s see, I think uh, yes my account number is uh CLICK.”  It is going to be a long time before speech recognition engines can get much out of that sort of response.
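For the developers reading along, here is a rough sketch of that idea in Python.  It is not how any real speech engine or any of our applications works; the recognizer is simulated with plain text, and every name in it (GRAMMAR, IGNORED, MAX_ATTEMPTS, interpret, run_turn) is made up for illustration.  It only shows the shape of the design: ask a narrow question, accept a small set of answers, and hand the caller to a person after a couple of misses instead of trapping them.

```python
import re

# Hypothetical grammar for one narrowly worded prompt, for example:
# "Would you like your checking balance? Please say yes or no."
GRAMMAR = {
    "yes": "yes", "yeah": "yes", "yep": "yes",
    "no": "no", "nope": "no",
    "agent": "agent", "operator": "agent", "representative": "agent",
}

# Words we are willing to ignore around an otherwise clean answer (assumed list).
IGNORED = {"uh", "um", "ah", "oh", "er", "please"}

# Assumed retry limit before transferring the caller to a person.
MAX_ATTEMPTS = 2


def interpret(utterance):
    """Return the grammar result for a transcribed utterance, or None."""
    words = re.findall(r"[a-z']+", utterance.lower())
    words = [w for w in words if w not in IGNORED]
    # Only a clean, in-grammar answer counts; rambling replies are rejected.
    return GRAMMAR.get(" ".join(words))


def run_turn(utterances):
    """Play one prompt against a list of simulated caller replies."""
    for attempt, utterance in enumerate(utterances, start=1):
        result = interpret(utterance)
        if result is not None:
            return result
        if attempt >= MAX_ATTEMPTS:
            break
    # Out of attempts (or the caller hung up): route to a real person.
    return "agent"
```

A reply like "uh, yes please" comes back as "yes", while the rambling answer quoted above matches nothing, and a second miss sends the caller to an agent.  A wider question would need a much bigger grammar, which is exactly where recognition gets harder.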

I’ll close this post with this thought.  If it were easy there wouldn’t be so many bad systems in the world, so please don’t try this at home just because it sounds fun to make a computer talk and you have a few spare developers walking around.  Leave it to the professionals and put your efforts into filling all the orders that your happy customers will give you when they discover that you’ve made it easier than ever to do business with your company over the phone.

February 2, 2005 in Speech Recognition




What are your thoughts on using VOIP to do dictation (as opposed to speaker independent, command based) recognition on the server side for an intranet deployed application? Has this been done with any success? Also, any idea on vendors with Java based server side solutions for this? Thanks,


Posted by: Ravi | Feb 7, 2005 10:46:53 PM

I am not sure I entirely understand your question...as I see it, you are comparing apples and oranges (i.e., two separate, independent pieces of the technology chain).

VOIP is the "pipeline" over which the communications travel. "Speaker independent, command based reco on the server side" is the speech reco piece of the chain. The two are independent decisions.

In other words, you can use whatever pipeline you want (VOIP, land-line, whatever) as long as the transmission is clear enough so as not to interfere w/ the speech reco engine.

"Command-based reco" is commonly used in call center applications, in which the list of options is fairly limited. For example: "Please touch or say 1 for Sales, 2 for customer service," etc. Reco accuracy rates can be very high (95%+) for command-based reco because the dialog is limited. [Speech reco is all about statistics - i.e., using math to predict the probability of recognition based on speech models.]

The alternative to command-based reco is "natural language" reco, in which the user can speak freely and the reco engine will decipher what is being spoken. (Or, when more clarification is necessary, can intelligently submit follow-up prompts to further narrow the choices.) The last time I checked, Sears uses a natural language app for their Cust Support line. I said "the fan on my microwave oven is broken", and it guided me to the proper CRM resource.

You need to select the optimal type of reco for your application. Usually, it is a tradeoff: you don't want to bore the caller with overly simplistic menu choices that force them to answer a bunch of questions, because they will just press "0" to get to a phone rep (which costs WAY more than servicing the user automatically). On the other hand, you don't want to design your app so broadly that the reco engine cannot figure out what the user is saying, and must keep asking further questions to clarify the dialog.

Hope this was helpful.

Posted by: Mark | Jul 24, 2005 10:28:02 PM
