Text to Speech vs Prerecorded Messages in IVR
The pros and cons of using TTS or prerecorded messages in Interactive Voice Response (IVR) systems
Text-to-speech (TTS) and prerecorded messages are two ways to provide the voice response, an essential part of IVR. Each approach has its pros and cons.
Prerecorded messages. This is the traditional way of building an IVR. The operator lays out the IVR call flow, compiles a list of the words and phrases it needs, and either has them recorded in a professional studio or records them on their own premises.
Text to Speech. While there are several good standalone TTS applications, the true power of TTS comes from large cloud platforms such as Amazon AWS, Google GCP and similar. With access to very large data sets, they can synthesize speech that is very close to a live person's. They offer easy-to-implement, feature-rich APIs that operators can get working in a very short time.
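To illustrate how little code a cloud TTS call takes, here is a minimal Python sketch that uses Amazon Polly through the boto3 SDK. It assumes boto3 is installed and that AWS credentials and a region are configured; the function name `synthesize_to_file`, the `Joanna` voice and MP3 output are choices made for this example, not requirements.

```python
def synthesize_to_file(text: str, path: str, voice_id: str = "Joanna",
                       client=None) -> int:
    """Synthesize `text` with Amazon Polly and save the MP3 audio to `path`.

    Returns the number of bytes written. A client can be passed in (handy for
    testing); otherwise a boto3 Polly client is created, which requires AWS
    credentials and a region to be configured.
    """
    if client is None:
        import boto3  # imported lazily so the function can be used with a stub client
        client = boto3.client("polly")
    response = client.synthesize_speech(Text=text, OutputFormat="mp3",
                                        VoiceId=voice_id)
    audio = response["AudioStream"].read()  # AudioStream is a streaming body
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call such as `synthesize_to_file("Welcome to our service. Please hold.", "welcome.mp3")` produces a playable file. For telephony you may prefer raw audio: Polly also accepts `OutputFormat="pcm"` together with `SampleRate="8000"`, which you would then wrap or convert for your IVR platform.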
Pros of Text-to-speech:
- Quick to deploy. Usually it is a matter of downloading the TTS vendor's ready-made SDK, configuring it, and calling it in a few lines of code. In our tests of several libraries in popular programming languages, a non-expert could get a first synthesized message within half an hour. Prerecording every possible message, in contrast, takes a lot of work.
- Takes care of compound numbers. Suppose you need to speak multi-digit amounts, such as sums of money. With TTS this usually works automatically, with no special configuration. With prerecorded messages you have to compose the numbers yourself. For example, to say $1259.99 you need to build the sequence of sounds "one, thousand, two, hundred, fifty, nine, dollars, ninety, nine, cents". That requires some programming logic, and things become really complicated when you need several languages. In German, for example, the ones come before the tens, so 82 is pronounced zweiundachtzig ("zwei und achtzig", literally "two and eighty"). Many other languages have their own quirks too.
- No need to have recordings for every use case. For IVR systems where not all words and phrases are known in advance, TTS is the only choice.
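The number composition described under compound numbers can be sketched in Python. This is a minimal illustration, assuming one prerecorded prompt per English word (for example hypothetical prompts named `fifty` and `nine`) and amounts below one billion; all function names are hypothetical. A small German helper shows the reversed ones/tens order for 20 to 99.

```python
from decimal import Decimal

# One prerecorded prompt per word (hypothetical names like "fifty" -> fifty.wav).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_thousand(n: int) -> list[str]:
    """Prompt names for 1 <= n <= 999, e.g. 259 -> two hundred fifty nine."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])  # "fifty" and "nine" are separate prompts
    elif rest:
        words.append(ONES[rest])
    return words


def number_to_prompts(n: int) -> list[str]:
    """Prompt names for 0 <= n < 1,000,000,000."""
    if not 0 <= n < 1_000_000_000:
        raise ValueError("only 0..999,999,999 is supported in this sketch")
    if n == 0:
        return ["zero"]
    words = []
    for scale, name in ((1_000_000, "million"), (1_000, "thousand")):
        chunk, n = divmod(n, scale)
        if chunk:
            words += below_thousand(chunk) + [name]
    if n:
        words += below_thousand(n)
    return words


def amount_to_prompts(amount: str) -> list[str]:
    """Prompt names for a dollar amount given as a string, e.g. "1259.99"."""
    # Decimal avoids binary floating-point surprises when splitting off cents.
    total_cents = int((Decimal(amount) * 100).to_integral_value())
    dollars, cents = divmod(total_cents, 100)
    prompts = number_to_prompts(dollars) + ["dollar" if dollars == 1 else "dollars"]
    if cents:
        prompts += number_to_prompts(cents) + ["cent" if cents == 1 else "cents"]
    return prompts


# German: the ones come before the tens, joined by "und".
DE_ONES = ["", "ein", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun"]
DE_TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig", "sechzig",
           "siebzig", "achtzig", "neunzig"]


def german_two_digit_prompts(n: int) -> list[str]:
    """Prompt names for 20 <= n <= 99, e.g. 82 -> zwei, und, achtzig."""
    tens, ones = divmod(n, 10)
    if not 2 <= tens <= 9:
        raise ValueError("only 20..99 is supported in this sketch")
    if ones == 0:
        return [DE_TENS[tens]]
    return [DE_ONES[ones], "und", DE_TENS[tens]]


print(amount_to_prompts("1259.99"))
# ['one', 'thousand', 'two', 'hundred', 'fifty', 'nine', 'dollars', 'ninety', 'nine', 'cents']
print(german_two_digit_prompts(82))
# ['zwei', 'und', 'achtzig']
```

Even this simplified version needs per-language tables and ordering rules, which is exactly the work a TTS engine does for you.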
Pros of prerecorded messages:
- No running costs. This matters when TTS is provided through a cloud platform API, since such services are typically billed by the word or character spoken. These costs can, however, be reduced by caching, i.e. storing frequently repeated words and phrases locally.
- More languages. TTS systems typically support only the most widely spoken languages. For example, Amazon Polly is available in only 19 languages (not counting variants and dialects). Google's speech synthesis supports slightly more languages, and more dialects. But most of the world's smaller languages are not covered.
- More user friendly. Synthesized speech sometimes sounds less human and may be less pleasant for the listener.
- Can speak special words such as company names. TTS may mispronounce rarely used words, such as company names or names of foreign origin.
- Availability. Since most TTS engines are cloud platform services, the IVR depends on a working network connection to them, which can become a point of failure.
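Caching (mentioned under running costs) and availability can be handled in one place. The Python sketch below is a hypothetical helper, not part of any vendor SDK: it looks up a phrase in a local cache keyed by a SHA-256 hash of voice and text, calls a TTS backend only on a cache miss, and falls back to a prerecorded file if the backend is unreachable. It assumes the backend is a plain function `synthesize(text, voice)` returning audio bytes and raising the built-in `ConnectionError` when the service cannot be reached; a real SDK raises its own exception types, which you would map to this.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


class CachingPromptSource:
    """Hypothetical helper: cache TTS output locally, fall back to recordings.

    `synthesize(text, voice)` is any TTS backend returning audio bytes. It is
    assumed to raise the built-in ConnectionError when the service cannot be
    reached; map your SDK's own network exceptions to that.
    """

    def __init__(self, cache_dir: str, synthesize: Callable[[str, str], bytes],
                 fallback_dir: Optional[str] = None, voice: str = "default"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.synthesize = synthesize
        self.fallback_dir = Path(fallback_dir) if fallback_dir else None
        self.voice = voice
        self.tts_calls = 0  # successful synthesis requests sent to the backend

    def _cache_path(self, text: str) -> Path:
        # Key on voice and text so the same phrase in another voice is not reused.
        key = hashlib.sha256(f"{self.voice}\n{text}".encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.audio"

    def get_audio(self, text: str, fallback_name: Optional[str] = None) -> bytes:
        cached = self._cache_path(text)
        if cached.exists():
            return cached.read_bytes()  # cache hit: no network request
        try:
            audio = self.synthesize(text, self.voice)
        except ConnectionError:
            # TTS service unreachable: play a prerecorded message if we have one.
            if self.fallback_dir is not None and fallback_name:
                return (self.fallback_dir / fallback_name).read_bytes()
            raise
        self.tts_calls += 1
        cached.write_bytes(audio)
        return audio
```

Because cache hits never reach the backend, a phrase such as a standard greeting is synthesized, and charged for, only once, while the fallback keeps key messages playable during a network outage.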
To sum up, each approach has its pros and cons. Choose an IVR system that can handle both TTS and prerecorded messages, so that each can be used where it fits best. Have a look at our IVR Builder, which supports both out of the box.