IBM Cloud Docs
Using a voice for speech synthesis

Using a voice for speech synthesis

Both the HTTP POST and GET /v1/synthesize methods, as well as the WebSocket /v1/synthesize method, accept an optional voice query parameter. You use the voice parameter to indicate the voice and language that are to be used for speech synthesis. The service bases its understanding of the language for the input text on the language of the specified voice.

Be sure to specify a voice that matches the language of the input text. For example, if you specify the French voice fr-FR_ReneeV3Voice, the service expects to receive input text that is written in French. If you pass text that is not written in the language of the voice (for example, English text for the French voice), the service might not produce meaningful results.

Specify a voice examples

The following example HTTP POST request uses the voice en-US_AllisonV3Voice for speech synthesis:

IBM Cloud

curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: application/json" \
--header "Accept: audio/wav" \
--data "{\"text\":\"hello world\"}" \
--output hello_world.wav \
"{url}/v1/synthesize?voice=en-US_AllisonV3Voice"

IBM Cloud Pak for Data

curl -X POST \
--header "Authorization: Bearer {token}" \
--header "Content-Type: application/json" \
--header "Accept: audio/wav" \
--data "{\"text\":\"hello world\"}" \
--output hello_world.wav \
"{url}/v1/synthesize?voice=en-US_AllisonV3Voice"

The following example shows an equivalent HTTP GET request for speech synthesis:

IBM Cloud

curl -X GET -u "apikey:{apikey}" \
--output hello_world.wav \
"{url}/v1/synthesize?accept=audio%2Fwav&text=hello%20world&voice=en-US_AllisonV3Voice"

IBM Cloud Pak for Data

curl -X POST \
--header "Authorization: Bearer {token}" \
--output hello_world.wav \
"{url}/v1/synthesize?accept=audio%2Fwav&text=hello%20world&voice=en-US_AllisonV3Voice"

Using the default voice

If you omit the voice parameter from a request, the service uses the US English en-US_MichaelV3Voice by default. This default applies to all speech synthesis requests and to the GET /v1/pronunciation method.

IBM Cloud Pak for Data If you do not install the en-US_MichaelV3Voice, it cannot serve as the default voice. In this case, you must either

  • Use the voice parameter to pass the voice that is to be used with each request.
  • Specify a new default voice for your installation of Text to Speech for IBM Cloud Pak for Data by using the defaultTTSVoice property in the Speech services custom resource. For more information, see Installing Watson Text to Speech.

Multilingual speech synthesis

The service does not support multilingual speech synthesis at this time. All synthesis is based on the language of the voice that is specified by the voice parameter. Depending on the language and the word in question, you might be able to use customization to approximate the pronunciation of a word in a language that is different from the voice of the request. For more information, see Creating a custom model.

If you decide to use customization to emulate pronunciation in a different language, use the HTTP GET /v1/pronunciation method to see the pronunciation of the word in the other language. The method returns the phonemes that the service uses to pronounce the word in that language. For more information, see Phonetic translation.

You can adjust the phonemes that the method returns to match as closely as possible the phonemes that are available in your language. You can then create a custom model that includes a custom word with that translation and use that model with your synthesis request. Becuase two different languages might not support the same phonemes, it might not be possible to match exactly the sounds and pronunciation of one language with the phonetic symbols of another language.

The Speech Synthesis Markup Language (SSML) <speak> element includes an xml:lang element, but that element applies to the entire request, and the service does not support its use as a way of specifying a different language for speech synthesis.