Getting started with Text to Speech

The IBM Watson® Text to Speech service converts written text to natural-sounding speech to provide speech-synthesis capabilities for applications. This curl-based tutorial can help you get started quickly with the service. The examples show you how to call the service's POST and GET /v1/synthesize methods to request an audio stream.

The tutorial uses the curl command-line utility to demonstrate REST API calls. For more information about curl, see Using curl with Watson examples.

Before you begin

IBM Cloud

  • Create an instance of the service:

    1. Go to the Text to Speech page in the IBM Cloud catalog.
    2. Sign up for a free IBM Cloud account or log in.
    3. Read and agree to the terms of the license agreement.
    4. Click Create.
  • Copy the credentials to authenticate to your service instance:

    1. View the Manage page for the service instance:

      • If you are on the Getting started page for your service instance, click the Manage entry in the list of topics.
      • If you are on the Resource list page, expand the AI / Machine Learning grouping in the Name column, and click the name of your service instance.
    2. On the Manage page, click Show Credentials in the Credentials box.

    3. Copy the API Key and URL values for the service instance.

This tutorial uses an API key to authenticate. In production, use an IAM token. For more information, see Authenticating to IBM Cloud.
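
For readers who want to try the IAM token route mentioned above, the following is a minimal sketch of exchanging an API key for a token. It assumes the public IBM Cloud IAM endpoint https://iam.cloud.ibm.com/identity/token, which this tutorial does not name, and the helper names iam_request_body and get_iam_token are hypothetical. Check Authenticating to IBM Cloud for the authoritative procedure.

```shell
#!/bin/sh
# Assumption: the public IBM Cloud IAM token endpoint (not named in this tutorial).
IAM_URL="https://iam.cloud.ibm.com/identity/token"

# Hypothetical helper: print the form-encoded body that exchanges an API key
# for an IAM access token.
iam_request_body() {
  printf 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=%s\n' "$1"
}

# Hypothetical helper: request a token for the API key passed as $1.
# The JSON response is expected to carry the token in its "access_token" field.
get_iam_token() {
  curl -s -X POST "$IAM_URL" \
    --header "Content-Type: application/x-www-form-urlencoded" \
    --header "Accept: application/json" \
    --data "$(iam_request_body "$1")"
}
```

After calling get_iam_token "{apikey}" and copying the access_token value from the response, a request would pass --header "Authorization: Bearer {token}" in place of -u "apikey:{apikey}".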

IBM Cloud Pak for Data

Text to Speech for IBM Cloud Pak for Data must be installed and configured before you begin this tutorial. For more information, see Watson Speech services on Cloud Pak for Data.

  1. Create an instance of the service by using the web client, the API, or the command-line interface. For more information about creating a service instance, see Creating a Watson Speech services instance.
  2. Follow the instructions in Creating a Watson Speech services instance to obtain a Bearer token for the instance. This tutorial uses a Bearer token to authenticate to the service.

Synthesize text in US English

The following command uses the POST /v1/synthesize method to synthesize US English input to audio. The request uses the voice en-US_MichaelV3Voice and produces audio in the WAV format.

You can use a browser or other tools to play the audio files that are produced by the examples in this tutorial. For more information, see Playing an audio file.

  1. Issue the following command to synthesize the string "hello world". The request produces a WAV file that is named hello_world.wav.

    IBM Cloud

    • Replace {apikey} and {url} with your API key and URL.
    curl -X POST -u "apikey:{apikey}" \
    --header "Content-Type: application/json" \
    --header "Accept: audio/wav" \
    --data "{\"text\":\"hello world\"}" \
    --output hello_world.wav \
    "{url}/v1/synthesize?voice=en-US_MichaelV3Voice"
    

    IBM Cloud Pak for Data

    • Replace {token} and {url} with the access token and URL for your service instance.
    curl -X POST \
    --header "Authorization: Bearer {token}" \
    --header "Content-Type: application/json" \
    --header "Accept: audio/wav" \
    --data "{\"text\":\"hello world\"}" \
    --output hello_world.wav \
    "{url}/v1/synthesize?voice=en-US_MichaelV3Voice"
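
Because curl writes whatever the server returns to the --output file, a failed request can leave an error message in hello_world.wav instead of audio. The following sketch checks that a file begins with the standard RIFF/WAVE header. The helper name is_wav is hypothetical and is not part of the service or of curl.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file named by $1 starts with a
# RIFF/WAVE header (bytes 0-3 are "RIFF" and bytes 8-11 are "WAVE").
is_wav() {
  [ "$(dd if="$1" bs=1 count=4 2>/dev/null)" = "RIFF" ] &&
  [ "$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)" = "WAVE" ]
}
```

For example, is_wav hello_world.wav || cat hello_world.wav prints the file's contents when it is not a WAV file, which usually reveals the error. Adding curl's --fail option to the request is another way to make curl exit with a nonzero status on an HTTP error.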
    

Use a different voice and audio format

The following command again uses the POST /v1/synthesize method to synthesize the same US English input to audio. This request uses the voice en-US_AllisonV3Voice and omits the Accept header, so the service returns audio in its default Ogg format.

  1. Issue the following command to synthesize the string "hello world" but with a different voice. The request produces an Ogg file that is named hello_world.ogg.

    IBM Cloud

    • Replace {apikey} and {url} with your API key and URL.
    curl -X POST -u "apikey:{apikey}" \
    --header "Content-Type: application/json" \
    --data "{\"text\":\"hello world\"}" \
    --output hello_world.ogg \
    "{url}/v1/synthesize?voice=en-US_AllisonV3Voice"
    

    IBM Cloud Pak for Data

    • Replace {token} and {url} with the access token and URL for your service instance.
    curl -X POST \
    --header "Authorization: Bearer {token}" \
    --header "Content-Type: application/json" \
    --data "{\"text\":\"hello world\"}" \
    --output hello_world.ogg \
    "{url}/v1/synthesize?voice=en-US_AllisonV3Voice"
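
The POST examples escape the JSON body by hand, which becomes error-prone once the input text itself contains double quotes or backslashes. The following sketch builds the same {"text": ...} body from an arbitrary string. The helper name json_text_body is hypothetical, and the helper escapes only backslashes and double quotes, not control characters such as newlines.

```shell
#!/bin/sh
# Hypothetical helper: wrap the string in $1 in the {"text": ...} body that the
# POST /v1/synthesize examples send. Escapes backslashes first, then double quotes.
json_text_body() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}\n' "$escaped"
}
```

The result can be passed to curl as --data "$(json_text_body 'hello world')" in any of the POST commands in this tutorial.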
    

Synthesize text in Spanish

The following command uses the GET /v1/synthesize method to synthesize Spanish input to an audio file. The GET method includes three query parameters: accept to specify the audio format, text to specify the input text for the audio, and voice to specify a Spanish voice. Because these values are passed in the query string, they must be URL-encoded; for example, the space in "hola mundo" becomes %20 and the slash in audio/wav becomes %2F.

  1. Issue the following command to synthesize the string "hola mundo" and produce a WAV file that is named hola_mundo.wav.

    IBM Cloud

    • Replace {apikey} and {url} with your API key and URL.
    curl -X GET -u "apikey:{apikey}" \
    --output hola_mundo.wav \
    "{url}/v1/synthesize?accept=audio%2Fwav&text=hola%20mundo&voice=es-ES_EnriqueV3Voice"
    

    IBM Cloud Pak for Data

    • Replace {token} and {url} with the access token and URL for your service instance.
    curl -X GET \
    --header "Authorization: Bearer {token}" \
    --output hola_mundo.wav \
    "{url}/v1/synthesize?accept=audio%2Fwav&text=hola%20mundo&voice=es-ES_EnriqueV3Voice"
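
The GET examples contain values that were URL-encoded by hand. For longer input text, the encoding can be scripted. The following sketch percent-encodes every byte of a string except the unreserved characters (letters, digits, hyphen, period, underscore, and tilde). The helper name urlencode is hypothetical, and the sketch assumes the od utility is available.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode the string in $1 for use as a query value.
urlencode() {
  # od prints each input byte as two lowercase hex digits.
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case $hex in
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        # Unreserved character: print it unchanged through its octal escape.
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # Any other byte: print it as %XX with uppercase hex digits.
        printf '%%%02X' "0x$hex" ;;
    esac
  done
  printf '\n'
}
```

With this helper, the text parameter in the commands above could be written as "text=$(urlencode 'hola mundo')". Alternatively, curl can do the encoding itself: the --get option combined with --data-urlencode "text=hola mundo" appends the encoded value to the URL as a query parameter.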
    

Next steps

  • To try an example application that accepts text and generates speech with different voices, see the Text to Speech demo.
  • For more information about the service's interfaces and features, see Service features.
  • For more information about all methods of the service's interfaces, see the API & SDK reference.