IBM Cloud Docs
Getting started with Speech to Text

The IBM Watson® Speech to Text service transcribes audio to text to enable speech transcription capabilities for applications. This curl-based tutorial can help you get started quickly with the service. The examples show you how to call the service's POST /v1/recognize method to request a transcript.

The tutorial uses the curl command-line utility to demonstrate REST API calls. For more information about curl, see Using curl with Watson examples.

IBM Cloud

Watch the following video for a visual summary of getting started with the Speech to Text service.

Before you begin

IBM Cloud

  • Create an instance of the service:

    1. Go to the Speech to Text page in the IBM Cloud catalog.
    2. Sign up for a free IBM Cloud account or log in.
    3. Read and agree to the terms of the license agreement.
    4. Click Create.
  • Copy the credentials to authenticate to your service instance:

    1. View the Manage page for the service instance:

      • If you are on the Getting started page for your service instance, click the Manage entry in the list of topics.
      • If you are on the Resource list page, expand the AI / Machine Learning grouping in the Name column, and click the name of your service instance.
    2. On the Manage page, click Show Credentials in the Credentials box.

    3. Copy the API Key and URL values for the service instance.

This tutorial uses an API key to authenticate. In production, use an IAM token. For more information, see Authenticating to IBM Cloud.
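
A minimal sketch of exchanging an API key for an IAM token follows. It assumes the IBM Cloud IAM token endpoint https://iam.cloud.ibm.com/identity/token and the API-key grant type; confirm both in Authenticating to IBM Cloud. The helper name get_iam_token is hypothetical. The function prints the JSON response, whose access_token field is the token to send in an Authorization: Bearer header.

```shell
#!/usr/bin/env bash
# Minimal sketch: exchange an IBM Cloud API key for an IAM access token.
# get_iam_token is a hypothetical helper name. The endpoint and grant type
# are assumptions based on the IBM Cloud IAM documentation.
get_iam_token() {
  local apikey="$1"
  # Prints the JSON response. Its access_token field holds the Bearer token,
  # and expires_in gives the token lifetime in seconds.
  curl -s -X POST "https://iam.cloud.ibm.com/identity/token" \
    --header "Content-Type: application/x-www-form-urlencoded" \
    --header "Accept: application/json" \
    --data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
    --data-urlencode "apikey=${apikey}"
}
```

For example, get_iam_token "{apikey}" > token.json saves the response. You would then replace -u "apikey:{apikey}" in the IBM Cloud commands with --header "Authorization: Bearer {token}".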

IBM Cloud Pak for Data

Speech to Text for IBM Cloud Pak for Data must be installed and configured before you begin this tutorial. For more information, see Watson Speech services on Cloud Pak for Data.

  1. Create an instance of the service by using the web client, the API, or the command-line interface. For more information about creating a service instance, see Creating a Watson Speech services instance.
  2. Follow the instructions in Creating a Watson Speech services instance to obtain a Bearer token for the instance. This tutorial uses a Bearer token to authenticate to the service.

Transcribe audio with no options

Call the POST /v1/recognize method to request a basic transcript of a FLAC audio file with no additional request parameters.

  1. Download the sample audio file audio-file.flac.

  2. Issue the following command to call the service's /v1/recognize method for basic transcription with no parameters. The Content-Type header specifies the audio format, audio/flac, and the service uses its default language model, en-US_BroadbandModel, for transcription.

    IBM Cloud

    • Replace {apikey} and {url} with your API key and URL.
    • Modify {path_to_file} to specify the location of the audio-file.flac file.
    curl -X POST -u "apikey:{apikey}" \
    --header "Content-Type: audio/flac" \
    --data-binary @{path_to_file}audio-file.flac \
    "{url}/v1/recognize"
    

    IBM Cloud Pak for Data

    • Replace {token} and {url} with the access token and URL for your service instance.
    • Modify {path_to_file} to specify the location of the audio-file.flac file.
    curl -X POST \
    --header "Authorization: Bearer {token}" \
    --header "Content-Type: audio/flac" \
    --data-binary @{path_to_file}audio-file.flac \
    "{url}/v1/recognize"
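
The two commands above differ only in how they authenticate. The following sketch shows one shell function that covers both cases. It assumes you set the hypothetical variables STT_URL and either STT_APIKEY (IBM Cloud) or STT_TOKEN (IBM Cloud Pak for Data). The function name stt_recognize is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: call POST /v1/recognize on a FLAC file with either credential type.
# stt_recognize, STT_URL, STT_APIKEY, and STT_TOKEN are hypothetical names.
stt_recognize() {
  local file="$1"
  if [ -n "${STT_TOKEN:-}" ]; then
    # IBM Cloud Pak for Data: authenticate with a Bearer token
    # (an IAM token on IBM Cloud is sent the same way).
    curl -X POST \
      --header "Authorization: Bearer ${STT_TOKEN}" \
      --header "Content-Type: audio/flac" \
      --data-binary "@${file}" \
      "${STT_URL}/v1/recognize"
  else
    # IBM Cloud: basic authentication with the literal user name "apikey".
    curl -X POST -u "apikey:${STT_APIKEY}" \
      --header "Content-Type: audio/flac" \
      --data-binary "@${file}" \
      "${STT_URL}/v1/recognize"
  fi
}
```

For example: STT_URL="{url}" STT_APIKEY="{apikey}" stt_recognize "{path_to_file}audio-file.flac".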
    

The service returns the following transcription results:

{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ]
}
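
To use the transcript in a script, you can save the response and extract the text. For example, add curl's -o response.json option to the command. The sketch below assumes that the jq JSON processor is installed, and print_transcript is a hypothetical helper. It joins the top alternative (alternatives[0]) of every element in results, because the service can split longer audio into several results.

```shell
#!/usr/bin/env bash
# Sketch: print the most likely transcript from a saved /v1/recognize response.
# Assumes jq is installed; print_transcript is a hypothetical helper name.
print_transcript() {
  # Take alternatives[0] (the best hypothesis) from every result and join
  # the pieces; each transcript already ends with a space.
  jq -r '[.results[].alternatives[0].transcript] | join("")' "$1"
}
```

For example, print_transcript response.json prints the text of the sample response above on one line.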

Transcribe audio with options

Call the POST /v1/recognize method to transcribe the same FLAC audio file, but specify two transcription parameters.

  1. If necessary, download the sample audio file audio-file.flac.

  2. Issue the following command to call the service's /v1/recognize method with two extra parameters. Set the timestamps parameter to true to get the start and end time, in seconds, of each word in the audio. Set the max_alternatives parameter to 3 to receive up to three of the most likely alternative transcriptions. The Content-Type header specifies the audio format, audio/flac, and the request uses the default model, en-US_BroadbandModel.

    IBM Cloud

    • Replace {apikey} and {url} with your API key and URL.
    • Modify {path_to_file} to specify the location of the audio-file.flac file.
    curl -X POST -u "apikey:{apikey}" \
    --header "Content-Type: audio/flac" \
    --data-binary @{path_to_file}audio-file.flac \
    "{url}/v1/recognize?timestamps=true&max_alternatives=3"
    

    IBM Cloud Pak for Data

    • Replace {token} and {url} with the access token and URL for your service instance.
    • Modify {path_to_file} to specify the location of the audio-file.flac file.
    curl -X POST \
    --header "Authorization: Bearer {token}" \
    --header "Content-Type: audio/flac" \
    --data-binary @{path_to_file}audio-file.flac \
    "{url}/v1/recognize?timestamps=true&max_alternatives=3"
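
When you combine several parameters, a small helper can assemble the query string. This sketch uses a hypothetical function name, recognize_url. It does no percent-encoding, so it assumes values that are already URL-safe, such as true and 3.

```shell
#!/usr/bin/env bash
# Sketch: build a /v1/recognize URL from a base URL and key=value pairs.
# recognize_url is a hypothetical helper; values are not percent-encoded.
recognize_url() {
  local base="$1"
  shift
  local query=""
  local pair
  for pair in "$@"; do
    # Start the query string with "?" and join later pairs with "&".
    if [ -z "$query" ]; then
      query="?${pair}"
    else
      query="${query}&${pair}"
    fi
  done
  # Drop one trailing slash from the base URL to avoid "//v1".
  printf '%s/v1/recognize%s\n' "${base%/}" "$query"
}
```

For example, you could pass "$(recognize_url "{url}" timestamps=true max_alternatives=3)" as the last argument of the curl command.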
    

The service returns the following results, which include timestamps and three alternative transcriptions:

{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [
            ["several", 1.0, 1.51],
            ["tornadoes", 1.51, 2.15],
            ["touch", 2.15, 2.5],
            . . .
          ],
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado and Sunday "
        }
      ],
      "final": true
    }
  ]
}
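
The sample above is abbreviated (note the . . .), so it is not valid JSON as printed. For a response that you save from the service, the following sketch lists each word with its start and end time, and then every alternative transcript. It assumes that jq is installed, and the helper names print_word_times and print_alternatives are hypothetical. It reads timestamps from the first alternative, which is where they appear in the sample.

```shell
#!/usr/bin/env bash
# Sketch: inspect a saved response from a request that used timestamps=true
# and max_alternatives=3. Assumes jq is installed; helper names are hypothetical.

# Print one line per word: word, start time, end time (tab-separated).
# Each timestamps entry is a [word, start, end] array.
print_word_times() {
  jq -r '.results[].alternatives[0].timestamps[] | "\(.[0])\t\(.[1])\t\(.[2])"' "$1"
}

# Print every alternative transcript, one per line, best first.
print_alternatives() {
  jq -r '.results[].alternatives[].transcript' "$1"
}
```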

Next steps

  • To try an example application that transcribes speech from streaming audio input or from a file that you upload, see the Speech to Text demo.
  • For more information about the service's interfaces and features, see Service features.
  • For more information about all methods of the service's interfaces, see the API & SDK reference.