Speech to Text | IBM Cloud API Docs

Introduction

Last updated: 2024-06-19

The IBM Watson™ Speech to Text service provides APIs that use IBM's speech-recognition capabilities to produce transcripts of spoken audio. The service can transcribe speech from various languages and audio formats. In addition to basic transcription, the service can produce detailed information about many different aspects of the audio. It returns all JSON response content in the UTF-8 character set.

The service supports three types of models: large speech models that use the locale (ex.: en-US, fr-FR) as their name, previous-generation models that include the terms Broadband and Narrowband in their names, and next-generation models that include the terms Multimedia and Telephony in their names. Broadband and multimedia models have minimum sampling rates of 16 kHz. Narrowband and telephony models have minimum sampling rates of 8 kHz. The large speech models and next-generation models offer high throughput and greater transcription accuracy.
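The naming rule above can be captured in a small lookup. The following is a minimal sketch, not part of the service or its SDKs: the helper name `minimum_sampling_rate_hz` is hypothetical. Large speech models are named by locale alone, and this section does not state a minimum rate for them, so the helper returns `None` in that case.

```python
from typing import Optional

# Minimum sampling rates keyed by the term that appears in a model name,
# as stated in the paragraph above.
_MIN_RATE_HZ_BY_TERM = {
    "Broadband": 16_000,
    "Multimedia": 16_000,
    "Narrowband": 8_000,
    "Telephony": 8_000,
}


def minimum_sampling_rate_hz(model_name: str) -> Optional[int]:
    """Return the minimum sampling rate implied by a model name, or None."""
    for term, rate_hz in _MIN_RATE_HZ_BY_TERM.items():
        if term in model_name:
            return rate_hz
    # Locale-only names (large speech models) carry none of these terms.
    return None
```

A client could use such a check to reject audio whose sampling rate is below the minimum before it sends a request.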

Effective 31 July 2023, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see Migrating to large speech models.

For speech recognition, the service supports synchronous and asynchronous HTTP Representational State Transfer (REST) interfaces. It also supports a WebSocket interface that provides a full-duplex, low-latency communication channel: Clients send requests and audio to the service and receive results over a single connection asynchronously.

The service also offers two customization interfaces. Use language model customization to expand the vocabulary of a base model with domain-specific terminology. Use acoustic model customization to adapt a base model for the acoustic characteristics of your audio. For language model customization, the service also supports grammars. A grammar is a formal language specification that lets you restrict the phrases that the service can recognize.

Language model customization is available for most large speech models, previous- and next-generation models. Acoustic model customization is available for all previous-generation models.

This documentation describes Java SDK major version 9. For more information about how to update your code from the previous version, see the migration guide.

This documentation describes Node SDK major version 6. For more information about how to update your code from the previous version, see the migration guide.

This documentation describes Python SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.

This documentation describes Ruby SDK major version 2. For more information about how to update your code from the previous version, see the migration guide.

This documentation describes .NET Standard SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.

This documentation describes Go SDK major version 2. For more information about how to update your code from the previous version, see the migration guide.

This documentation describes Swift SDK major version 4. For more information about how to update your code from the previous version, see the migration guide.

This documentation describes Unity SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.

The IBM Watson Unity SDK has the following requirements.

- The SDK requires Unity version 2018.2 or later to support Transport Layer Security (TLS) 1.2.
  - Set the project settings for both the Scripting Runtime Version and the Api Compatibility Level to .NET 4.x Equivalent.
  - For more information, see TLS 1.0 support.
- The SDK doesn't support WebGL projects. Change your build settings to any platform except WebGL.

For more information about how to install and configure the SDK and SDK Core, see https://github.com/watson-developer-cloud/unity-sdk.

The code examples on this tab use the client library that is provided for Java.

Maven

<dependency>
  <groupId>com.ibm.watson</groupId>
  <artifactId>ibm-watson</artifactId>
  <version>11.0.0</version>
</dependency>

Gradle

implementation 'com.ibm.watson:ibm-watson:11.0.0'

GitHub

https://github.com/watson-developer-cloud/java-sdk

The code examples on this tab use the client library that is provided for Node.js.

Installation

npm install ibm-watson@^8.0.0

GitHub

https://github.com/watson-developer-cloud/node-sdk

The code examples on this tab use the client library that is provided for Python.

Installation

pip install --upgrade "ibm-watson>=7.0.0"

GitHub

https://github.com/watson-developer-cloud/python-sdk

The code examples on this tab use the client library that is provided for Ruby.

Installation

gem install ibm_watson

GitHub

https://github.com/watson-developer-cloud/ruby-sdk

The code examples on this tab use the client library that is provided for Go.

go get -u github.com/watson-developer-cloud/go-sdk/v3@v3.0.0

GitHub

https://github.com/watson-developer-cloud/go-sdk

The code examples on this tab use the client library that is provided for Swift.

Cocoapods

pod 'IBMWatsonSpeechToTextV1', '~> 5.0.0'

Carthage

github "watson-developer-cloud/swift-sdk" ~> 5.0.0

Swift Package Manager

.package(url: "https://github.com/watson-developer-cloud/swift-sdk", from: "5.0.0")

GitHub

https://github.com/watson-developer-cloud/swift-sdk

The code examples on this tab use the client library that is provided for .NET Standard.

Package Manager

Install-Package IBM.Watson.SpeechToText.v1 -Version 7.0.0

.NET CLI

dotnet add package IBM.Watson.SpeechToText.v1 --version 7.0.0

PackageReference

<PackageReference Include="IBM.Watson.SpeechToText.v1" Version="7.0.0" />

GitHub

https://github.com/watson-developer-cloud/dotnet-standard-sdk

The code examples on this tab use the client library that is provided for Unity.

GitHub

https://github.com/watson-developer-cloud/unity-sdk

https://github.com/IBM/unity-sdk-core

Endpoint URLs

Identify the base URL for your service instance.

IBM Cloud URLs

The base URLs come from the service instance. To find the URL, view the service credentials by clicking the name of the service in the Resource list. Use the value of the URL. Add the method to form the complete API endpoint for your request.

The following example URL represents a Speech to Text instance that is hosted in Washington, DC:

https://api.us-east.speech-to-text.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2
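Forming the complete endpoint from the base URL and a method, as described above, can be sketched as follows. The helper name `build_endpoint` is hypothetical, and `/v1/recognize` is used only as an example method path.

```python
def build_endpoint(base_url: str, method_path: str) -> str:
    """Join a service base URL and a method path with exactly one slash."""
    return base_url.rstrip("/") + "/" + method_path.lstrip("/")


# Base URL of the example instance hosted in Washington, DC.
base_url = (
    "https://api.us-east.speech-to-text.watson.cloud.ibm.com"
    "/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2"
)
endpoint = build_endpoint(base_url, "/v1/recognize")
```

Normalizing the slash on both sides keeps the result stable whether or not the copied service URL ends with a trailing slash.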

The following URLs represent the base URLs for Speech to Text. When you call the API, use the URL that corresponds to the location of your service instance.

Dallas: https://api.us-south.speech-to-text.watson.cloud.ibm.com
Washington, DC: https://api.us-east.speech-to-text.watson.cloud.ibm.com
Frankfurt: https://api.eu-de.speech-to-text.watson.cloud.ibm.com
Sydney: https://api.au-syd.speech-to-text.watson.cloud.ibm.com
Tokyo: https://api.jp-tok.speech-to-text.watson.cloud.ibm.com
London: https://api.eu-gb.speech-to-text.watson.cloud.ibm.com
Seoul: https://api.kr-seo.speech-to-text.watson.cloud.ibm.com
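The list above can be kept as a lookup table in client code. This is a minimal sketch; the names `BASE_URLS` and `base_url_for` are hypothetical and are not part of any Watson SDK.

```python
# Base URLs for Speech to Text, keyed by location, copied from the list above.
BASE_URLS = {
    "Dallas": "https://api.us-south.speech-to-text.watson.cloud.ibm.com",
    "Washington, DC": "https://api.us-east.speech-to-text.watson.cloud.ibm.com",
    "Frankfurt": "https://api.eu-de.speech-to-text.watson.cloud.ibm.com",
    "Sydney": "https://api.au-syd.speech-to-text.watson.cloud.ibm.com",
    "Tokyo": "https://api.jp-tok.speech-to-text.watson.cloud.ibm.com",
    "London": "https://api.eu-gb.speech-to-text.watson.cloud.ibm.com",
    "Seoul": "https://api.kr-seo.speech-to-text.watson.cloud.ibm.com",
}


def base_url_for(location: str) -> str:
    """Return the base URL for a location, or raise ValueError if unknown."""
    try:
        return BASE_URLS[location]
    except KeyError:
        raise ValueError(f"Unknown location: {location!r}") from None
```

Failing loudly on an unknown location avoids silently falling back to a URL that does not match the service instance.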

Set the correct service URL by calling the setServiceUrl() method of the service instance.

Set the correct service URL by specifying the serviceUrl parameter when you create the service instance.

Set the correct service URL by calling the set_service_url() method of the service instance.

Set the correct service URL by specifying the service_url property of the service instance.

Set the correct service URL by calling the SetServiceURL() method of the service instance.

Set the correct service URL by setting the serviceURL property of the service instance.

Set the correct service URL by calling the SetServiceUrl() method of the service instance.

Dallas API endpoint example for services managed on IBM Cloud

curl -X {request_method} -u "apikey:{apikey}" "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}"

Your service instance might not use this URL

Default URL

https://api.us-south.speech-to-text.watson.cloud.ibm.com

Example for the Washington, DC location

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("https://api.us-east.speech-to-text.watson.cloud.ibm.com");

Default URL

https://api.us-south.speech-to-text.watson.cloud.ibm.com

Example for the Washington, DC location

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: 'https://api.us-east.speech-to-text.watson.cloud.ibm.com',
});

Default URL

https://api.us-south.speech-to-text.watson.cloud.ibm.com

Example for the Washington, DC location

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('https://api.us-east.speech-to-text.watson.cloud.ibm.com')

Default URL

https://api.us-south.speech-to-text.watson.cloud.ibm.com

Example for the Washington, DC location

require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson

authenticator = Authenticators::IamAuthenticator.new(
  apikey: "{apikey}"
)
speech_to_text = SpeechToTextV1.new(
  authenticator: authenticator
)
speech_to_text.service_url = "https://api.us-east.speech-to-text.watson.cloud.ibm.com"

Default URL

https://api.us-south.speech-to-text.watson.cloud.ibm.com

Example for the Washington, DC location

speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)

if speechToTextErr != nil {
  panic(speechToTextErr)
}

speechToText.SetServiceURL("https://api.us-east.speech-to-text.watson.cloud.ibm.com")

Default URL

https://api.us-south.speech-to-text.watson.cloud.ibm.com

Example for the Washington, DC location

let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "https://api.us-east.speech-to-text.watson.cloud.ibm.com"

Default URL

https://api.us-south.speech-to-text.watson.cloud.ibm.com

Example for the Washington, DC location

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("https://api.us-east.speech-to-text.watson.cloud.ibm.com");

Default URL

https://api.us-south.speech-to-text.watson.cloud.ibm.com

Example for the Washington, DC location

var authenticator = new IamAuthenticator(
    apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("https://api.us-east.speech-to-text.watson.cloud.ibm.com");

Cloud Pak for Data URLs

For services installed on Cloud Pak for Data, the base URLs come from both the cluster and service instance.

You can find the base URL from the Cloud Pak for Data web client in the details page about the instance. Click the name of the service in your list of instances to see the URL.

Use that URL in your requests to Speech to Text. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Set the URL by calling the setServiceUrl() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Set the correct service URL by specifying the serviceUrl parameter when you create the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Set the correct service URL by calling the set_service_url() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Set the correct service URL by specifying the service_url property of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Set the correct service URL by calling the SetServiceURL() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Set the correct service URL by setting the serviceURL property of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Set the correct service URL by calling the SetServiceUrl() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Set the correct service URL by calling the SetServiceUrl() method of the service instance. For Cloud Pak for Data System, use a hostname that resolves to an IP address in the cluster.

Endpoint example for Cloud Pak for Data

curl -X {request_method} -H "Authorization: Bearer {token}" "https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api"
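The URL pattern in the example above can be assembled from its parts as follows. This is a sketch under assumptions: the helper name `cpd_service_url` is hypothetical, and it reads the `{:port}` placeholder as an optional port that is written with a leading colon when present.

```python
from typing import Optional


def cpd_service_url(
    cluster_host: str,
    deployment_id: str,
    instance_id: str,
    port: Optional[int] = None,
) -> str:
    """Build a Speech to Text service URL for Cloud Pak for Data."""
    # Append ":port" only when a port is given.
    authority = cluster_host if port is None else f"{cluster_host}:{port}"
    return (
        f"https://{authority}/speech-to-text/{deployment_id}"
        f"/instances/{instance_id}/api"
    )
```

The result is the value to pass as the service URL in the SDK examples that follow.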

Endpoint example for Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api");

Endpoint example for Cloud Pak for Data

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}',
  }),
  serviceUrl: 'https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api',
});

Endpoint example for Cloud Pak for Data

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api')

Endpoint example for Cloud Pak for Data

require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson

authenticator = Authenticators::CloudPakForDataAuthenticator.new(
  username: "{username}",
  password: "{password}",
  url: "https://{cpd_cluster_host}{:port}"
)
speech_to_text = SpeechToTextV1.new(
  authenticator: authenticator
)
speech_to_text.service_url = "https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api"

Endpoint example for Cloud Pak for Data

speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)

if speechToTextErr != nil {
  panic(speechToTextErr)
}

speechToText.SetServiceURL("https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api")

Endpoint example for Cloud Pak for Data

let authenticator = WatsonCloudPakForDataAuthenticator(username: "{username}", password: "{password}", url: "https://{cpd_cluster_host}{:port}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api"

Endpoint example for Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api");

Endpoint example for Cloud Pak for Data

var authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}",
    username: "{username}",
    password: "{password}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("https://{cpd_cluster_host}{:port}/speech-to-text/{deployment_id}/instances/{instance_id}/api");

Disabling SSL verification

All Watson services use Secure Sockets Layer (SSL) (or Transport Layer Security (TLS)) for secure connections between the client and server. The connection is verified against the local certificate store to ensure authentication, integrity, and confidentiality.

If you use a self-signed certificate, you need to disable SSL verification to make a successful connection.

Enabling SSL verification is highly recommended. Disabling SSL verification jeopardizes the security of the connection and data. Disable SSL verification only if necessary, and take steps to enable it again as soon as possible.

To disable SSL verification for a curl request, use the --insecure (-k) option with the request.

To disable SSL verification, create an HttpConfigOptions object and set the disableSslVerification property to true. Then, pass the object to the service instance by using the configureClient method.

To disable SSL verification, set the disableSslVerification parameter to true when you create the service instance.

To disable SSL verification, specify True on the set_disable_ssl_verification method for the service instance.

To disable SSL verification, set the disable_ssl_verification parameter to true in the configure_http_client() method for the service instance.

To disable SSL verification, call the DisableSSLVerification method on the service instance.

To disable SSL verification, call the disableSSLVerification() method on the service instance. You cannot disable SSL verification on Linux.

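To disable SSL verification, call the DisableSslVerification() method of the service instance with a value of true (.NET Standard), or set the DisableSslVerification property of the service instance to true (Unity).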
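What disabling verification amounts to at the TLS layer can be sketched with Python's standard `ssl` module. This sketch is independent of the Watson SDKs and does not show how they implement the option; the helper name `unverified_ssl_context` is hypothetical.

```python
import ssl


def unverified_ssl_context() -> ssl.SSLContext:
    """Return a TLS context that skips certificate and hostname checks."""
    context = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context
```

A context like this accepts any server certificate, including a self-signed one, which is why the same caution applies as for the `--insecure` option of curl.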

Example to disable SSL verification with a service managed on IBM Cloud. Replace {apikey} and {url} with your service credentials.

curl -k -X {request_method} -u "apikey:{apikey}" "{url}/{method}"

Example to disable SSL verification with a service managed on IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

HttpConfigOptions configOptions = new HttpConfigOptions.Builder()
  .disableSslVerification(true)
  .build();
speechToText.configureClient(configOptions);

Example to disable SSL verification with a service managed on IBM Cloud

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
  disableSslVerification: true,
});

Example to disable SSL verification with a service managed on IBM Cloud

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

speech_to_text.set_disable_ssl_verification(True)

Example to disable SSL verification with a service managed on IBM Cloud

require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson

authenticator = Authenticators::IamAuthenticator.new(
  apikey: "{apikey}"
)
speech_to_text = SpeechToTextV1.new(
  authenticator: authenticator
)
speech_to_text.service_url = "{url}"

speech_to_text.configure_http_client(disable_ssl_verification: true)

Example to disable SSL verification with a service managed on IBM Cloud

speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)

if speechToTextErr != nil {
  panic(speechToTextErr)
}

speechToText.SetServiceURL("{url}")

speechToText.DisableSSLVerification()

Example to disable SSL verification with a service managed on IBM Cloud

let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "{url}"

speechToText.disableSSLVerification()

Example to disable SSL verification with a service managed on IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

speechToText.DisableSslVerification(true);

Example to disable SSL verification with a service managed on IBM Cloud

var authenticator = new IamAuthenticator(
    apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

speechToText.DisableSslVerification = true;

Example to disable SSL verification with an installed service

curl -k -X {request_method} -H "Authorization: Bearer {token}" "{url}/v1/{method}"

Example to disable SSL verification with an installed service

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

HttpConfigOptions configOptions = new HttpConfigOptions.Builder()
  .disableSslVerification(true)
  .build();
speechToText.configureClient(configOptions);

Example to disable SSL verification with an installed service

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}',
  }),
  serviceUrl: '{url}',
  disableSslVerification: true,
});

Example to disable SSL verification with an installed service

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

speech_to_text.set_disable_ssl_verification(True)

Example to disable SSL verification with an installed service

require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson

authenticator = Authenticators::CloudPakForDataAuthenticator.new(
  username: "{username}",
  password: "{password}",
  url: "https://{cpd_cluster_host}{:port}"
)
speech_to_text = SpeechToTextV1.new(
  authenticator: authenticator
)
speech_to_text.service_url = "{url}"

speech_to_text.configure_http_client(disable_ssl_verification: true)

Example to disable SSL verification with an installed service

speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)

if speechToTextErr != nil {
  panic(speechToTextErr)
}

speechToText.SetServiceURL("{url}")

speechToText.DisableSSLVerification()

Example to disable SSL verification with an installed service

let authenticator = WatsonCloudPakForDataAuthenticator(username: "{username}", password: "{password}", url: "https://{cpd_cluster_host}{:port}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "{url}"

speechToText.disableSSLVerification()

Example to disable SSL verification with an installed service

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

speechToText.DisableSslVerification(true);

Example to disable SSL verification with an installed service

var authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}",
    username: "{username}",
    password: "{password}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

speechToText.DisableSslVerification = true;

Authentication

IBM Cloud services use IBM Cloud Identity and Access Management (IAM) to authenticate. With IBM Cloud Pak for Data, you pass a bearer token.

For IBM Cloud instances, you authenticate to the API by using IBM Cloud Identity and Access Management (IAM).

You can pass either a bearer token in an authorization header or an API key. Tokens support authenticated requests without embedding service credentials in every call. API keys use basic authentication. For more information, see Authenticating to Watson services.

- For testing and development, you can pass an API key directly.
- For production use, unless you use the Watson SDKs, use an IAM token.

If you pass in an API key, use apikey for the username and the value of the API key as the password. For example, if the API key is f5sAznhrKQyvBFFaZbtF60m5tzLbqWhyALQawBg5TjRI in the service credentials, include the credentials in your call like this:

curl -u "apikey:f5sAznhrKQyvBFFaZbtF60m5tzLbqWhyALQawBg5TjRI"
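Passing the API key with `-u` uses HTTP basic authentication, which sends the `apikey:{value}` pair Base64-encoded in an `Authorization` header. The following sketch builds that header by hand to show what is sent; the helper name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """Build the basic authentication header for an IBM Cloud API key."""
    # The username is the literal string "apikey"; the password is the key.
    credentials = f"apikey:{api_key}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```

Base64 is an encoding, not encryption, so the header protects the key only when the request travels over a verified TLS connection.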

For IBM Cloud instances, the SDK provides initialization methods for each form of authentication.

- Use the API key to have the SDK manage the lifecycle of the access token. The SDK requests an access token, ensures that the access token is valid, and refreshes it if necessary.
- Use the access token to manage the lifecycle yourself. You must periodically refresh the token.

For more information, see IAM authentication with the SDK.
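Managing the token lifecycle yourself means tracking when the token expires and requesting a new one in time. The following is a minimal sketch of that bookkeeping, not the SDK's implementation. The class name `TokenCache`, the `fetch_token` callable (assumed to return a token and its expiry time in epoch seconds), and the 60-second refresh margin are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        refresh_margin_s: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch_token = fetch_token  # returns (token, expires_at)
        self._refresh_margin_s = refresh_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one when needed."""
        near_expiry = self._clock() >= self._expires_at - self._refresh_margin_s
        if self._token is None or near_expiry:
            self._token, self._expires_at = self._fetch_token()
        return self._token
```

Injecting the clock keeps the refresh logic testable without waiting for a real token to expire.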

IBM Cloud. Replace {apikey} and {url} with your service credentials.

curl -X {request_method} -u "apikey:{apikey}" "{url}/v1/{method}"

IBM Cloud. SDK managing the IAM token. Replace {apikey} and {url}.

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

IBM Cloud. SDK managing the IAM token. Replace {apikey} and {url}.

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

IBM Cloud. SDK managing the IAM token. Replace {apikey} and {url}.

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

IBM Cloud. SDK managing the IAM token. Replace {apikey} and {url}.

require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson

authenticator = Authenticators::IamAuthenticator.new(
  apikey: "{apikey}"
)
speech_to_text = SpeechToTextV1.new(
  authenticator: authenticator
)
speech_to_text.service_url = "{url}"

IBM Cloud. SDK managing the IAM token. Replace {apikey} and {url}.

import (
  "github.com/IBM/go-sdk-core/core"
  "github.com/watson-developer-cloud/go-sdk/speechtotextv1"
)

func main() {
  authenticator := &core.IamAuthenticator{
    ApiKey: "{apikey}",
  }

  options := &speechtotextv1.SpeechToTextV1Options{
    Authenticator: authenticator,
  }

  speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)

  if speechToTextErr != nil {
    panic(speechToTextErr)
  }

  speechToText.SetServiceURL("{url}")
}

IBM Cloud. SDK managing the IAM token. Replace {apikey} and {url}.

let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "{url}"

IBM Cloud. SDK managing the IAM token. Replace {apikey} and {url}.

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

IBM Cloud. SDK managing the IAM token. Replace {apikey} and {url}.

var authenticator = new IamAuthenticator(
    apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

Cloud Pak for Data

For Cloud Pak for Data, you pass a bearer token in an Authorization header to authenticate to the API. The token is associated with a username.

- For testing and development, you can use the bearer token that's displayed in the Cloud Pak for Data web client. To find this token, view the details for the service instance by clicking the name of the service in your list of instances. The details also include the service endpoint URL. Don't use this token in production because it does not expire.
- For production use, create a user in the Cloud Pak for Data web client to use for authentication. Generate a token from that user's credentials with the POST /v1/authorize method.

For more information, see the Get authorization token method of the Cloud Pak for Data API reference.

For Cloud Pak for Data instances, pass either username and password credentials or a bearer token that you generate to authenticate to the API. Username and password credentials use basic authentication. However, the SDK manages the lifecycle of the token. Tokens are temporary security credentials. If you pass a token, you maintain the token lifecycle.

For production use, create a user in the Cloud Pak for Data web client to use for authentication, and decide which authentication mechanism to use.

- To have the SDK manage the lifecycle of the token, use the username and password for that new user in your calls.
- To manage the lifecycle of the token yourself, generate a token from that user's credentials. Call the POST /v1/authorize method to generate the token, and then pass the token in an Authorization header in your calls. You can see an example of the method on the Curl tab.

For more information, see the Get authorization token method of the Cloud Pak for Data API reference.

Don't use the bearer token that's displayed in the web client for the instance except during testing and development because that token does not expire.

To find your value for {url}, view the details for the service instance by clicking the name of the service in your list of instances in the Cloud Pak for Data web client.

Cloud Pak for Data. Generating a bearer token.

Replace {cpd_cluster_host} and {port} with the details for the service instance. Replace {username} and {password} with your Cloud Pak for Data credentials.

curl -k -X POST -H "cache-control: no-cache" -H "Content-Type: application/json" -d "{\"username\":\"{username}\",\"password\":\"{password}\"}" "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize"

The response includes a token property.

Authenticating to the API. Replace {token} with your details.

curl -H "Authorization: Bearer {token}" "{url}/v1/{method}"
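The two curl calls above can also be sketched in Python. This is a minimal illustration that uses only the standard library, not the SDK's own implementation. The helper names build_authorize_request and bearer_header are hypothetical, the port is assumed to be given explicitly, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_authorize_request(host, port, username, password):
    """Build (but do not send) the POST /v1/authorize request from the curl example."""
    url = f"https://{host}:{port}/icp4d-api/v1/authorize"
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"cache-control": "no-cache", "Content-Type": "application/json"},
    )


def bearer_header(authorize_response_body):
    """Read the token property from the authorize response and build the Authorization header."""
    token = json.loads(authorize_response_body)["token"]
    return {"Authorization": f"Bearer {token}"}
```

To send the request, you might pass the returned object to urllib.request.urlopen and hand the response body to bearer_header. The resulting dictionary can then be attached to each call to {url}/v1/{method}.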

Cloud Pak for Data. SDK managing the token.

Replace {username} and {password} with your Cloud Pak for Data credentials. For {url}, see Endpoint URLs.

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

Cloud Pak for Data. SDK managing the token.

Replace {username} and {password} with your Cloud Pak for Data credentials. For {url}, see Endpoint URLs.

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
  }),
  serviceUrl: '{url}',
});

Cloud Pak for Data. SDK managing the token.

Replace {username} and {password} with your Cloud Pak for Data credentials. For {url}, see Endpoint URLs.

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

Cloud Pak for Data. SDK managing the token.

Replace {username} and {password} with your Cloud Pak for Data credentials. For {url}, see Endpoint URLs.

require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson

authenticator = Authenticators::CloudPakForDataAuthenticator.new(
  username: "{username}",
  password: "{password}",
  url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize"
)
speech_to_text = SpeechToTextV1.new(
  authenticator: authenticator
)
speech_to_text.service_url = "{url}"

Cloud Pak for Data. SDK managing the token.

Replace {username} and {password} with your Cloud Pak for Data credentials. For {url}, see Endpoint URLs.

import (
  "github.com/IBM/go-sdk-core/core"
  "github.com/watson-developer-cloud/go-sdk/speechtotextv1"
)

func main() {
  authenticator := &core.CloudPakForDataAuthenticator{
    URL: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    Username: "{username}",
    Password: "{password}",
  }

  options := &speechtotextv1.SpeechToTextV1Options{
    Authenticator: authenticator,
  }

  speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)

  if speechToTextErr != nil {
    panic(speechToTextErr)
  }

  speechToText.SetServiceURL("{url}")
}

Cloud Pak for Data. SDK managing the token.

Replace {username} and {password} with your Cloud Pak for Data credentials. For {url}, see Endpoint URLs.

let authenticator = WatsonCloudPakForDataAuthenticator(username: "{username}", password: "{password}", url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "{url}"

Cloud Pak for Data. SDK managing the token.

Replace {username} and {password} with your Cloud Pak for Data credentials. For {cpd_cluster_host}, {port}, {release}, and {instance_id}, see Endpoint URLs.

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

Cloud Pak for Data. SDK managing the token.

Replace {username} and {password} with your Cloud Pak for Data credentials. For {cpd_cluster_host}, {port}, {release}, and {instance_id}, see Endpoint URLs.

var authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

Access between services

Your application might use more than one Watson service. You can grant access between services and you can grant access to more than one service for your applications.

For IBM Cloud services, the method to grant access between Watson services varies depending on the type of API key. For more information, see IAM access.

To grant access between IBM Cloud services, create an authorization between the services. For more information, see Granting access between services.
To grant access to your services by applications without using user credentials, create a service ID, add an API key, and assign access policies. For more information, see Creating and working with service IDs.

When you give a user ID access to multiple services, use an endpoint URL that includes the service instance ID (for example, https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2). You can find the instance ID in two places:

By clicking the service instance row in the Resource list. The instance ID is the GUID in the details pane.
By clicking the name of the service instance in the list and looking at the credentials URL.

If you don't see the instance ID in the URL, the credentials predate service IDs. Add new credentials from the Service credentials page and use those credentials.
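The check described above, whether an endpoint URL carries an instance ID, can be sketched as follows. This is an illustrative helper, not part of any SDK. The function name instance_id_from_url is hypothetical, and the pattern assumes that the instance ID is a 36-character GUID, as in the example URL.

```python
import re
from urllib.parse import urlparse

# Matches the "/instances/{instance_id}" segment of a service endpoint URL.
# Assumption: the instance ID is a 36-character GUID (hex digits and hyphens).
_INSTANCE_SEGMENT = re.compile(r"/instances/([0-9a-fA-F-]{36})(?:/|$)")


def instance_id_from_url(url):
    """Return the instance ID embedded in the endpoint URL, or None if it is absent."""
    match = _INSTANCE_SEGMENT.search(urlparse(url).path)
    return match.group(1) if match else None
```

A return value of None suggests that the credentials predate service IDs and that new credentials are needed.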

Because the Cloud Pak for Data bearer token is associated with a username, you can use the token for all CPD Watson services that are associated with the username.

Error handling

Speech to Text uses standard HTTP response codes to indicate whether a method completed successfully. HTTP response codes in the 2xx range indicate success. A response in the 4xx range indicates a failure caused by the request, such as an invalid parameter or missing credentials, and a response in the 5xx range usually indicates an internal system error that cannot be resolved by the user. Response codes are listed with the method.

ErrorResponse

Name	Description
error string	Description of the problem.
code integer	HTTP response code.
code_description string	Response message.
warnings string	Warnings associated with the error.
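For clients that call the HTTP interface directly, the ErrorResponse fields can be mapped onto a small data class. This is a sketch under assumptions: the table does not say which fields are always present, so code_description and warnings are treated as optional here, and the names parse_error_response and status_class are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorResponse:
    error: str                              # Description of the problem
    code: int                               # HTTP response code
    code_description: Optional[str] = None  # Response message (assumed optional)
    warnings: Optional[str] = None          # Warnings associated with the error (assumed optional)


def parse_error_response(body):
    """Parse a JSON error body into an ErrorResponse, ignoring unknown fields."""
    data = json.loads(body)
    return ErrorResponse(
        error=data["error"],
        code=int(data["code"]),
        code_description=data.get("code_description"),
        warnings=data.get("warnings"),
    )


def status_class(code):
    """Classify an HTTP status code by range: 2xx, 4xx, or 5xx."""
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "request failure"
    if 500 <= code < 600:
        return "internal error"
    return "other"
```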

The Java SDK generates an exception for any unsuccessful method invocation. All methods that accept an argument can also throw an IllegalArgumentException.

Exception	Description
IllegalArgumentException	An invalid argument was passed to the method.

When the Java SDK receives an error response from the Speech to Text service, it generates an exception from the com.ibm.watson.developer_cloud.service.exception package. All service exceptions contain the following fields.

Field	Description
statusCode	The HTTP response code that is returned.
message	A message that describes the error.

When the Node SDK receives an error response from the Speech to Text service, it creates an Error object with information that describes the error that occurred. This error object is passed as the first parameter to the callback function for the method. The contents of the error object are as shown in the following table.

Error

Field	Description
code	The HTTP response code that is returned.
message	A message that describes the error.

The Python SDK generates an exception for any unsuccessful method invocation. When the Python SDK receives an error response from the Speech to Text service, it generates an ApiException with the following fields.

Field	Description
code	The HTTP response code that is returned.
message	A message that describes the error.
info	A dictionary of additional information about the error.

When the Ruby SDK receives an error response from the Speech to Text service, it generates an ApiException with the following fields.

Field	Description
code	The HTTP response code that is returned.
message	A message that describes the error.
info	A dictionary of additional information about the error.

The Go SDK generates an error for any unsuccessful service instantiation or method invocation. You can check for the error immediately. The contents of the error object are as shown in the following table.

Error

Field	Description
code	The HTTP response code that is returned.
message	A message that describes the error.

The Swift SDK returns a WatsonError in the completionHandler for any unsuccessful method invocation. This error type is an enum that conforms to LocalizedError and contains an errorDescription property that returns an error message. Some of the WatsonError cases contain associated values that reveal more information about the error.

Field	Description
errorDescription	A message that describes the error.

When the .NET Standard SDK receives an error response from the Speech to Text service, it generates a ServiceResponseException with the following fields.

Field	Description
Message	A message that describes the error.
CodeDescription	The HTTP response code that is returned.

When the Unity SDK receives an error response from the Speech to Text service, it generates an IBMError with the following fields.

Field	Description
Url	The URL that generated the error.
StatusCode	The HTTP response code returned.
ErrorMessage	A message that describes the error.
Response	The contents of the response from the server.
ResponseHeaders	A dictionary of headers returned by the request.

Example error handling

try {
  // Invoke a method
} catch (NotFoundException e) {
  // Handle Not Found (404) exception
} catch (RequestTooLargeException e) {
  // Handle Request Too Large (413) exception
} catch (ServiceResponseException e) {
  // Base class for all exceptions caused by error responses from the service
  System.out.println("Service returned status code "
    + e.getStatusCode() + ": " + e.getMessage());
}

Example error handling

speechToText.method(params)
  .catch(err => {
    console.log('error:', err);
  });

Example error handling

from ibm_watson import ApiException
try:
    # Invoke a method
    pass
except ApiException as ex:
    print("Method failed with status code " + str(ex.code) + ": " + ex.message)

Example error handling

require "ibm_watson"
begin
  # Invoke a method
rescue IBMWatson::ApiException => ex
  print "Method failed with status code #{ex.code}: #{ex.error}"
end

Example error handling

import "github.com/watson-developer-cloud/go-sdk/speechtotextv1"

// Instantiate a service
speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)

// Check for errors
if speechToTextErr != nil {
  panic(speechToTextErr)
}

// Call a method
result, _, responseErr := speechToText.MethodName(&methodOptions)

// Check for errors
if responseErr != nil {
  panic(responseErr)
}

Example error handling

speechToText.method() {
  response, error in

  if let error = error {
    switch error {
    case let .http(statusCode, message, metadata):
      switch statusCode {
      case .some(404):
        // Handle Not Found (404) exception
        print("Not found")
      case .some(413):
        // Handle Request Too Large (413) exception
        print("Payload too large")
      default:
        if let statusCode = statusCode {
          print("Error - code: \(statusCode), \(message ?? "")")
        }
      }
    default:
      print(error.localizedDescription)
    }
    return
  }

  guard let result = response?.result else {
    print(error?.localizedDescription ?? "unknown error")
    return
  }

  print(result)
}

Example error handling

try
{
    // Invoke a method
}
catch(ServiceResponseException e)
{
    Console.WriteLine("Error: " + e.Message);
}
catch (Exception e)
{
    Console.WriteLine("Error: " + e.Message);
}

Example error handling

// Invoke a method
speechToText.MethodName(Callback, Parameters);

// Check for errors
private void Callback(DetailedResponse<ExampleResponse> response, IBMError error)
{
    if (error == null)
    {
        Log.Debug("ExampleCallback", "Response received: {0}", response.Response);
    }
    else
    {
        Log.Debug("ExampleCallback", "Error received: {0}, {1}, {2}", error.StatusCode, error.ErrorMessage, error.Response);
    }
}

Data handling

Additional headers

Some Watson services accept special parameters in headers that are passed with the request.

You can pass request header parameters in all requests or in a single request to the service.

To pass a request header, use the --header (-H) option with a curl request.

To pass header parameters with every request, use the setDefaultHeaders method of the service object. See Data collection for an example use of this method.

To pass header parameters in a single request, use the addHeader method as a modifier on the request before you execute it.

To pass header parameters with every request, specify the headers parameter when you create the service object. See Data collection for an example use of this method.

To pass header parameters in a single request, use the headers method as a modifier on the request before you execute it.

To pass header parameters with every request, use the set_default_headers method of the service object. See Data collection for an example use of this method.

To pass header parameters in a single request, include headers as a dict in the request.

To pass header parameters with every request, use the add_default_headers method of the service object. See Data collection for an example use of this method.

To pass header parameters in a single request, use the headers method as a chainable method in the request.

To pass header parameters with every request, use the SetDefaultHeaders method of the service object. See Data collection for an example use of this method.

To pass header parameters in a single request, specify Headers as a map in the request.

To pass header parameters with every request, add them to the defaultHeaders property of the service object. See Data collection for an example use of this method.

To pass header parameters in a single request, pass the headers parameter to the request method.

To pass header parameters in a single request, use the WithHeader() method as a modifier on the request before you execute it. See Data collection for an example use of this method.

To pass header parameters in a single request, use the WithHeader() method as a modifier on the request before you execute it.

Example header parameter in a request

curl -X {request_method} -H "Request-Header: {header_value}" "{url}/v1/{method}"

Example header parameter in a request

ReturnType returnValue = speechToText.methodName(parameters)
  .addHeader("Custom-Header", "{header_value}")
  .execute();

Example header parameter in a request

const parameters = {
  {parameters},
  headers: {
    'Custom-Header': '{header_value}'
  }
};

speechToText.methodName(parameters)
  .then(response => {
    console.log(response);
  })
  .catch(err => {
    console.log('error:', err);
  });

Example header parameter in a request

response = speech_to_text.methodName(
    parameters,
    headers = {
        'Custom-Header': '{header_value}'
    })

Example header parameter in a request

response = speech_to_text.headers(
  "Custom-Header" => "{header_value}"
).methodName(parameters)

Example header parameter in a request

result, _, responseErr := speechToText.MethodName(
  &methodOptions{
    Headers: map[string]string{
      "Accept": "application/json",
    },
  },
)

Example header parameter in a request

let customHeader: [String: String] = ["Custom-Header": "{header_value}"]
speechToText.methodName(parameters, headers: customHeader) {
  response, error in
}

Example header parameter in a request for a service managed on IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

speechToText.WithHeader("Custom-Header", "header_value");

Example header parameter in a request for an installed service

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("https://{cpd_cluster_host}{:port}/speech-to-text/{release}/instances/{instance_id}/api");

speechToText.WithHeader("Custom-Header", "header_value");

Example header parameter in a request for a service managed on IBM Cloud

var authenticator = new IamAuthenticator(
    apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

speechToText.WithHeader("Custom-Header", "header_value");

Example header parameter in a request for an installed service

var authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}",
    username: "{username}",
    password: "{password}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("https://{cpd_cluster_host}{:port}/speech-to-text/{release}/instances/{instance_id}/api");

speechToText.WithHeader("Custom-Header", "header_value");

Response details

The Speech to Text service might return information to the application in response headers.

To access all response headers that the service returns, include the --include (-i) option with a curl request. To see detailed response data for the request, including request headers, response headers, and extra debugging information, include the --verbose (-v) option with the request.

Example request to access response headers

curl -X {request_method} {authentication_method} --include "{url}/v1/{method}"

To access information in the response headers, use one of the request methods that returns details with the response: executeWithDetails(), enqueueWithDetails(), or rxWithDetails(). These methods return a Response<T> object, where T is the expected response model. Use the getResult() method to access the response object for the method, and use the getHeaders() method to access information in response headers.

Example request to access response headers

Response<ReturnType> response = speechToText.methodName(parameters)
  .executeWithDetails();
// Access response from methodName
ReturnType returnValue = response.getResult();
// Access information in response headers
Headers responseHeaders = response.getHeaders();

All response data is available in the Response<T> object that is returned by each method. To access information in the response object, use the following properties.

Property	Description
`result`	Returns the response for the service-specific method.
`headers`	Returns the response header information.
`status`	Returns the HTTP status code.

Example request to access response headers

speechToText.methodName(parameters)
  .then(response => {
    console.log(response.headers);
  })
  .catch(err => {
    console.log('error:', err);
  });

The return value from all service methods is a DetailedResponse object. To access information in the result object or response headers, use the following methods.

DetailedResponse

Method	Description
`get_result()`	Returns the response for the service-specific method.
`get_headers()`	Returns the response header information.
`get_status_code()`	Returns the HTTP status code.

Example request to access response headers

import json

speech_to_text.set_detailed_response(True)
response = speech_to_text.methodName(parameters)
# Access response from methodName
print(json.dumps(response.get_result(), indent=2))
# Access information in response headers
print(response.get_headers())
# Access HTTP response status
print(response.get_status_code())

The return value from all service methods is a DetailedResponse object. To access information in the response object, use the following properties.

DetailedResponse

Property	Description
`result`	Returns the response for the service-specific method.
`headers`	Returns the response header information.
`status`	Returns the HTTP status code.

Example request to access response headers

response = speech_to_text.methodName(parameters)
# Access response from methodName
print response.result
# Access information in response headers
print response.headers
# Access HTTP response status
print response.status

The return value from all service methods is a DetailedResponse object. To access information in the response object or response headers, use the following methods.

DetailedResponse

Method	Description
`GetResult()`	Returns the response for the service-specific method.
`GetHeaders()`	Returns the response header information.
`GetStatusCode()`	Returns the HTTP status code.

Example request to access response headers

import (
  "github.com/IBM/go-sdk-core/core"
  "github.com/watson-developer-cloud/go-sdk/speechtotextv1"
)
result, response, responseErr := speechToText.MethodName(
  &methodOptions{})
// Access result
core.PrettyPrint(response.GetResult(), "Result ")

// Access response headers
core.PrettyPrint(response.GetHeaders(), "Headers ")

// Access status code
core.PrettyPrint(response.GetStatusCode(), "Status Code ")

All response data is available in the WatsonResponse<T> object that is returned in each method's completionHandler.

Example request to access response headers

speechToText.methodName(parameters) {
  response, error in

  guard let result = response?.result else {
    print(error?.localizedDescription ?? "unknown error")
    return
  }
  print(result) // The data returned by the service
  print(response?.statusCode)
  print(response?.headers)
}

The response contains fields for response headers, response JSON, and the status code.

DetailedResponse

Property	Description
`Result`	Returns the result for the service-specific method.
`Response`	Returns the raw JSON response for the service-specific method.
`Headers`	Returns the response header information.
`StatusCode`	Returns the HTTP status code.

Example request to access response headers

var results = speechToText.MethodName(parameters);

var result = results.Result;            //  The result object
var responseHeaders = results.Headers;  //  The response headers
var responseJson = results.Response;    //  The raw response JSON
var statusCode = results.StatusCode;    //  The response status code

The response contains fields for response headers, response JSON, and the status code.

DetailedResponse

Property	Description
`Result`	Returns the result for the service-specific method.
`Response`	Returns the raw JSON response for the service-specific method.
`Headers`	Returns the response header information.
`StatusCode`	Returns the HTTP status code.

Example request to access response headers

private void Example()
{
    speechToText.MethodName(Callback, Parameters);
}

private void Callback(DetailedResponse<ResponseType> response, IBMError error)
{
    var result = response.Result;                 //  The result object
    var responseHeaders = response.Headers;       //  The response headers
    var responseJson = response.Response;         //  The raw response JSON
    var statusCode = response.StatusCode;         //  The response status code
}

Data labels (IBM Cloud)

You can remove data associated with a specific customer if you label the data with a customer ID when you send a request to the service.

Use the X-Watson-Metadata header to associate a customer ID with the data. By adding a customer ID to a request, you indicate that it contains data that belongs to that customer.

Specify a random or generic string for the customer ID. Do not include personal data, such as an email address. Pass the string customer_id={id} as the argument of the header.

Labeling data is used only by methods that accept customer data.
Use the Delete labeled data method to remove data that is associated with a customer ID.

Use this process of labeling and deleting data only when you want to remove the data that is associated with a single customer, not when you want to remove data for multiple customers. For more information about Speech to Text and labeling data, see Information security.

For more information about how to pass headers, see Additional headers.
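The labeling convention described above can be sketched as a pair of helpers. The names new_customer_id and metadata_header are hypothetical, and the "@" check is only a crude illustrative guard against email addresses, not a complete test for personal data.

```python
import uuid


def new_customer_id():
    """Generate a random customer ID that carries no personal data."""
    return uuid.uuid4().hex


def metadata_header(customer_id):
    """Build the X-Watson-Metadata header that labels a request with a customer ID."""
    if "@" in customer_id:
        # Crude guard: an email address would be personal data.
        raise ValueError("customer ID must not contain personal data such as an email address")
    return {"X-Watson-Metadata": f"customer_id={customer_id}"}
```

Store the generated ID alongside your own customer record so that you can later pass the same value to the Delete labeled data method.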

Data collection (IBM Cloud)

By default, Speech to Text service instances managed on IBM Cloud that are not part of Premium plans collect data about API requests and their results. This data is collected only to improve the services for future users. The collected data is not shared or made public. Data is not collected for services that are part of Premium plans.

To prevent IBM usage of your data for an API request, set the X-Watson-Learning-Opt-Out header parameter to true. You can also disable request logging at the account level. For more information, see Controlling request logging for Watson services.

You must set the header on each request that you do not want IBM to access for general service improvements.

You can set the header by using the setDefaultHeaders method of the service object.

You can set the header by using the headers parameter when you create the service object.

You can set the header by using the set_default_headers method of the service object.

You can set the header by using the add_default_headers method of the service object.

You can set the header by using the SetDefaultHeaders method of the service object.

You can set the header by adding it to the defaultHeaders property of the service object.

You can set the header by using the WithHeader() method of the service object.

Example request with a service managed on IBM Cloud

curl -u "apikey:{apikey}" -H "X-Watson-Learning-Opt-Out: true" "{url}/v1/{method}"

Example request with a service managed on IBM Cloud

Map<String, String> headers = new HashMap<String, String>();
headers.put("X-Watson-Learning-Opt-Out", "true");

speechToText.setDefaultHeaders(headers);

Example request with a service managed on IBM Cloud

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
  headers: {
    'X-Watson-Learning-Opt-Out': 'true'
  }
});

Example request with a service managed on IBM Cloud

speech_to_text.set_default_headers({'x-watson-learning-opt-out': "true"})

Example request with a service managed on IBM Cloud

speech_to_text.add_default_headers(headers: {"x-watson-learning-opt-out" => "true"})

Example request with a service managed on IBM Cloud

import "net/http"

headers := http.Header{}
headers.Add("x-watson-learning-opt-out", "true")
speechToText.SetDefaultHeaders(headers)

Example request with a service managed on IBM Cloud

speechToText.defaultHeaders["X-Watson-Learning-Opt-Out"] = "true"

Example request with a service managed on IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

speechToText.WithHeader("X-Watson-Learning-Opt-Out", "true");

Example request with a service managed on IBM Cloud

var authenticator = new IamAuthenticator(
    apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

speechToText.WithHeader("X-Watson-Learning-Opt-Out", "true");

Synchronous and asynchronous requests

The Java SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of service methods. All service methods implement the ServiceCall interface.

To call a method synchronously, use the execute method of the ServiceCall interface. You can call the execute method directly from an instance of the service.
To call a method asynchronously, use the enqueue method of the ServiceCall interface to receive a callback when the response arrives. The ServiceCallback interface of the method's argument provides onResponse and onFailure methods that you override to handle the callback.

The Ruby SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of service methods. All service methods implement the Concurrent::Async module. When you use the synchronous or asynchronous methods, an IVar object is returned. You access the DetailedResponse object by calling ivar_object.value.

For more information about the IVar object, see the IVar class docs.

To call a method synchronously, either call the method directly or use the .await chainable method of the Concurrent::Async module.

Calling a method directly (without .await) returns a DetailedResponse object.
To call a method asynchronously, use the .async chainable method of the Concurrent::Async module.

You can call the .await and .async methods directly from an instance of the service.

Example synchronous request

ReturnType returnValue = speechToText.method(parameters).execute();

Example asynchronous request

speechToText.method(parameters).enqueue(new ServiceCallback<ReturnType>() {
  @Override public void onResponse(ReturnType response) {
    . . .
  }
  @Override public void onFailure(Exception e) {
    . . .
  }
});

Example synchronous request

response = speech_to_text.method_name(parameters)

or

response = speech_to_text.await.method_name(parameters)

Example asynchronous request

response = speech_to_text.async.method_name(parameters)

Speech to Text docs
Release notes for IBM Cloud
Release notes for IBM Cloud Pak for Data
Javadoc for SpeechToText
Javadoc for sdk-core

WebSockets

Sends audio and returns transcription results for recognition requests over a WebSocket connection. Requests and responses are enabled over a single TCP connection that abstracts much of the complexity of the request to offer efficient implementation, low latency, high throughput, and an asynchronous response.

The endpoint for the WebSocket API is

wss://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}/v1/recognize

{location} indicates where your application is hosted:
- us-south for Dallas
- us-east for Washington, DC
- eu-de for Frankfurt
- au-syd for Sydney
- jp-tok for Tokyo
- eu-gb for London
- kr-seo for Seoul
{instance_id} indicates the unique identifier of the service instance. For more information about how to find the instance ID, see Access between services.

The examples in the documentation abbreviate wss://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_id} to {ws_url}. So all WebSocket examples call the method as {ws_url}/v1/recognize.
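As a minimal sketch of how the endpoint pieces above fit together, the following Python helper assembles the {ws_url} prefix and the full recognize URL. The function names and the sample instance ID are hypothetical and are not part of any SDK; only the URL pattern and the location codes come from this section.

```python
# Location codes listed in this section; the city names are kept for readability only.
LOCATIONS = {
    "us-south": "Dallas",
    "us-east": "Washington, DC",
    "eu-de": "Frankfurt",
    "au-syd": "Sydney",
    "jp-tok": "Tokyo",
    "eu-gb": "London",
    "kr-seo": "Seoul",
}


def build_ws_url(location: str, instance_id: str) -> str:
    """Return the {ws_url} prefix for a service instance (hypothetical helper)."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location!r}")
    if not instance_id:
        raise ValueError("instance_id must not be empty")
    return (
        f"wss://api.{location}.speech-to-text.watson.cloud.ibm.com"
        f"/instances/{instance_id}"
    )


def build_recognize_url(location: str, instance_id: str) -> str:
    """Append the /v1/recognize path to the {ws_url} prefix."""
    return build_ws_url(location, instance_id) + "/v1/recognize"


# "my-instance-id" is a placeholder, not a real instance identifier.
print(build_recognize_url("us-south", "my-instance-id"))
```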

You can pass a maximum of 100 MB and a minimum of 100 bytes of audio per recognition request. You can send multiple requests over a single WebSocket connection. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
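A client might check the size limits above before it sends any audio. The following Python sketch is one way to do that; the helper name is hypothetical, and the reading of "100 MB" as 100 * 1024 * 1024 bytes is an assumption because this section does not define the unit precisely.

```python
MIN_AUDIO_BYTES = 100
# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 100 * 1024 * 1024


def check_audio_size(num_bytes: int) -> None:
    """Raise ValueError if the audio size is outside the documented limits."""
    if num_bytes < MIN_AUDIO_BYTES:
        raise ValueError(
            f"audio has {num_bytes} bytes; the minimum is {MIN_AUDIO_BYTES}"
        )
    if num_bytes > MAX_AUDIO_BYTES:
        raise ValueError(
            f"audio has {num_bytes} bytes; the maximum is {MAX_AUDIO_BYTES}"
        )


check_audio_size(2048)  # within limits, so no exception is raised
```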

By default, the service returns only final results for any request. You can request interim results to see intermediate hypotheses as the transcription progresses.


The WebSocket interface cannot be called from curl. Use a client-side scripting language to call the interface. The example request uses JavaScript to invoke the WebSocket recognize method.

The createRecognizeStream method is deprecated. Use the equivalent recognizeUsingWebSocket method instead.

The recognize_with_websocket method is deprecated. Use the equivalent recognize_using_websocket method instead.

Audio formats (content types)

The service accepts audio in the following formats (MIME types).

For formats that are labeled Required, you must use the content-type (contentType, content_type) parameter with the request to specify the format of the audio.
For all other formats, you can omit the content-type (contentType, content_type) parameter or specify application/octet-stream with the parameter to have the service automatically detect the format of the audio.

Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.

application/octet-stream
audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
audio/basic (Required. Use only with narrowband models.)
audio/flac
audio/g729 (Use only with narrowband models.)
audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
audio/mp3
audio/mpeg
audio/mulaw (Required. Specify the sampling rate (rate) of the audio.)
audio/ogg (The service automatically detects the codec of the input audio.)
audio/ogg;codecs=opus
audio/ogg;codecs=vorbis
audio/wav (Provide audio with a maximum of nine channels.)
audio/webm (The service automatically detects the codec of the input audio.)
audio/webm;codecs=opus
audio/webm;codecs=vorbis
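The format rules above can be sketched as a small Python helper that assembles a content-type string. The helper name is hypothetical. The semicolon-separated name=value syntax follows the audio/ogg;codecs=opus entries in the list, and the parameter names rate, channels, and endianness come from the list; the endianness value used in the example (little-endian) is an assumption that this section does not confirm.

```python
from typing import Optional

# Formats whose description in this section says to specify the sampling rate.
RATE_REQUIRED = {"audio/alaw", "audio/l16", "audio/mulaw"}


def build_content_type(
    fmt: str,
    rate: Optional[int] = None,
    channels: Optional[int] = None,
    endianness: Optional[str] = None,
) -> str:
    """Build a content-type value such as audio/l16;rate=16000 (hypothetical helper)."""
    if fmt in RATE_REQUIRED and rate is None:
        raise ValueError(f"{fmt} requires a sampling rate")
    if rate is not None and fmt not in RATE_REQUIRED:
        raise ValueError(f"{fmt} does not take a rate in this sketch")
    if (channels is not None or endianness is not None) and fmt != "audio/l16":
        raise ValueError("channels and endianness apply only to audio/l16")
    parts = [fmt]
    if rate is not None:
        parts.append(f"rate={rate}")
    if channels is not None:
        parts.append(f"channels={channels}")
    if endianness is not None:
        parts.append(f"endianness={endianness}")
    return ";".join(parts)


# The endianness value below is an assumed spelling, shown for illustration only.
print(build_content_type("audio/l16", rate=16000, channels=1, endianness="little-endian"))
```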

See also:

Supported audio formats

The Python recognize_using_websocket method requires the content_type parameter.

Large speech models and next-generation models

The service supports large speech models and next-generation Multimedia (16 kHz) and Telephony (8 kHz) models for many languages. Large speech models and next-generation models have higher throughput than the service's previous generation of Broadband and Narrowband models. When you use large speech models and next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy.

You specify a large speech model or next-generation model by using the model parameter, as you do for a previous-generation model. Only the next-generation models support the low_latency parameter, and all large speech models and next-generation models support the character_insertion_bias parameter. These parameters are not available with previous-generation models.

Large speech models and next-generation models do not support all of the speech recognition parameters that are available for use with previous-generation models. Next-generation models do not support the following parameters:

acoustic_customization_id
keywords and keywords_threshold
processing_metrics and processing_metrics_interval
word_alternatives_threshold
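Before sending a request, a client could flag parameters from the list above when the chosen model is a next-generation model. The following Python sketch assumes that next-generation models can be recognized by a name that ends in _Multimedia or _Telephony, as the naming convention in this documentation suggests; the function names are hypothetical.

```python
from typing import Any, Dict, List

# Parameters that this section lists as unsupported by next-generation models.
NEXT_GEN_UNSUPPORTED = {
    "acoustic_customization_id",
    "keywords",
    "keywords_threshold",
    "processing_metrics",
    "processing_metrics_interval",
    "word_alternatives_threshold",
}


def is_next_generation(model: str) -> bool:
    """Assumption: next-generation model names end in _Multimedia or _Telephony."""
    return model.endswith(("_Multimedia", "_Telephony"))


def unsupported_parameters(model: str, params: Dict[str, Any]) -> List[str]:
    """Return the names of supplied parameters that the model does not support."""
    if not is_next_generation(model):
        return []
    return sorted(
        name
        for name, value in params.items()
        if name in NEXT_GEN_UNSUPPORTED and value is not None
    )


request = {"keywords": ["colorado"], "keywords_threshold": 0.5, "timestamps": True}
print(unsupported_parameters("en-US_Telephony", request))
```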

Effective 31 July 2023, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see Migrating to large speech models.


URI /v1/recognize

okhttp3.WebSocket recognizeUsingWebSocket(RecognizeOptions options,
  RecognizeCallback callback)

RecognizeStream recognizeUsingWebSocket(params)

dict recognize_using_websocket(audio, content_type,
  recognize_callback, model=None,
  language_customization_id=None, acoustic_customization_id=None,
  customization_weight=None, base_model_version=None,
  inactivity_timeout=None, interim_results=None,
  keywords=None, keywords_threshold=None,
  max_alternatives=None, word_alternatives_threshold=None,
  word_confidence=None, timestamps=None, profanity_filter=None,
  smart_formatting=None, speaker_labels=None, http_proxy_host=None,
  http_proxy_port=None, customization_id=None, grammar_name=None,
  redaction=None, processing_metrics=None, processing_metrics_interval=None,
  audio_metrics=None, end_of_phrase_silence_time=None,
  split_transcript_at_phrase_end=None, speech_detector_sensitivity=None,
  background_audio_suppression=None, **kwargs)

Request

The client calls the recognize method to obtain a string that contains the URI for the WebSocket interface. The call to the recognize method sets basic parameters for the connection and for all recognition requests that are sent over it. See the Parameters of recognize method table.

The client then establishes a connection with the service by passing the URI to the WebSocket constructor, which returns a WebSocket connection object. The client initiates and manages recognition requests by sending JSON-formatted text messages to the service over the connection. The text messages can include all other parameters of the recognition request. The required action parameter tells the service which action is to be performed. See the Parameters of WebSocket text messages table.

After sending the text message to initiate a request, the client sends the audio data to be transcribed as a binary message (blob) over the connection.
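The message flow described above can be sketched in Python as a generator that yields the frames a client would send for one recognition request: a JSON start message, the audio as binary data, and a JSON stop message. The function name and the frame tuples are hypothetical and no WebSocket library is involved. Splitting the audio into several binary messages is an assumption made for illustration; this section describes sending the audio as a binary message.

```python
import json
from typing import Any, Iterator, Tuple, Union

Frame = Tuple[str, Union[str, bytes]]


def request_frames(
    audio: bytes, content_type: str, chunk_size: int = 4096, **params: Any
) -> Iterator[Frame]:
    """Yield ("text", json) and ("binary", bytes) frames for one request."""
    # The start message carries the required action plus any optional parameters.
    start = {"action": "start", "content-type": content_type}
    start.update(params)
    yield ("text", json.dumps(start))
    # Assumption: the audio may be sent as more than one binary message.
    for offset in range(0, len(audio), chunk_size):
        yield ("binary", audio[offset:offset + chunk_size])
    # The stop message signals that all audio for the request has been sent.
    yield ("text", json.dumps({"action": "stop"}))


for kind, payload in request_frames(b"\x00" * 10000, "audio/l16;rate=16000"):
    print(kind, len(payload))
```

A real client would pass each frame to the send method of whatever WebSocket library it uses, sending text frames as text messages and binary frames as binary messages.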

Parameters of recognize method

access_token

Required*
string
Pass a valid access token to establish an authenticated connection with the service. You must establish the connection before the access token expires. You pass an access token only to establish an authenticated connection. After you establish a connection, you can keep it alive indefinitely. You remain authenticated for as long as you keep the connection open. You do not need to refresh the access token for an active connection that lasts beyond the token's expiration time. After a connection is established, it can remain active even after the token or its credentials are deleted.
- IBM Cloud only. Pass an Identity and Access Management (IAM) access token to authenticate with the service. You pass an IAM access token instead of passing an API key with the call. For more information, see Authenticating to IBM Cloud.
- IBM Cloud Pak for Data only. Pass an access token as you would with the Authorization header of an HTTP request. For more information, see Authenticating to IBM Cloud Pak for Data.
model
string

The model to use for all speech recognition requests that are sent over the connection. See Using a model for speech recognition.

The default model is en-US_BroadbandModel. For Speech to Text for IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service. For more information, see Using the default model.

Allowable values: [ar-MS_BroadbandModel, ar-MS_Telephony, cs-CZ_Telephony, de-DE, de-DE_BroadbandModel, de-DE_Multimedia, de-DE_NarrowbandModel, de-DE_Telephony, en-AU, en-AU_BroadbandModel, en-AU_Multimedia, en-AU_NarrowbandModel, en-AU_Telephony, en-GB, en-GB_BroadbandModel, en-GB_Multimedia, en-GB_NarrowbandModel, en-GB_Telephony, en-IN, en-IN_Telephony, en-US, en-US_BroadbandModel, en-US_Multimedia, en-US_NarrowbandModel, en-US_ShortForm_NarrowbandModel, en-US_Telephony, en-WW_Medical_Telephony, es-AR, es-AR_BroadbandModel, es-AR_NarrowbandModel, es-CL, es-CL_BroadbandModel, es-CL_NarrowbandModel, es-CO, es-CO_BroadbandModel, es-CO_NarrowbandModel, es-ES, es-ES_BroadbandModel, es-ES_NarrowbandModel, es-ES_Multimedia, es-ES_Telephony, es-LA_Telephony, es-MX, es-MX_BroadbandModel, es-MX_NarrowbandModel, es-PE, es-PE_BroadbandModel, es-PE_NarrowbandModel, fr-CA, fr-CA_BroadbandModel, fr-CA_Multimedia, fr-CA_NarrowbandModel, fr-CA_Telephony, fr-FR, fr-FR_BroadbandModel, fr-FR_Multimedia, fr-FR_NarrowbandModel, fr-FR_Telephony, hi-IN_Telephony, it-IT_BroadbandModel, it-IT_NarrowbandModel, it-IT_Multimedia, it-IT_Telephony, ja-JP, ja-JP_BroadbandModel, ja-JP_Multimedia, ja-JP_NarrowbandModel, ja-JP_Telephony, ko-KR_BroadbandModel, ko-KR_Multimedia, ko-KR_NarrowbandModel, ko-KR_Telephony, nl-BE_Telephony, nl-NL_BroadbandModel, nl-NL_Multimedia, nl-NL_NarrowbandModel, nl-NL_Telephony, pt-BR, pt-BR_BroadbandModel, pt-BR_Multimedia, pt-BR_NarrowbandModel, pt-BR_Telephony, sv-SE_Telephony, zh-CN_BroadbandModel, zh-CN_NarrowbandModel, zh-CN_Telephony]

Default: en-US_BroadbandModel
language_customization_id
string

The customization ID (GUID) of a custom language model that is to be used for all requests sent over the connection. The base model of the specified custom language model must match the model that is specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom language model. See Using a custom language model for speech recognition.
acoustic_customization_id
string

The customization ID (GUID) of a custom acoustic model that is to be used for the request. The base model of the specified custom acoustic model must match the model that is specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom acoustic model. See Using a custom acoustic model for speech recognition.
base_model_version
string

The version of the specified base model that is to be used for all requests sent over the connection. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
x-watson-learning-opt-out
string

Indicates whether IBM can use data that is sent over the connection to improve the service for future users. Specify true to prevent IBM from accessing the logged data. See Data collection.

Default: false
x-watson-metadata
string

Associates a customer ID with all data that is passed over the connection. The parameter accepts the argument customer_id={id}, where {id} is a random or generic string that is to be associated with the data. URL-encode the argument to the parameter, for example customer_id%3dmy_ID. By default, no customer ID is associated with the data. See Data labels.
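As a rough illustration of how the connection parameters above might be combined, the following Python sketch appends them to the recognize URI as a URL-encoded query string. Passing the parameters in the query string is an assumption of this sketch, and the function name, token, and host are placeholders. Note that urlencode writes the encoded equals sign as %3D, which is equivalent to the %3d shown above.

```python
from typing import Optional
from urllib.parse import urlencode


def recognize_uri(
    ws_url: str,
    access_token: str,
    model: Optional[str] = None,
    learning_opt_out: Optional[bool] = None,
    customer_id: Optional[str] = None,
) -> str:
    """Build {ws_url}/v1/recognize with connection parameters (hypothetical helper)."""
    query = {"access_token": access_token}
    if model is not None:
        query["model"] = model
    if learning_opt_out is not None:
        query["x-watson-learning-opt-out"] = "true" if learning_opt_out else "false"
    if customer_id is not None:
        # The parameter takes the argument customer_id={id}; urlencode escapes the "=".
        query["x-watson-metadata"] = f"customer_id={customer_id}"
    return f"{ws_url}/v1/recognize?{urlencode(query)}"


# Placeholder host and token, for illustration only.
print(recognize_uri("wss://example.test/instances/abc", "TOKEN", model="en-US_Telephony", customer_id="my_ID"))
```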

Call the recognizeUsingWebSocket method to initiate a recognition request. Use the recognizeOptions argument to pass a RecognizeOptions object that provides the parameters for the request, including the audio. Use the callback argument to pass a Java BaseRecognizeCallback object to handle events from the WebSocket connection.

Call the recognizeUsingWebSocket method to initiate a recognition request. The method returns a RecognizeStream object to which you pipe the audio that is to be transcribed. You also use the object's on method to define event handlers for the request. You pass all other parameters of the request as arguments of the method.

Call the recognize_using_websocket method to initiate a recognition request. Pass the audio and all parameters of the request, including the RecognizeCallback and AudioSource objects, as arguments of the method.

Parameters of WebSocket text messages

Parameters

action

Required*
string
The action that is to be performed.

Allowable values:
- start initiates a recognition request. The message can also include any other optional parameters that are described in this table. After sending this text message, the client sends the data as a binary message (blob).
  
  Between recognition requests, the client can send new start messages to modify the parameters that are to be used for subsequent requests. By default, the service continues to use the parameters that were specified with the previous start message.
- stop indicates that all audio data for the request has been sent to the service. The client can send additional requests with the same or different parameters.
objectMode
boolean
Indicates how the data event handler is to return the response from the service:
- If false, the event handler returns only a string with the final transcription of the recognition results, regardless of the parameters that you pass with the request. You must set the encoding for your instance of the RecognizeStream object to UTF-8 by including a call that is similar to the following line of code in your application:
  
  recognizeStream.setEncoding('utf8');
  
  Do not include this call if you set the objectMode parameter to true.
- If true, the event handler returns the recognition results exactly as it receives them from the service: as one or more instances of a SpeechRecognitionResults object.
For more information, see the Example request for the method.
audio

Required*
java.io.InputStream AudioSource

The audio that is to be transcribed.

An AudioSource object that provides the audio that is to be transcribed.
content-type

contentType

content_type

Required*
string str

The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream, audio/alaw, audio/basic, audio/flac, audio/g729, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis]
callback

recognize_callback

Required*
object

A BaseRecognizeCallback object that implements the RecognizeCallback interface to handle events from the WebSocket connection. Override the definitions of the object's default methods to respond to events as needed by your application.

A RecognizeCallback object that defines methods to handle events from the WebSocket connection. Override the definitions of the object's default methods to respond to events as needed by your application.
model
string str

The model to use for all speech recognition requests that are sent over the connection. See Using a model for speech recognition.

The default model is en-US_BroadbandModel. For Speech to Text for IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service. For more information, see Using the default model.

Allowable values: [ar-MS_BroadbandModel, ar-MS_Telephony, cs-CZ_Telephony, de-DE, de-DE_BroadbandModel, de-DE_Multimedia, de-DE_NarrowbandModel, de-DE_Telephony, en-AU, en-AU_BroadbandModel, en-AU_Multimedia, en-AU_NarrowbandModel, en-AU_Telephony, en-GB, en-GB_BroadbandModel, en-GB_Multimedia, en-GB_NarrowbandModel, en-GB_Telephony, en-IN, en-IN_Telephony, en-US, en-US_BroadbandModel, en-US_Multimedia, en-US_NarrowbandModel, en-US_ShortForm_NarrowbandModel, en-US_Telephony, en-WW_Medical_Telephony, es-AR, es-AR_BroadbandModel, es-AR_NarrowbandModel, es-CL, es-CL_BroadbandModel, es-CL_NarrowbandModel, es-CO, es-CO_BroadbandModel, es-CO_NarrowbandModel, es-ES, es-ES_BroadbandModel, es-ES_NarrowbandModel, es-ES_Multimedia, es-ES_Telephony, es-LA_Telephony, es-MX, es-MX_BroadbandModel, es-MX_NarrowbandModel, es-PE, es-PE_BroadbandModel, es-PE_NarrowbandModel, fr-CA, fr-CA_BroadbandModel, fr-CA_Multimedia, fr-CA_NarrowbandModel, fr-CA_Telephony, fr-FR, fr-FR_BroadbandModel, fr-FR_Multimedia, fr-FR_NarrowbandModel, fr-FR_Telephony, hi-IN_Telephony, it-IT_BroadbandModel, it-IT_NarrowbandModel, it-IT_Multimedia, it-IT_Telephony, ja-JP, ja-JP_BroadbandModel, ja-JP_Multimedia, ja-JP_NarrowbandModel, ja-JP_Telephony, ko-KR_BroadbandModel, ko-KR_Multimedia, ko-KR_NarrowbandModel, ko-KR_Telephony, nl-BE_Telephony, nl-NL_BroadbandModel, nl-NL_Multimedia, nl-NL_NarrowbandModel, nl-NL_Telephony, pt-BR, pt-BR_BroadbandModel, pt-BR_Multimedia, pt-BR_NarrowbandModel, pt-BR_Telephony, sv-SE_Telephony, zh-CN_BroadbandModel, zh-CN_NarrowbandModel, zh-CN_Telephony]

Default: en-US_BroadbandModel
languageCustomizationId

language_customization_id
string str

The customization ID (GUID) of a custom language model that is to be used for the request. The base model of the specified custom language model must match the model that is specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom language model. See Using a custom language model for speech recognition.
acousticCustomizationId

acoustic_customization_id
string str

The customization ID (GUID) of a custom acoustic model that is to be used for the request. The base model of the specified custom acoustic model must match the model that is specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom acoustic model. See Using a custom acoustic model for speech recognition.
customization_weight

customizationWeight
double float
If you specify a customization ID (for the WebSocket interface, when you open the connection), you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.5 for large speech models
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when you set the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
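The default and override rules above can be sketched in Python as follows. The sketch assumes that model families can be told apart by name (Broadband and Narrowband for previous-generation, Multimedia and Telephony for next-generation, a bare locale for large speech models) and that every next-generation model whose name starts with en- or ja- counts as English or Japanese. The function names are hypothetical.

```python
from typing import Optional


def default_customization_weight(model: str) -> float:
    """Return the documented default weight for a base model name."""
    if model.endswith(("_BroadbandModel", "_NarrowbandModel")):
        return 0.3  # previous-generation models
    if model.endswith(("_Multimedia", "_Telephony")):
        # Assumption: en-* and ja-* names identify English and Japanese models.
        return 0.1 if model.startswith(("en-", "ja-")) else 0.2
    return 0.5  # large speech models, which use the locale as their name


def resolve_customization_weight(
    model: str,
    requested: Optional[float] = None,
    trained: Optional[float] = None,
) -> float:
    """A weight passed with the request overrides one set at training time."""
    for candidate in (requested, trained):
        if candidate is not None:
            if not 0.0 <= candidate <= 1.0:
                raise ValueError("customization weight must be between 0.0 and 1.0")
            return float(candidate)
    return default_customization_weight(model)


print(resolve_customization_weight("fr-FR_Telephony"))
```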
base_model_version

baseModelVersion
string str

The version of the specified base model that is to be used for the request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
inactivity_timeout

inactivityTimeout
integer int

The time in seconds after which, if only silence (no speech) is detected in the audio, the connection is closed. The default is 30 seconds. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
interim_results

interimResults
boolean bool
If true, the service returns intermediate hypotheses as a stream of JSON SpeechRecognitionResults objects before returning final results for an utterance. If false, the service returns only a single SpeechRecognitionResults object with final results for any utterance. (See the objectMode parameter for information about controlling the response from the method.)
- For previous-generation models, interim results are available for all models. To receive interim results, set the interim_results (interimResults) parameter to true.
- For next-generation models, interim results are available only for those models that support low latency. To receive interim results, set both the interim_results (interimResults) and low_latency (lowLatency) parameters to true.
Default: false
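A client could pick the parameters that are needed for interim results from the model name, as in the following Python sketch. The name-based detection of model families and the function name are assumptions. The sketch cannot tell whether a particular next-generation model supports low latency, and it deliberately declines to guess for large speech models because the description of interim_results does not cover them.

```python
from typing import Dict


def interim_result_parameters(model: str) -> Dict[str, bool]:
    """Return the parameters to set in order to request interim results."""
    if model.endswith(("_BroadbandModel", "_NarrowbandModel")):
        # Previous-generation models: interim_results alone is enough.
        return {"interim_results": True}
    if model.endswith(("_Multimedia", "_Telephony")):
        # Next-generation models: low_latency must also be true, and the
        # model itself must support low latency (not checked here).
        return {"interim_results": True, "low_latency": True}
    raise ValueError(
        f"{model!r}: the parameter description does not cover this model type"
    )


print(interim_result_parameters("en-US_Telephony"))
```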
keywords
string[] [string] list[str]

An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywords_threshold

keywordsThreshold
float

A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords. See Keyword spotting.
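The constraints on keywords and keywords_threshold can be checked on the client before a request is sent. The following Python function is a minimal sketch of such a check, with a hypothetical name; it mirrors only the limits stated above and measures keyword length in characters.

```python
from typing import Any, Dict, List, Optional

MAX_KEYWORDS = 1000
MAX_KEYWORD_LENGTH = 1024


def keyword_parameters(
    keywords: Optional[List[str]], keywords_threshold: Optional[float]
) -> Dict[str, Any]:
    """Validate the two keyword-spotting parameters and return them as a dict."""
    keywords = keywords or []
    if keywords and keywords_threshold is None:
        raise ValueError("keywords require a keywords_threshold")
    if keywords_threshold is not None and not keywords:
        raise ValueError("a keywords_threshold requires one or more keywords")
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} keywords per request")
    for keyword in keywords:
        if len(keyword) > MAX_KEYWORD_LENGTH:
            raise ValueError(f"keyword exceeds {MAX_KEYWORD_LENGTH} characters")
    if keywords_threshold is not None and not 0.0 <= keywords_threshold <= 1.0:
        raise ValueError("keywords_threshold must be between 0.0 and 1.0")
    if not keywords:
        return {}  # no keyword spotting requested
    return {"keywords": keywords, "keywords_threshold": keywords_threshold}


print(keyword_parameters(["colorado", "tornado"], 0.5))
```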
max_alternatives

maxAlternatives
integer int

The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
word_alternatives_threshold

wordAlternativesThreshold
float

A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
word_confidence

wordConfidence
boolean bool

If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, no word confidence measures are returned. See Word confidence.

Default: false
timestamps
boolean bool

If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanity_filter

profanityFilter
boolean bool

If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smart_formatting

smartFormatting
boolean bool

If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, no smart formatting is performed.

Beta: The parameter is beta functionality. It can be used with US English, Japanese, and Spanish (all dialects) transcription only. See Smart formatting.

Default: false
smart_formatting_version

smartFormattingVersion
integer int

Smart formatting version for large speech models and next-generation models is supported in US English, Brazilian Portuguese, French, German, Spanish and French Canadian languages.

See Smart formatting Version.

Default: 0
speaker_labels

speakerLabels
boolean bool
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels (speakerLabels) to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For large speech models and next-generation models, can be used with all available languages.
See Speaker labels.

Default: false
http_proxy_host
str

If you are passing requests through a proxy, specify the hostname of the proxy server. Use the http_proxy_port parameter to specify the port number at which the proxy listens. Omit both parameters if you are not using a proxy.

Default: None
http_proxy_port
str

If you are passing requests through a proxy, specify the port number at which the proxy service listens. Use the http_proxy_host parameter to specify the hostname of the proxy. Omit both parameters if you are not using a proxy.

Default: None
grammarName

grammar_name
string str

The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id (languageCustomizationId) parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.
redaction
boolean bool

If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold (keywordsThreshold) parameters) and returns only a single final transcript (forces the max_alternatives (maxAlternatives) parameter to be 1).

Beta: The parameter is beta functionality. It can be used with US English, Japanese, and Korean transcription only. See Numeric redaction.

Default: false
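To make the redaction rule and its side effects concrete, the following Python sketch imitates them locally. It is an illustration of the behavior described above, not the service's implementation: mask_numbers applies the "three or more consecutive digits" rule with a regular expression, and apply_redaction_side_effects shows which parameters the service overrides. Both function names are hypothetical.

```python
import re
from typing import Any, Dict


def mask_numbers(text: str) -> str:
    """Replace each digit of any run of three or more digits with X."""
    return re.sub(r"[0-9]{3,}", lambda match: "X" * len(match.group(0)), text)


def apply_redaction_side_effects(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return the parameters as the service would effectively treat them."""
    effective = dict(params)
    if effective.get("redaction"):
        effective["smart_formatting"] = True      # enabled even if disabled explicitly
        effective.pop("keywords", None)            # keyword spotting is ignored
        effective.pop("keywords_threshold", None)
        effective["max_alternatives"] = 1          # only a single final transcript
    return effective


print(mask_numbers("call 555 1234 extension 42"))
print(apply_redaction_side_effects({"redaction": True, "max_alternatives": 3}))
```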
processing_metrics

processingMetrics
boolean bool

If true, requests processing metrics about the service's transcription of the input audio. The service returns processing metrics at the interval that is specified by the processing_metrics_interval (processingMetricsInterval) parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics. See Processing metrics.

Default: false
processing_metrics_interval

processingMetricsInterval
float

Specifies the interval in seconds at which the service is to return processing metrics. The parameter is ignored unless the processing_metrics (processingMetrics) parameter is set to true.

The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125.

The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events.

See Processing metrics.

Default: 1.0
audio_metrics

audioMetrics
boolean bool

If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics. See Audio metrics.

Default: false
end_of_phrase_silence_time

endOfPhraseSilenceTime
double float
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds. The default for Chinese is 0.6 seconds.

See End of phrase silence time.

Default: 0.8
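The rules for the pause interval can be summarized in a short Python sketch that computes the interval the service would effectively use. The function name is hypothetical, and the check for Chinese models by the zh- prefix of the model name is an assumption.

```python
from typing import Optional


def effective_pause_interval(model: str, value: Optional[float] = None) -> float:
    """Return the pause interval, in seconds, implied by the parameter value."""
    # Assumption: Chinese models are identified by a name that starts with "zh-".
    default = 0.6 if model.startswith("zh-") else 0.8
    if value is None:
        return default
    if not 0.0 <= value <= 120.0:
        raise ValueError("end_of_phrase_silence_time must be between 0.0 and 120.0")
    # A value of 0 is equivalent to omitting the parameter.
    return default if value == 0 else float(value)


print(effective_pause_interval("en-US_Telephony", 1.5))
```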
split_transcript_at_phrase_end

splitTranscriptAtPhraseEnd
boolean bool

If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If the parameters are used together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speech_detector_sensitivity

speechDetectorSensitivity
double float
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all large speech models and next-generation models and with most previous-generation models. See Speech detector sensitivity and Language model support.

Default: 0.5
background_audio_suppression

backgroundAudioSuppression
double float
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value between 0.0 and 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all large speech models and next-generation models and with most previous-generation models. See Background audio suppression and Language model support.

Default: 0.0
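
As a minimal sketch of how the two detection parameters might be checked on the client before they are sent, the following Python helper verifies that each value lies in the documented range of 0.0 to 1.0 and rounds it to two decimal places. The function name detection_params is hypothetical and is not part of any SDK; only the parameter names, ranges, and defaults come from the descriptions above.

```python
def detection_params(speech_detector_sensitivity=0.5,
                     background_audio_suppression=0.0):
    """Return a dict of validated detection parameters (hypothetical helper)."""
    params = {
        "speech_detector_sensitivity": speech_detector_sensitivity,
        "background_audio_suppression": background_audio_suppression,
    }
    for name, value in params.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        # One or two decimal places of precision is typically sufficient.
        params[name] = round(float(value), 2)
    return params


print(detection_params(speech_detector_sensitivity=0.6,
                       background_audio_suppression=0.5))
```

The returned dictionary could then be merged into the parameters of a recognition request.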
low_latency

lowLatency
boolean bool
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

Note: The low_latency (lowLatency) parameter is not available for large speech models and previous-generation Broadband and Narrowband models. It is available only for some next-generation models. To obtain interim results with a next-generation model, the model must support low latency and both the interim_results (interimResults) and low_latency (lowLatency) parameters must be set to true.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency (lowLatency) parameter, see Low latency.
Default: false
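
The availability rules in the note can be sketched as a simple client-side check. The following Python snippet infers a model's type from its name, using the naming conventions of the service: Broadband or Narrowband for previous-generation models, Multimedia or Telephony for next-generation models, and a bare locale for large speech models. The function names are hypothetical, and a name check cannot tell whether a particular next-generation model actually supports low latency; the list of supported models remains the authority for that.

```python
def model_generation(model_name):
    """Classify a model by the naming conventions of the service."""
    if "Broadband" in model_name or "Narrowband" in model_name:
        return "previous-generation"
    if "Multimedia" in model_name or "Telephony" in model_name:
        return "next-generation"
    # Large speech models use only the locale (for example, en-US) as their name.
    return "large speech model"


def may_request_low_latency(model_name):
    """Return True only for next-generation models; support still varies by model."""
    return model_generation(model_name) == "next-generation"


for name in ("en-US_BroadbandModel", "en-US_Telephony", "en-US"):
    print(name, model_generation(name), may_request_low_latency(name))
```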
character_insertion_bias
float
For large speech models and next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

Beta: The parameter is beta functionality. It is not available for previous-generation models.

See Character insertion bias.
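
The tuning advice above (start with small increments and adjust gradually) can be illustrated with a short Python sketch that produces a set of candidate values to try. The helper name candidate_biases and its step and count arguments are assumptions made for illustration; they are not part of the service or its SDKs.

```python
def candidate_biases(step=0.05, count=2):
    """Return small negative and positive bias values to try, in ascending order."""
    values = [round(sign * step * i, 2)
              for i in range(1, count + 1)
              for sign in (-1, 1)]
    # Keep only values inside the allowable range of -1.0 to 1.0.
    return sorted(v for v in values if -1.0 <= v <= 1.0)


print(candidate_biases())  # [-0.1, -0.05, 0.05, 0.1]
```

Each candidate would be sent as the character_insertion_bias value of a separate request, and the resulting transcripts compared against a reference.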

Example request

var access_token = '{access_token}';
var wsURI = '{ws_url}/v1/recognize'
  + '?access_token=' + access_token
  + '&model=en-US_BroadbandModel';

var websocket = new WebSocket(wsURI);
websocket.onopen = function(evt) { onOpen(evt) };
websocket.onclose = function(evt) { onClose(evt) };
websocket.onmessage = function(evt) { onMessage(evt) };
websocket.onerror = function(evt) { onError(evt) };

function onOpen(evt) {
  var message = {
    action: 'start',
    keywords: ['colorado', 'tornado', 'tornadoes'],
    keywords_threshold: 0.5,
    max_alternatives: 3
  };
  websocket.send(JSON.stringify(message));

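  // Prepare and send the audio file. The variable blob is assumed to
  // hold the audio data (for example, a Blob read from a file input).
  websocket.send(blob);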

  websocket.send(JSON.stringify({action: 'stop'}));
}

function onClose(evt) {
  // A close event carries a return code and a reason rather than data.
  console.log(evt.code, evt.reason);
}

function onMessage(evt) {
  console.log(evt.data);
}

function onError(evt) {
  console.log(evt);
}

Example request

/* * * * *
 * IBM CLOUD: Use the following code only to
 * authenticate to IBM Cloud.
 * * * * */

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

/* * * * *
 * IBM CLOUD PAK FOR DATA: Use the following code
 * only to authenticate to IBM Cloud Pak for Data.
 * * * * */

// CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}", "{username}", "{password}");
// SpeechToText speechToText = new SpeechToText(authenticator);
// speechToText.setServiceUrl("{url}");

try {
  RecognizeOptions recognizeOptions = new RecognizeOptions.Builder()
    .audio(new FileInputStream("audio-file.flac"))
    .contentType("audio/flac")
    .model("en-US_BroadbandModel")
    .keywords(Arrays.asList("colorado", "tornado", "tornadoes"))
    .keywordsThreshold((float) 0.5)
    .maxAlternatives(3)
    .build();

  BaseRecognizeCallback baseRecognizeCallback =
    new BaseRecognizeCallback() {

      @Override
      public void onTranscription
        (SpeechRecognitionResults speechRecognitionResults) {
          System.out.println(speechRecognitionResults);
      }

      @Override
      public void onDisconnected() {
        System.exit(0);
      }

    };

  speechToText.recognizeUsingWebSocket(recognizeOptions,
    baseRecognizeCallback);
} catch (FileNotFoundException e) {
  e.printStackTrace();
}

Example request

const fs = require('fs');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');

/* * * * *
 * IBM CLOUD: Use the following code only to
 * authenticate to IBM Cloud.
 * * * * */

const { IamAuthenticator } = require('ibm-watson/auth');
const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

/* * * * *
 * IBM CLOUD PAK FOR DATA: Use the following code
 * only to authenticate to IBM Cloud Pak for Data.
 * * * * */

// const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');
// const speechToText = new SpeechToTextV1({
//   authenticator: new CloudPakForDataAuthenticator({
//     username: '{username}',
//     password: '{password}',
//     url: 'https://{cpd_cluster_host}{:port}',
//  }),
//  serviceUrl: '{url}',
// });

const params = {
  objectMode: true,
  contentType: 'audio/flac',
  model: 'en-US_BroadbandModel',
  keywords: ['colorado', 'tornado', 'tornadoes'],
  keywordsThreshold: 0.5,
  maxAlternatives: 3,
};

// Create the stream.
const recognizeStream = speechToText.recognizeUsingWebSocket(params);

// Pipe in the audio.
fs.createReadStream('audio-file.flac').pipe(recognizeStream);

/*
 * Uncomment the following two lines of code ONLY if `objectMode` is `false`.
 *
 * WHEN USED TOGETHER, the two lines pipe the final transcript to the named
 * file and produce it on the console.
 *
 * WHEN USED ALONE, the following line pipes just the final transcript to
 * the named file but produces numeric values rather than strings on the
 * console.
 */
// recognizeStream.pipe(fs.createWriteStream('transcription.txt'));

/*
 * WHEN USED ALONE, the following line produces just the final transcript
 * on the console.
 */
// recognizeStream.setEncoding('utf8');

// Listen for events.
recognizeStream.on('data', function(event) { onEvent('Data:', event); });
recognizeStream.on('error', function(event) { onEvent('Error:', event); });
recognizeStream.on('close', function(event) { onEvent('Close:', event); });

// Display events on the console.
function onEvent(name, event) {
    console.log(name, JSON.stringify(event, null, 2));
};

Example request

import json
from os.path import join, dirname
from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource

##########
# IBM CLOUD: Use the following code only to
# authenticate to IBM Cloud.
##########

from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)
speech_to_text.set_service_url('{url}')

##########
# IBM CLOUD PAK FOR DATA: Use the following code
# only to authenticate to IBM Cloud Pak for Data.
##########

# from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
# authenticator = CloudPakForDataAuthenticator(
#     '{username}',
#     '{password}',
#     'https://{cpd_cluster_host}{:port}'
# )
# speech_to_text = SpeechToTextV1(
#     authenticator=authenticator
# )
# speech_to_text.set_service_url('{url}')

class MyRecognizeCallback(RecognizeCallback):
    def __init__(self):
        RecognizeCallback.__init__(self)

    def on_data(self, data):
        print(json.dumps(data, indent=2))

    def on_error(self, error):
        print('Error received: {}'.format(error))

    def on_inactivity_timeout(self, error):
        print('Inactivity timeout: {}'.format(error))

myRecognizeCallback = MyRecognizeCallback()

with open(join(dirname(__file__), 'audio-file.flac'),
          'rb') as audio_file:
    audio_source = AudioSource(audio_file)
    speech_to_text.recognize_using_websocket(
        audio=audio_source,
        content_type='audio/flac',
        recognize_callback=myRecognizeCallback,
        model='en-US_BroadbandModel',
        keywords=['colorado', 'tornado', 'tornadoes'],
        keywords_threshold=0.5,
        max_alternatives=3)

Response

Successful recognition returns one or more instances of a SpeechRecognitionResults object. The contents of the response depend on the parameters you send with the recognition request, including the interim_results (interimResults) parameter. For more information, see the results for the Recognize audio method.

If the objectMode parameter is true, successful recognition returns one or more instances of a SpeechRecognitionResults object. The contents of the response depend on the parameters you send with the recognition request, including the interimResults parameter. For more information, see the results for the Recognize audio method.

If the objectMode parameter is false, successful recognition returns only a single string with the final transcription results.

Response handling

Response handling for the WebSocket interface is different from HTTP response handling. The WebSocket constructor returns an instance of a WebSocket connection object. You assign application-specific calls to the following methods of the object to handle events that are associated with the connection. Each event handler must accept a single argument for an event from the connection. The event that it accepts causes it to execute.

Methods

onopen({event})

The status of the connection's opening.
onmessage({event})

Response messages from the service, including the results of the request as one or more JSON SpeechRecognitionResults objects.
onerror({event})

Errors for the connection or request.
onclose({event})

The status of the connection's closing.

The callback parameter of the recognizeUsingWebSocket method accepts a Java object of type BaseRecognizeCallback, which implements the RecognizeCallback interface to handle events from the WebSocket connection. You override the definitions of the following default empty methods of the object to handle events that are associated with the connection and the request. The methods are called when their associated events occur.

Methods

void onConnected()

The WebSocket connection is established.
void onListening()

The service is listening for audio.
void onTranscription (SpeechRecognitionResults {speechRecognitionResults})

Results for the request are received from the service.
void onTranscriptionComplete()

Final results for the request have been returned by the service.
void onError(java.lang.Exception {e})

An error occurs in the WebSocket connection.
void onInactivityTimeout (java.lang.RuntimeException {runtimeException})

An inactivity timeout occurs for the request.
void onDisconnected()

The WebSocket connection is closed.

You handle events that are associated with the WebSocket connection and the request by defining event-handler methods on the RecognizeStream object that is returned by the recognizeUsingWebSocket method. The methods are called when their associated events occur. You can define handlers for the following events by using the object's on method. For more information about streams and events, see the Node.js documentation.

Events

data

Results for the request are received on the stream.
readable

Data is available to be read from the stream.
end

No data remains to be read from the stream.
close

The WebSocket connection is closed.
error

An error occurs in the WebSocket connection.

The recognize_callback parameter of the recognize_using_websocket method accepts an object of type RecognizeCallback. The object defines the methods that handle events from the WebSocket connection. You can override the definitions of the following default empty methods of the object to handle events that are associated with the connection and the request. The methods are called when their associated events occur.

Methods

on_connected()

The WebSocket connection is established.
on_listening()

The service is listening for audio.
on_data({data})

Returns all response data for the request from the service.
on_hypothesis({hypothesis})

Returns interim results or maximum alternatives from the service when those responses are requested.
on_transcription({transcript})

Returns final transcription results for the request from the service.
on_error({error})

Reports an error in the WebSocket connection.
on_inactivity_timeout({error})

Reports an inactivity timeout for the request.

The connection can produce the following return codes.

Return code

1000

The connection closed normally.
1001

The connection closed because the remote peer is leaving.
1002

The connection closed due to a protocol error.
1003

The connection closed because the service could not process the input from the client.
1004

Reserved response code.
1005

The connection closed for a reason other than those defined by the remaining return codes.
1006

The connection closed abnormally.
1007

The connection closed because the service received invalid data.
1008

The connection closed due to a policy violation.
1009

The connection closed because the frame size exceeded the 4 MB limit.
1010

The connection closed because the client requested a required extension that is not available.
1011

The connection closed because the service encountered an unexpected internal condition that prevents it from fulfilling the request.
1015

The connection was not established due to a TLS handshake error.
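
As a small illustration, the return codes listed above can be kept in a lookup table so that a close handler can log a readable reason. This is a sketch only: the names CLOSE_CODES and describe_close are hypothetical, and the descriptions are copied from the list above.

```python
# Return codes that the WebSocket connection can produce, as listed above.
CLOSE_CODES = {
    1000: "The connection closed normally.",
    1001: "The connection closed because the remote peer is leaving.",
    1002: "The connection closed due to a protocol error.",
    1003: "The connection closed because the service could not process the input from the client.",
    1004: "Reserved response code.",
    1005: "The connection closed for a reason other than those defined by the remaining return codes.",
    1006: "The connection closed abnormally.",
    1007: "The connection closed because the service received invalid data.",
    1008: "The connection closed due to a policy violation.",
    1009: "The connection closed because the frame size exceeded the 4 MB limit.",
    1010: "The connection closed because the client requested a required extension that is not available.",
    1011: "The connection closed because the service encountered an unexpected internal condition that prevents it from fulfilling the request.",
    1015: "The connection was not established due to a TLS handshake error.",
}


def describe_close(code):
    """Return a readable description for a return code (hypothetical helper)."""
    return CLOSE_CODES.get(code, f"Unrecognized return code {code}.")


print(describe_close(1000))
print(describe_close(1009))
```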

Example response

{
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
          "confidence": 0.89
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "end_time": 2.15,
            "confidence": 1.0
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.95,
            "end_time": 5.59,
            "confidence": 0.98
          }
        ]
      }
    }
  ],
  "result_index": 0
}
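
The following Python sketch shows one way a client might read a response of this shape. It works on an abbreviated copy of the example response, takes the first alternative of each final result (the only alternative that carries a confidence score in the example), and flattens the keyword matches. The helper names final_transcripts and keyword_hits are hypothetical and are not SDK functions.

```python
import json

# Abbreviated copy of the example response.
response_text = """
{
  "results": [
    {
      "final": true,
      "alternatives": [
        {"transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
         "confidence": 0.89},
        {"transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "}
      ],
      "keywords_result": {
        "tornadoes": [{"normalized_text": "tornadoes", "start_time": 1.52,
                       "end_time": 2.15, "confidence": 1.0}],
        "colorado": [{"normalized_text": "Colorado", "start_time": 4.95,
                      "end_time": 5.59, "confidence": 0.98}]
      }
    }
  ],
  "result_index": 0
}
"""


def final_transcripts(response):
    """Return the first alternative of each final result, without trailing spaces."""
    return [result["alternatives"][0]["transcript"].strip()
            for result in response.get("results", [])
            if result.get("final") and result.get("alternatives")]


def keyword_hits(response):
    """Return (keyword, start_time, end_time, confidence) tuples for all matches."""
    hits = []
    for result in response.get("results", []):
        for keyword, matches in result.get("keywords_result", {}).items():
            for match in matches:
                hits.append((keyword, match["start_time"],
                             match["end_time"], match["confidence"]))
    return hits


response = json.loads(response_text)
print(final_transcripts(response))
print(keyword_hits(response))
```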

Lists all language models that are available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things. The ordering of the list of models can change from call to call; do not rely on an alphabetized or static list of models.

See also: Listing all models.

GET /v1/models

ListModels()

ServiceCall<SpeechModels> listModels()

listModels(params)

list_models(
        self,
        **kwargs,
    ) -> DetailedResponse

Request

No Request Parameters

This method does not accept any request parameters.

Example request for IBM Cloud

curl -X GET -u "apikey:{apikey}" "{url}/v1/models"

Example request for IBM Cloud Pak for Data

curl -X GET --header "Authorization: Bearer {token}" "{url}/v1/models"

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.ListModels();

Console.WriteLine(result.Response);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.ListModels();

Console.WriteLine(result.Response);

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

SpeechModels speechModels = speechToText.listModels().execute().getResult();
System.out.println(speechModels);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

SpeechModels speechModels = speechToText.listModels().execute().getResult();
System.out.println(speechModels);

Example request for IBM Cloud

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

speechToText.listModels()
  .then(speechModels => {
    console.log(JSON.stringify(speechModels, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud Pak for Data

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
  }),
  serviceUrl: '{url}',
});

speechToText.listModels()
  .then(speechModels => {
    console.log(JSON.stringify(speechModels, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

speech_models = speech_to_text.list_models().get_result()
print(json.dumps(speech_models, indent=2))

Example request for IBM Cloud Pak for Data

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

speech_models = speech_to_text.list_models().get_result()
print(json.dumps(speech_models, indent=2))

Response

Response Body

SpeechModels

Information about the available language models.

Status Code

200
OK. The request succeeded.
406
Not Acceptable. The request specified an Accept header with an incompatible content type.
415
Unsupported Media Type. The request specified an unacceptable media type.
500
Internal Server Error. The service experienced an internal error.
503
Service Unavailable. The service is currently unavailable.

Example responses

Status 200

{
  "models": [
    {
      "name": "pt-BR_NarrowbandModel",
      "language": "pt-BR",
      "url": "{url}/v1/models/pt-BR_NarrowbandModel",
      "rate": 8000,
      "supported_features": {
        "custom_language_model": true,
        "custom_acoustic_model": true,
        "speaker_labels": true
      },
      "description": "Brazilian Portuguese narrowband model."
    },
    {
      "name": "ko-KR_BroadbandModel",
      "language": "ko-KR",
      "url": "{url}/v1/models/ko-KR_BroadbandModel",
      "rate": 16000,
      "supported_features": {
        "custom_language_model": true,
        "custom_acoustic_model": true,
        "speaker_labels": true
      },
      "description": "Korean broadband model."
    },
    {
      "name": "fr-FR_BroadbandModel",
      "language": "fr-FR",
      "url": "{url}/v1/models/fr-FR_BroadbandModel",
      "rate": 16000,
      "supported_features": {
        "custom_language_model": true,
        "custom_acoustic_model": true,
        "speaker_labels": true
      },
      "description": "French broadband model."
    }
  ]
}
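
Because the ordering of the models can change from call to call, a client that needs a stable order can sort the list itself. The following Python sketch does so on sample data that mirrors the name and rate fields of the example response; the helper name models_by_name is hypothetical.

```python
def models_by_name(speech_models):
    """Return the models from a list-models response, sorted by name."""
    return sorted(speech_models.get("models", []), key=lambda model: model["name"])


# Sample data that mirrors part of the example response.
sample = {
    "models": [
        {"name": "pt-BR_NarrowbandModel", "rate": 8000},
        {"name": "ko-KR_BroadbandModel", "rate": 16000},
        {"name": "fr-FR_BroadbandModel", "rate": 16000},
    ]
}

for model in models_by_name(sample):
    print(model["name"], model["rate"])
```

With the Python SDK, the dictionary returned by list_models().get_result() could be passed to the helper in place of the sample data.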

Gets information for a single specified language model that is available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.

See also: Listing a specific model.

GET /v1/models/{model_id}

GetModel(string modelId)

ServiceCall<SpeechModel> getModel(GetModelOptions getModelOptions)

getModel(params)

get_model(
        self,
        model_id: str,
        **kwargs,
    ) -> DetailedResponse

Request

Use the GetModelOptions.Builder to create a GetModelOptions object that contains the parameter values for the getModel method.

Path Parameters

model_id
Required*
string
The identifier of the model in the form of its name from the output of the List models method.

Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-GB,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-IN,en-IN_Telephony,en-US,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]

parameters

modelId
Required*
string
The identifier of the model in the form of its name from the output of the List models method.

Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-IN_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]

GetModelOptions

The getModel options.

parameters

modelId
Required*
string
The identifier of the model in the form of its name from the output of the List models method.

Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-IN_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]

parameters

model_id
Required*
str
The identifier of the model in the form of its name from the output of the List models method.

Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-IN_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]

Example request for IBM Cloud

curl -X GET -u "apikey:{apikey}" "{url}/v1/models/en-US_BroadbandModel"

Example request for IBM Cloud Pak for Data

curl -X GET --header "Authorization: Bearer {token}" "{url}/v1/models/en-US_BroadbandModel"
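
For readers who prefer to see the IBM Cloud curl request expressed with a standard library, the following Python sketch builds the same GET request with urllib, using basic authentication with the user name apikey as in the curl example. It only constructs the request and does not send it. The function name build_get_model_request and the placeholder URL and API key are assumptions made for illustration.

```python
import base64
import urllib.parse
import urllib.request


def build_get_model_request(service_url, model_id, apikey):
    """Build (but do not send) a GET request for /v1/models/{model_id}."""
    path = urllib.parse.quote(model_id, safe="")
    url = f"{service_url.rstrip('/')}/v1/models/{path}"
    # Equivalent of curl's -u "apikey:{apikey}" option.
    credentials = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {credentials}"},
        method="GET",
    )


# Placeholder values; substitute your own service URL and API key.
request = build_get_model_request("https://example.invalid",
                                  "en-US_BroadbandModel", "my-api-key")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; the response body is the JSON description of the model.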

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.GetModel(
    modelId: "en-US_BroadbandModel"
    );

Console.WriteLine(result.Response);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.GetModel(
    modelId: "en-US_BroadbandModel"
    );

Console.WriteLine(result.Response);

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

GetModelOptions getModelOptions = new GetModelOptions.Builder()
  .modelId("en-US_BroadbandModel")
  .build();

SpeechModel speechModel = speechToText.getModel(getModelOptions).execute().getResult();
System.out.println(speechModel);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

GetModelOptions getModelOptions = new GetModelOptions.Builder()
  .modelId("en-US_BroadbandModel")
  .build();

SpeechModel speechModel = speechToText.getModel(getModelOptions).execute().getResult();
System.out.println(speechModel);

Example request for IBM Cloud

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const getModelParams = {
  modelId: 'en-US_BroadbandModel',
};

speechToText.getModel(getModelParams)
  .then(speechModel => {
    console.log(JSON.stringify(speechModel, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud Pak for Data

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
  }),
  serviceUrl: '{url}',
});

const getModelParams = {
  modelId: 'en-US_BroadbandModel',
};

speechToText.getModel(getModelParams)
  .then(speechModel => {
    console.log(JSON.stringify(speechModel, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

speech_model = speech_to_text.get_model('en-US_BroadbandModel').get_result()
print(json.dumps(speech_model, indent=2))

Example request for IBM Cloud Pak for Data

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

speech_model = speech_to_text.get_model('en-US_BroadbandModel').get_result()
print(json.dumps(speech_model, indent=2))

Response

Response Body

SpeechModel

Information about an available language model.

Status Code

200
OK. The request succeeded.
404
Not Found. The specified model_id was not found.
406
Not Acceptable. The request specified an Accept header with an incompatible content type.
415
Unsupported Media Type. The request specified an unacceptable media type.
500
Internal Server Error. The service experienced an internal error.
503
Service Unavailable. The service is currently unavailable.

Example responses

Status 200

{
  "rate": 16000,
  "name": "en-US_BroadbandModel",
  "language": "en-US",
  "url": "{url}/v1/models/en-US_BroadbandModel",
  "supported_features": {
    "custom_language_model": true,
    "custom_acoustic_model": true,
    "speaker_labels": true
  },
  "description": "US English broadband model."
}
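
As a rough illustration of how a client might consume this response, the following Python sketch parses the Status 200 example body and checks which features the model reports. The helper name `supports` is hypothetical and is not part of any SDK; the field names are taken from the example response.

```python
import json

# Example response body from this section; {url} is left as a placeholder.
RESPONSE_BODY = """
{
  "rate": 16000,
  "name": "en-US_BroadbandModel",
  "language": "en-US",
  "url": "{url}/v1/models/en-US_BroadbandModel",
  "supported_features": {
    "custom_language_model": true,
    "custom_acoustic_model": true,
    "speaker_labels": true
  },
  "description": "US English broadband model."
}
"""


def supports(model: dict, feature: str) -> bool:
    """Return True if supported_features marks the given feature as true."""
    return bool(model.get("supported_features", {}).get(feature, False))


model = json.loads(RESPONSE_BODY)
print(model["name"], model["rate"])
print(supports(model, "speaker_labels"))
```

A feature that is absent from `supported_features` is treated as unsupported in this sketch.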

Sends audio and returns transcription results for a recognition request. You can pass a maximum of 100 MB and a minimum of 100 bytes of audio with a request. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The method returns only final results; to enable interim results, use the WebSocket API. (With the curl command, use the --data-binary option to upload the file for the request.)

See also: Making a basic HTTP request.
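
The size limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service or its SDKs: the function name `check_audio_size` is hypothetical, and it assumes that "100 MB" means 100 * 1024 * 1024 bytes, which the description does not specify.

```python
MIN_AUDIO_BYTES = 100
# Assumption: 1 MB is taken as 1024 * 1024 bytes; the documentation does not
# say which definition of MB the service applies.
MAX_AUDIO_BYTES = 100 * 1024 * 1024


def check_audio_size(num_bytes: int) -> None:
    """Raise ValueError if the audio is outside the documented size limits."""
    if num_bytes < MIN_AUDIO_BYTES:
        raise ValueError(
            f"audio is {num_bytes} bytes; the minimum is {MIN_AUDIO_BYTES} bytes"
        )
    if num_bytes > MAX_AUDIO_BYTES:
        raise ValueError(
            f"audio is {num_bytes} bytes; the maximum is {MAX_AUDIO_BYTES} bytes"
        )
```

For a local file, the size could be obtained with `os.path.getsize(path)` and passed to the function before the upload.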

Streaming mode

For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the service closes the connection (status code 408) if it does not receive at least 15 seconds of audio (including silence) in any 30-second period. The service also closes the connection (status code 400) if it detects no speech for inactivity_timeout seconds of streaming audio; use the inactivity_timeout parameter to change the default of 30 seconds.
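
One way to send audio in streaming mode from Python is sketched below, using only the standard library. It assumes basic authentication with the `apikey:{apikey}` form shown in the curl examples and a `{url}` value that contains no query string; the function names `iter_audio_chunks` and `recognize_streaming` are hypothetical and are not part of the Watson SDK.

```python
import base64
import http.client
import io
from typing import BinaryIO, Iterator
from urllib.parse import urlsplit


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio from a file-like object in pieces until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def recognize_streaming(service_url: str, apikey: str, stream: BinaryIO,
                        content_type: str = "audio/flac") -> bytes:
    """POST audio to {url}/v1/recognize with Transfer-Encoding: chunked."""
    parts = urlsplit(service_url)
    credentials = base64.b64encode(f"apikey:{apikey}".encode()).decode()
    headers = {
        "Authorization": f"Basic {credentials}",
        "Content-Type": content_type,
        "Transfer-Encoding": "chunked",
    }
    conn = http.client.HTTPSConnection(parts.netloc)
    try:
        # encode_chunked=True asks http.client to frame each yielded piece
        # as one HTTP chunk, so the audio need not exist fully in advance.
        conn.request("POST", parts.path.rstrip("/") + "/v1/recognize",
                     body=iter_audio_chunks(stream), headers=headers,
                     encode_chunked=True)
        return conn.getresponse().read()
    finally:
        conn.close()


# Local demonstration of the chunking only; no request is sent.
demo = list(iter_audio_chunks(io.BytesIO(b"\x00" * 10), chunk_size=4))
print([len(c) for c in demo])  # prints [4, 4, 2]
```

For live audio, the stream would be a source that blocks until more audio is available. Such a source needs to keep delivering audio, including silence, often enough to stay within the 15-seconds-per-30-second requirement described above.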

Audio formats (content types)

The service accepts audio in the following formats (MIME types).

For formats that are labeled Required, you must use the Content-Type header with the request to specify the format of the audio.
For all other formats, you can omit the Content-Type header or specify application/octet-stream with the header to have the service automatically detect the format of the audio. (With the curl command, you can specify either "Content-Type:" or "Content-Type: application/octet-stream".)

Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.

audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
audio/basic (Required. Use only with narrowband models.)
audio/flac
audio/g729 (Use only with narrowband models.)
audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
audio/mp3
audio/mpeg
audio/mulaw (Required. Specify the sampling rate (rate) of the audio.)
audio/ogg (The service automatically detects the codec of the input audio.)
audio/ogg;codecs=opus
audio/ogg;codecs=vorbis
audio/wav (Provide audio with a maximum of nine channels.)
audio/webm (The service automatically detects the codec of the input audio.)
audio/webm;codecs=opus
audio/webm;codecs=vorbis

The sampling rate of the audio must be at least the sampling rate of the model for the recognition request: 16 kHz for broadband and multimedia models, 8 kHz for narrowband and telephony models. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the request fails.
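
The following sketch shows how a client might build a Content-Type value for `audio/l16` and compare the audio's sampling rate with the model's minimum before sending a request. It rests on several assumptions: the parameters are appended as semicolon-separated `name=value` pairs (by analogy with `audio/ogg;codecs=opus`), the endianness is spelled `little-endian` or `big-endian`, and the model family can be inferred from the model name. All function names are hypothetical.

```python
from typing import Optional

# Minimum sampling rates by model family, as stated in this documentation.
MIN_RATE_HZ = {
    "Broadband": 16000,
    "Multimedia": 16000,
    "Narrowband": 8000,
    "Telephony": 8000,
}


def minimum_rate(model: str) -> int:
    """Infer the minimum sampling rate from the family term in the model name."""
    for family, rate in MIN_RATE_HZ.items():
        if family in model:
            return rate
    raise ValueError(f"cannot infer the minimum rate for model {model!r}")


def l16_content_type(rate: int, channels: Optional[int] = None,
                     endianness: Optional[str] = None) -> str:
    """Build an audio/l16 Content-Type; rate is required, the rest optional."""
    parts = ["audio/l16", f"rate={rate}"]
    if channels is not None:
        parts.append(f"channels={channels}")
    if endianness is not None:
        # Assumption: values are spelled "little-endian" or "big-endian".
        parts.append(f"endianness={endianness}")
    return ";".join(parts)


def check_rate(audio_rate: int, model: str) -> str:
    """Describe how the documented rules treat audio at the given rate."""
    required = minimum_rate(model)
    if audio_rate < required:
        return "rejected"      # lower than the minimum: the request fails
    if audio_rate > required:
        return "down-sampled"  # higher than the minimum: service down-samples
    return "accepted"


print(l16_content_type(44100, channels=2, endianness="little-endian"))
print(check_rate(8000, "en-US_BroadbandModel"))
```

Large speech models are named by locale only (for example, `en-US`), so `minimum_rate` raises an error for them rather than guessing a rate.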

See also: Supported audio formats.

Large speech models and next-generation models

The service supports large speech models and next-generation Multimedia (16 kHz) and Telephony (8 kHz) models for many languages. Large speech models and next-generation models have higher throughput than the service's previous generation of Broadband and Narrowband models. When you use large speech models and next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy.

You specify a large speech model or next-generation model by using the model query parameter, as you do a previous-generation model. Only the next-generation models support the low_latency parameter, and all large speech models and next-generation models support the character_insertion_bias parameter. These parameters are not available with previous-generation models.

Large speech models and next-generation models do not support all of the speech recognition parameters that are available for use with previous-generation models. Next-generation models do not support the following parameters:

acoustic_customization_id
keywords and keywords_threshold
processing_metrics and processing_metrics_interval
word_alternatives_threshold
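
A client can screen a request for these parameters before sending it. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a next-generation model can be recognized by the term Multimedia or Telephony in its name, as described in the introduction.

```python
# Parameters that next-generation models do not support, per the list above.
NEXT_GEN_UNSUPPORTED = {
    "acoustic_customization_id",
    "keywords",
    "keywords_threshold",
    "processing_metrics",
    "processing_metrics_interval",
    "word_alternatives_threshold",
}


def is_next_generation(model: str) -> bool:
    """Next-generation model names include the term Multimedia or Telephony."""
    return "Multimedia" in model or "Telephony" in model


def unsupported_parameters(model: str, params: dict) -> list:
    """Return the sorted names of parameters the given model would not accept."""
    if not is_next_generation(model):
        return []
    return sorted(name for name in params if name in NEXT_GEN_UNSUPPORTED)


request_params = {
    "keywords": ["colorado", "tornado"],
    "timestamps": True,
    "word_alternatives_threshold": 0.9,
}
print(unsupported_parameters("en-US_Telephony", request_params))
```

The same parameters pass the check for a previous-generation model such as `en-US_BroadbandModel`, for which the function returns an empty list.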

Important: Effective 31 July 2023, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see Migrating to large speech models.

Multipart speech recognition

Note: The asynchronous HTTP interface, WebSocket interface, and Watson SDKs do not support multipart speech recognition.

The HTTP POST method of the service also supports multipart speech recognition. With multipart requests, you pass all audio data as multipart form data. You specify some parameters as request headers and query parameters, but you pass JSON metadata as form data to control most aspects of the transcription. You can use multipart recognition to pass multiple audio files with a single request.

Use the multipart approach with browsers for which JavaScript is disabled or when the parameters used with the request are greater than the 8 KB limit imposed by most HTTP servers and proxies. You can encounter this limit, for example, if you want to spot a very large number of keywords.

See also: Making a multipart HTTP request.
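
To judge whether a request is likely to run into the 8 KB limit, a client can measure the encoded query string before choosing between the basic and multipart approaches. The following is a sketch under stated assumptions: the limit is taken as 8 * 1024 bytes, keywords are passed as one comma-separated value, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the "8 KB limit" is treated as 8 * 1024 bytes of query string.
QUERY_LIMIT_BYTES = 8 * 1024


def query_size(params: dict) -> int:
    """Return the size in bytes of the URL-encoded query string."""
    return len(urlencode(params).encode("ascii"))


def needs_multipart(params: dict, limit: int = QUERY_LIMIT_BYTES) -> bool:
    """Return True if the encoded parameters exceed the limit."""
    return query_size(params) > limit


small = {"model": "en-US_BroadbandModel", "keywords": "colorado,tornado"}
# Hypothetical large keyword list, joined into one comma-separated value.
large = {
    "model": "en-US_BroadbandModel",
    "keywords": ",".join(f"keyword{i}" for i in range(2000)),
}
print(needs_multipart(small), needs_multipart(large))
```

Servers and proxies can apply different limits, so the threshold is best treated as a configurable estimate.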

POST /v1/recognize

Recognize(System.IO.MemoryStream audio, string contentType = null, string model = null, string languageCustomizationId = null, string acousticCustomizationId = null, string baseModelVersion = null, double? customizationWeight = null, long? inactivityTimeout = null, List<string> keywords = null, float? keywordsThreshold = null, long? maxAlternatives = null, float? wordAlternativesThreshold = null, bool? wordConfidence = null, bool? timestamps = null, bool? profanityFilter = null, bool? smartFormatting = null, bool? speakerLabels = null, string grammarName = null, bool? redaction = null, bool? audioMetrics = null, double? endOfPhraseSilenceTime = null, bool? splitTranscriptAtPhraseEnd = null, float? speechDetectorSensitivity = null, float? backgroundAudioSuppression = null, bool? lowLatency = null, float? characterInsertionBias = null)

ServiceCall<SpeechRecognitionResults> recognize(RecognizeOptions recognizeOptions)

recognize(params)

recognize(
        self,
        audio: BinaryIO,
        *,
        content_type: str = None,
        model: str = None,
        language_customization_id: str = None,
        acoustic_customization_id: str = None,
        base_model_version: str = None,
        customization_weight: float = None,
        inactivity_timeout: int = None,
        keywords: List[str] = None,
        keywords_threshold: float = None,
        max_alternatives: int = None,
        word_alternatives_threshold: float = None,
        word_confidence: bool = None,
        timestamps: bool = None,
        profanity_filter: bool = None,
        smart_formatting: bool = None,
        speaker_labels: bool = None,
        grammar_name: str = None,
        redaction: bool = None,
        audio_metrics: bool = None,
        end_of_phrase_silence_time: float = None,
        split_transcript_at_phrase_end: bool = None,
        speech_detector_sensitivity: float = None,
        background_audio_suppression: float = None,
        low_latency: bool = None,
        character_insertion_bias: float = None,
        **kwargs,
    ) -> DetailedResponse

Request

Use the RecognizeOptions.Builder to create a RecognizeOptions object that contains the parameter values for the recognize method.

Custom Headers

Transfer-Encoding
string
Set to chunked to send the audio in streaming mode. The data does not need to exist fully before being streamed to the service. See Audio transmission.

Allowable values: [chunked]
Content-Type
string
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream,audio/alaw,audio/basic,audio/flac,audio/g729,audio/l16,audio/mp3,audio/mpeg,audio/mulaw,audio/ogg,audio/ogg;codecs=opus,audio/ogg;codecs=vorbis,audio/wav,audio/webm,audio/webm;codecs=opus,audio/webm;codecs=vorbis]

Query Parameters

model
string
The model to use for speech recognition. If you omit the model parameter, the service uses the US English en-US_BroadbandModel by default.

For IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service.

See also:
- Using a model for speech recognition
- Using the default model
Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-IN,en-IN_Telephony,en-GB,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-US,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]
Default: en-US_BroadbandModel
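The headers and query parameters can be combined into a single request. The following minimal sketch builds, but does not send, a POST /v1/recognize request with the Python standard library. The service URL and audio bytes are placeholders, the helper name is hypothetical, authentication is omitted, and rendering booleans as lowercase `true`/`false` is an assumption rather than something this section states.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_recognize_request(base_url, audio, content_type, **query):
    """Build (but do not send) a POST /v1/recognize request."""
    # Drop unset parameters and render booleans as lowercase true/false.
    params = {
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in query.items()
        if value is not None
    }
    url = f"{base_url}/v1/recognize"
    if params:
        url += "?" + urlencode(params)
    return Request(
        url,
        data=audio,
        headers={"Content-Type": content_type},
        method="POST",
    )


request = build_recognize_request(
    "https://example.invalid/speech-to-text",  # placeholder service URL
    b"\x00\x01",                               # placeholder for real audio bytes
    "audio/flac",
    model="en-US_Multimedia",
    timestamps=True,
    max_alternatives=3,
)
print(request.get_method(), request.full_url)
```

Omitting `model` here would leave the choice to the service, which falls back to its default model.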
speech_begin_event
boolean
If true, the service returns a SpeechActivity response object that contains the time at which speech activity is detected in the stream. You can use the parameter in both standard and low-latency mode. It lets client applications know that speech has been detected and that the service is in the process of decoding it. In standard mode, it can be used in lieu of interim results. See Using speech recognition parameters.

Default: false
language_customization_id
string
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.
acoustic_customization_id
string
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.
base_model_version
string
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
customization_weight
double
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.5 for large speech models
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
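The defaults and the precedence rule can be sketched as follows. The model type is inferred from the model name alone (a bare locale for large speech models, a BroadbandModel or NarrowbandModel suffix for previous-generation models, anything else treated as next-generation), which is a heuristic based on the naming scheme rather than a lookup the service exposes. Both function names are hypothetical.

```python
def default_customization_weight(model):
    """Guess the documented default weight from a base model name."""
    locale, _, kind = model.partition("_")
    if not kind:
        return 0.5  # locale-only name: large speech model
    if kind.endswith(("BroadbandModel", "NarrowbandModel")):
        return 0.3  # previous-generation model
    # Remaining names (Telephony, Multimedia) are next-generation models.
    if locale.startswith(("en-", "ja-")):
        return 0.1  # next-generation English and Japanese models
    return 0.2


def effective_customization_weight(model, trained_weight=None, request_weight=None):
    """Apply the precedence: request value, then trained value, then default."""
    for weight in (request_weight, trained_weight):
        if weight is not None:
            if not 0.0 <= weight <= 1.0:
                raise ValueError("customization_weight must be between 0.0 and 1.0")
            return weight
    return default_customization_weight(model)


print(effective_customization_weight("fr-FR_Telephony"))
print(effective_customization_weight("en-US", trained_weight=0.4, request_weight=0.7))
```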
inactivity_timeout
int32
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
keywords
string[]
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywords_threshold
float
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
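The constraints on `keywords` and `keywords_threshold` can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not part of any SDK; the function name and message wording are hypothetical, and the limits are the ones stated above.

```python
MAX_KEYWORDS = 1000
MAX_KEYWORD_LENGTH = 1024


def check_keyword_spotting(keywords=None, keywords_threshold=None):
    """Return a list of problems with the keyword-spotting parameters."""
    keywords = keywords or []
    problems = []
    # The two parameters are only meaningful when specified together.
    if bool(keywords) != (keywords_threshold is not None):
        problems.append("keywords and keywords_threshold must be specified together")
    if len(keywords) > MAX_KEYWORDS:
        problems.append(f"at most {MAX_KEYWORDS} keywords are allowed")
    if any(len(keyword) > MAX_KEYWORD_LENGTH for keyword in keywords):
        problems.append(f"a keyword can have at most {MAX_KEYWORD_LENGTH} characters")
    if keywords_threshold is not None and not 0.0 <= keywords_threshold <= 1.0:
        problems.append("keywords_threshold must be between 0.0 and 1.0")
    return problems


print(check_keyword_spotting(["colorado", "tornado"], 0.5))
print(check_keyword_spotting(["colorado"]))
```

An empty list means that the pair of parameters is consistent with the documented rules; it says nothing about whether the keywords will actually be spotted in the audio.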
max_alternatives
int32
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
word_alternatives_threshold
float
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
word_confidence
boolean
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

Default: false
timestamps
boolean
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanity_filter
boolean
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smart_formatting
boolean
Beta
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

See Smart formatting.

Default: false
smart_formatting_version
integer
The smart formatting version to use. For large speech models and next-generation models, the parameter is supported for US English, Brazilian Portuguese, French, German, Spanish, and French Canadian.

Default: 0
speaker_labels
boolean
Beta
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For large speech models and next-generation models, the parameter can be used with all available languages.
See Speaker labels.
Default: false
grammar_name
string
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID (GUID) of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

See Using a grammar for speech recognition.
redaction
boolean
Beta
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Note: The parameter can be used with US English, Japanese, and Korean transcription only.

See Numeric redaction.

Default: false
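The side effects of redaction can be modeled on the client to see which settings actually take effect. The sketch below is a local imitation under stated assumptions: `apply_redaction_overrides` mirrors the overrides described above, and `mask_numbers` approximates the masking rule with a regular expression. Both names are hypothetical, and neither function reflects how the service implements the feature.

```python
import re


def apply_redaction_overrides(params):
    """Return the settings that take effect when redaction is enabled."""
    effective = dict(params)
    if effective.get("redaction"):
        effective["smart_formatting"] = True        # enabled even if explicitly disabled
        effective.pop("keywords", None)             # keyword spotting is disabled
        effective.pop("keywords_threshold", None)
        effective["max_alternatives"] = 1           # single final transcript
    return effective


def mask_numbers(text):
    """Replace each digit in runs of three or more consecutive digits with X."""
    return re.sub(r"\d{3,}", lambda match: "X" * len(match.group()), text)


requested = {
    "redaction": True,
    "smart_formatting": False,
    "keywords": ["account"],
    "keywords_threshold": 0.5,
    "max_alternatives": 3,
}
print(apply_redaction_overrides(requested))
print(mask_numbers("call 5551234 at 10"))
```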
audio_metrics
boolean
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Default: false
end_of_phrase_silence_time
double
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See End of phrase silence time.
Default: 0.8
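The rules for the pause interval reduce to a small resolution step, sketched below. The function name is hypothetical, and detecting Chinese models by the `zh-` prefix of the model name is an assumption based on the model list, not a documented rule.

```python
def effective_pause_interval(model, end_of_phrase_silence_time=None):
    """Return the pause interval, in seconds, that a request would use."""
    value = end_of_phrase_silence_time
    if value is not None and not 0.0 <= value <= 120.0:
        raise ValueError("end_of_phrase_silence_time must be in the range 0.0 to 120.0")
    if not value:
        # Omitting the parameter and passing 0 both select the default interval.
        return 0.6 if model.startswith("zh-") else 0.8
    return value


print(effective_pause_interval("en-US_Telephony"))
print(effective_pause_interval("zh-CN_Telephony", 0))
print(effective_pause_interval("en-US", 1.5))
```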
split_transcript_at_phrase_end
boolean
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If the parameters are used together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speech_detector_sensitivity
float
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all large speech models and next-generation models, and with most previous-generation models. See Speech detector sensitivity and Language model support.
Default: 0.5
background_audio_suppression
float
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all large speech models and next-generation models, and with most previous-generation models. See Background audio suppression and Language model support.
Default: 0
low_latency
boolean
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for large speech models or for previous-generation Broadband and Narrowband models. It is available for most next-generation models.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency parameter, see Low latency.
Default: false
character_insertion_bias
float
Beta
For large speech models and next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

The parameter is not available for previous-generation models.

See Character insertion bias.
Default: 0

Request Body

Required*

application/octet-stream
binary

The audio to transcribe.

parameters

audio
Required*
System.IO.MemoryStream
The audio to transcribe.
contentType
string
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream,audio/alaw,audio/basic,audio/flac,audio/g729,audio/l16,audio/mp3,audio/mpeg,audio/mulaw,audio/ogg,audio/ogg;codecs=opus,audio/ogg;codecs=vorbis,audio/wav,audio/webm,audio/webm;codecs=opus,audio/webm;codecs=vorbis]
model
string
The model to use for speech recognition. If you omit the model parameter, the service uses the US English en-US_BroadbandModel by default.

For IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service.

See also:
- Using a model for speech recognition
- Using the default model
Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-IN_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]
Default: en-US_BroadbandModel
languageCustomizationId
string
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.
acousticCustomizationId
string
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.
baseModelVersion
string
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
customizationWeight
double?
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
inactivityTimeout
long?
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
keywords
List<string>
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywordsThreshold
float?
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
maxAlternatives
long?
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
wordAlternativesThreshold
float?
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
wordConfidence
bool?
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

Default: false
timestamps
bool?
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanityFilter
bool?
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smartFormatting
bool?
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

See Smart formatting.

Default: false
speakerLabels
bool?
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For next-generation models, the parameter can be used with Czech, English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.
See Speaker labels.
Default: false
grammarName
string
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID (GUID) of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

See Using a grammar for speech recognition.
redaction
bool?
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Note: The parameter can be used with US English, Japanese, and Korean transcription only.

See Numeric redaction.

Default: false
audioMetrics
bool?
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Default: false
endOfPhraseSilenceTime
double?
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See End of phrase silence time.
Default: 0.8
splitTranscriptAtPhraseEnd
bool?
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If the parameters are used together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speechDetectorSensitivity
float?
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Speech detector sensitivity and Language model support.
Default: 0.5
backgroundAudioSuppression
float?
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Background audio suppression and Language model support.
Default: 0.0
lowLatency
bool?
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for previous-generation Broadband and Narrowband models. It is available for most next-generation models.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency parameter, see Low latency.
Default: false
characterInsertionBias
float?
For next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

The parameter is not available for previous-generation models.

See Character insertion bias.
Default: 0.0

RecognizeOptions

The recognize options.

parameters

audio
Required*
NodeJS.ReadableStream
The audio to transcribe.
contentType
string
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream,audio/alaw,audio/basic,audio/flac,audio/g729,audio/l16,audio/mp3,audio/mpeg,audio/mulaw,audio/ogg,audio/ogg;codecs=opus,audio/ogg;codecs=vorbis,audio/wav,audio/webm,audio/webm;codecs=opus,audio/webm;codecs=vorbis]
model
string
The model to use for speech recognition. If you omit the model parameter, the service uses the US English en-US_BroadbandModel by default.

For IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service.

See also:
- Using a model for speech recognition
- Using the default model
Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-IN_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]
Default: en-US_BroadbandModel
languageCustomizationId
string
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.
acousticCustomizationId
string
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.
baseModelVersion
string
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
customizationWeight
number
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
inactivityTimeout
number
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
keywords
string[]
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywordsThreshold
number
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
maxAlternatives
number
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
wordAlternativesThreshold
number
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
wordConfidence
boolean
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

Default: false
timestamps
boolean
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanityFilter
boolean
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smartFormatting
boolean
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

See Smart formatting.

Default: false
speakerLabels
boolean
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For next-generation models, the parameter can be used with Czech, English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.
See Speaker labels.
Default: false
grammarName
string
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID (GUID) of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

See Using a grammar for speech recognition.
redaction
boolean
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Note: The parameter can be used with US English, Japanese, and Korean transcription only.

See Numeric redaction.

Default: false
audioMetrics
boolean
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Default: false
endOfPhraseSilenceTime
number
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See End of phrase silence time.
Default: 0.8
splitTranscriptAtPhraseEnd
boolean
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If you use the end_of_phrase_silence_time and split_transcript_at_phrase_end parameters together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speechDetectorSensitivity
number
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Speech detector sensitivity and Language model support.
Default: 0.5
backgroundAudioSuppression
number
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Background audio suppression and Language model support.
Default: 0.0
lowLatency
boolean
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for previous-generation Broadband and Narrowband models. It is available for most next-generation models.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency parameter, see Low latency.
Default: false
characterInsertionBias
number
For next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

The parameter is not available for previous-generation models.

See Character insertion bias.
Default: 0.0

parameters

audio
Required*
BinaryIO
The audio to transcribe.
content_type
str
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream,audio/alaw,audio/basic,audio/flac,audio/g729,audio/l16,audio/mp3,audio/mpeg,audio/mulaw,audio/ogg,audio/ogg;codecs=opus,audio/ogg;codecs=vorbis,audio/wav,audio/webm,audio/webm;codecs=opus,audio/webm;codecs=vorbis]
model
str
The model to use for speech recognition. If you omit the model parameter, the service uses the US English en-US_BroadbandModel by default.

For IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service.

See also:
- Using a model for speech recognition
- Using the default model
Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-IN_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]
Default: en-US_BroadbandModel
language_customization_id
str
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.
acoustic_customization_id
str
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.
base_model_version
str
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
customization_weight
float
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
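
The precedence described above (a weight on the request overrides a weight set at training time, which overrides the documented default) can be sketched as a small client-side helper. This is an illustration only, not part of the SDK: the function names are hypothetical, and the way the model generation and language are inferred from the model name (the `Broadband`/`Narrowband` terms and the `en-`/`ja-` locale prefixes) is an assumption based on the naming conventions in this reference.

```python
from typing import Optional


def default_customization_weight(model: str) -> float:
    """Return the documented default weight for a base model name (hypothetical helper)."""
    # Assumption: previous-generation model names contain "Broadband" or "Narrowband".
    if "Broadband" in model or "Narrowband" in model:
        return 0.3
    # Assumption: the locale prefix identifies next-generation English and Japanese models.
    if model.startswith(("en-", "ja-")):
        return 0.1
    # All other next-generation models.
    return 0.2


def resolve_customization_weight(model: str,
                                 requested: Optional[float] = None,
                                 trained: Optional[float] = None) -> float:
    """Pick the weight that would apply: request value, then trained value, then default."""
    if requested is not None:
        if not 0.0 <= requested <= 1.0:
            raise ValueError("customization_weight must be between 0.0 and 1.0")
        return requested
    if trained is not None:
        return trained
    return default_customization_weight(model)
```

For example, `resolve_customization_weight("fr-FR_Telephony")` falls through to the default for most next-generation models, while passing `requested=0.5` returns 0.5 regardless of the trained weight.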
inactivity_timeout
int
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
keywords
List[str]
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywords_threshold
float
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
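
The constraints on keywords and keywords_threshold can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: `check_keyword_spotting` is a hypothetical helper, not an SDK function, and it chooses to raise an error when only one of the two parameters is supplied rather than silently skipping keyword spotting.

```python
from typing import List, Optional

MAX_KEYWORDS = 1000        # documented maximum number of keywords per request
MAX_KEYWORD_LENGTH = 1024  # documented maximum length of one keyword, in characters


def check_keyword_spotting(keywords: Optional[List[str]],
                           keywords_threshold: Optional[float]) -> bool:
    """Return True if keyword spotting is requested, False if it is not.

    Raises ValueError when the arguments violate the documented limits.
    """
    has_keywords = bool(keywords)  # None and [] both mean "no keywords"
    has_threshold = keywords_threshold is not None
    if not has_keywords and not has_threshold:
        return False
    if has_keywords != has_threshold:
        raise ValueError("keywords and keywords_threshold must be specified together")
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} keywords are allowed per request")
    for keyword in keywords:
        if len(keyword) > MAX_KEYWORD_LENGTH:
            raise ValueError(f"keyword exceeds {MAX_KEYWORD_LENGTH} characters")
    if not 0.0 <= keywords_threshold <= 1.0:
        raise ValueError("keywords_threshold must be between 0.0 and 1.0")
    return True
```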
max_alternatives
int
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
word_alternatives_threshold
float
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
word_confidence
bool
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

Default: false
timestamps
bool
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanity_filter
bool
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smart_formatting
bool
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

See Smart formatting.

Default: false
speaker_labels
bool
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For next-generation models, the parameter can be used with Czech, English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.
See Speaker labels.
Default: false
grammar_name
str
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID (GUID) of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

See Using a grammar for speech recognition.
redaction
bool
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Note: The parameter can be used with US English, Japanese, and Korean transcription only.

See Numeric redaction.

Default: false
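
To make the stated redaction rule concrete (every digit in a run of three or more consecutive digits becomes an X), here is a local illustration. It is only a sketch of the rule as described, not the service's implementation; the service applies redaction to smart-formatted final transcripts and its exact behavior may differ. The function name `redact_digits` is hypothetical.

```python
import re

# Runs of three or more consecutive ASCII digits, per the rule described above.
_DIGIT_RUN = re.compile(r"[0-9]{3,}")


def redact_digits(text: str) -> str:
    """Replace each digit in a run of 3+ consecutive digits with 'X'."""
    return _DIGIT_RUN.sub(lambda match: "X" * len(match.group()), text)
```

Shorter runs are left untouched, so `redact_digits("room 42")` returns the input unchanged.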
audio_metrics
bool
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Default: false
end_of_phrase_silence_time
float
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See End of phrase silence time.
Default: 0.8
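
The rules for the pause interval can be summarized in a few lines. This is a hedged sketch: `effective_pause_interval` is a hypothetical client-side helper, and detecting Chinese models by the `zh-` locale prefix is an assumption based on the model names listed in this reference.

```python
def effective_pause_interval(value: float = 0.0,
                             model: str = "en-US_BroadbandModel") -> float:
    """Return the pause interval, in seconds, that would apply to a request."""
    if not 0.0 <= value <= 120.0:
        raise ValueError("end_of_phrase_silence_time must be in the range 0.0 to 120.0")
    if value > 0:
        return value  # an explicit interval is used as given
    # A value of 0 is equivalent to omitting the parameter: use the language default.
    # Assumption: Chinese models are identified by the "zh-" locale prefix.
    return 0.6 if model.startswith("zh-") else 0.8
```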
split_transcript_at_phrase_end
bool
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If you use the end_of_phrase_silence_time and split_transcript_at_phrase_end parameters together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speech_detector_sensitivity
float
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Speech detector sensitivity and Language model support.
Default: 0.5
background_audio_suppression
float
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Background audio suppression and Language model support.
Default: 0.0
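
Both speech_detector_sensitivity and background_audio_suppression accept values from 0.0 to 1.0, and one or two decimal places are described as sufficient. A small client-side check along those lines might look like the following; `checked_unit_interval` is a hypothetical helper, and rounding to two places is a convenience chosen here, not a service requirement.

```python
def checked_unit_interval(name: str, value: float, places: int = 2) -> float:
    """Validate that value lies in [0.0, 1.0] and round it to a few decimal places."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return round(value, places)
```

For instance, `checked_unit_interval("background_audio_suppression", 0.5)` passes the value through, while a value such as 1.2 raises a `ValueError` before any request is made.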
low_latency
bool
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for previous-generation Broadband and Narrowband models. It is available for most next-generation models.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency parameter, see Low latency.
Default: false
character_insertion_bias
float
For next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

The parameter is not available for previous-generation models.

See Character insertion bias.
Default: 0.0
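
The documented ranges for background_audio_suppression and character_insertion_bias can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDKs: the helper name build_recognize_query is hypothetical, and it only assembles a query string from the three parameters described above.

```python
from urllib.parse import urlencode


def build_recognize_query(background_audio_suppression=0.0,
                          low_latency=False,
                          character_insertion_bias=0.0):
    """Validate the documented ranges and return a URL query string.

    Hypothetical helper for illustration; it is not part of any IBM SDK.
    """
    # Documented range: 0.0 (no suppression) to 1.0 (all audio suppressed).
    if not 0.0 <= background_audio_suppression <= 1.0:
        raise ValueError("background_audio_suppression must be in [0.0, 1.0]")
    # Documented range: -1.0 (shorter strings) to 1.0 (longer strings).
    if not -1.0 <= character_insertion_bias <= 1.0:
        raise ValueError("character_insertion_bias must be in [-1.0, 1.0]")
    params = {
        "background_audio_suppression": background_audio_suppression,
        "low_latency": "true" if low_latency else "false",
        "character_insertion_bias": character_insertion_bias,
    }
    return urlencode(params)


if __name__ == "__main__":
    # Start with a small bias such as -0.1, as the description suggests.
    print(build_recognize_query(0.5, True, -0.1))
```

The resulting string can be appended to {url}/v1/recognize after a question mark. Whether a given model accepts low_latency or character_insertion_bias still depends on the model, so the service can reject a request that passes this check.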

Example request for IBM Cloud

curl -X POST -u "apikey:{apikey}" --header "Content-Type: audio/flac" --data-binary @audio-file2.flac "{url}/v1/recognize?word_alternatives_threshold=0.9&keywords=colorado%2Ctornado%2Ctornadoes&keywords_threshold=0.5"

Download sample file audio-file2.flac

Example request for IBM Cloud Pak for Data

curl -X POST --header "Authorization: Bearer {token}" --header "Content-Type: audio/flac" --data-binary @audio-file2.flac "{url}/v1/recognize?word_alternatives_threshold=0.9&keywords=colorado%2Ctornado%2Ctornadoes&keywords_threshold=0.5"

Download sample file audio-file2.flac

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.Recognize(
    audio: new MemoryStream(File.ReadAllBytes("audio-file2.flac")),
    contentType: "audio/flac",
    wordAlternativesThreshold: 0.9f,
    keywords: new List<string>()
    {
        "colorado",
        "tornado",
        "tornadoes"
    },
    keywordsThreshold: 0.5f
    );

Console.WriteLine(result.Response);

Download sample file audio-file2.flac

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.Recognize(
    audio: new MemoryStream(File.ReadAllBytes("audio-file2.flac")),
    contentType: "audio/flac",
    wordAlternativesThreshold: 0.9f,
    keywords: new List<string>()
    {
        "colorado",
        "tornado",
        "tornadoes"
    },
    keywordsThreshold: 0.5f
    );

Console.WriteLine(result.Response);

Download sample file audio-file2.flac

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

try {
  RecognizeOptions recognizeOptions = new RecognizeOptions.Builder()
    .audio(new FileInputStream("audio-file2.flac"))
    .contentType("audio/flac")
    .wordAlternativesThreshold((float) 0.9)
    .keywords(Arrays.asList("colorado", "tornado", "tornadoes"))
    .keywordsThreshold((float) 0.5)
    .build();
  
  SpeechRecognitionResults speechRecognitionResults =
    speechToText.recognize(recognizeOptions).execute().getResult();
  System.out.println(speechRecognitionResults);
} catch (FileNotFoundException e) {
  e.printStackTrace();
}

Download sample file audio-file2.flac

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

try {
  RecognizeOptions recognizeOptions = new RecognizeOptions.Builder()
    .audio(new FileInputStream("audio-file2.flac"))
    .contentType("audio/flac")
    .wordAlternativesThreshold((float) 0.9)
    .keywords(Arrays.asList("colorado", "tornado", "tornadoes"))
    .keywordsThreshold((float) 0.5)
    .build();
  
  SpeechRecognitionResults speechRecognitionResults =
    speechToText.recognize(recognizeOptions).execute().getResult();
  System.out.println(speechRecognitionResults);
} catch (FileNotFoundException e) {
  e.printStackTrace();
}

Download sample file audio-file2.flac

Example request for IBM Cloud

const fs = require('fs');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const recognizeParams = {
  audio: fs.createReadStream('audio-file2.flac'),
  contentType: 'audio/flac',
  wordAlternativesThreshold: 0.9,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  keywordsThreshold: 0.5,
};

speechToText.recognize(recognizeParams)
  .then(speechRecognitionResults => {
    console.log(JSON.stringify(speechRecognitionResults, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Download sample file audio-file2.flac

Example request for IBM Cloud Pak for Data

const fs = require('fs');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
  }),
  serviceUrl: '{url}',
});

const recognizeParams = {
  audio: fs.createReadStream('audio-file2.flac'),
  contentType: 'audio/flac',
  wordAlternativesThreshold: 0.9,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  keywordsThreshold: 0.5,
};

speechToText.recognize(recognizeParams)
  .then(speechRecognitionResults => {
    console.log(JSON.stringify(speechRecognitionResults, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Download sample file audio-file2.flac

Example request for IBM Cloud

from os.path import join, dirname
import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

with open(join(dirname(__file__), './.', 'audio-file2.flac'),
               'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        word_alternatives_threshold=0.9,
        keywords=['colorado', 'tornado', 'tornadoes'],
        keywords_threshold=0.5
    ).get_result()
print(json.dumps(speech_recognition_results, indent=2))

Download sample file audio-file2.flac

Example request for IBM Cloud Pak for Data

from os.path import join, dirname
import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

with open(join(dirname(__file__), './.', 'audio-file2.flac'),
               'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        word_alternatives_threshold=0.9,
        keywords=['colorado', 'tornado', 'tornadoes'],
        keywords_threshold=0.5
    ).get_result()
print(json.dumps(speech_recognition_results, indent=2))

Download sample file audio-file2.flac

Response

Response Body

SpeechRecognitionResults

The complete results for a speech recognition request.

Status Code

200
OK. The request succeeded.
400
Bad Request. The request failed because of a user input error. For example, the request passed audio that does not match the indicated format or failed to specify a required audio format; specified a custom language or custom acoustic model that is not in the available state; or experienced an inactivity timeout. Specific messages include:
- Model {model} not found
- Requested model is not available
- This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
- speaker_labels is not a supported feature for model {model}
- keywords_threshold value must be between zero and one (inclusive)
- word_alternatives_threshold value must be between zero and one (inclusive)
- You cannot specify both 'customization_id' and 'language_customization_id' parameter!
- No speech detected for 30s
- Unable to transcode data stream application/octet-stream -> audio/l16
- Stream was {number} bytes but needs to be at least 100 bytes.
- keyword {keyword} length exceeds the maximum length 1024
- low_latency is not a supported feature for model {model}
- Character insertion bias must be a value between -1 and 1.
404
Not Found. The specified model does not exist or, for IBM Cloud Pak for Data, the model parameter was not specified but the default model is not installed. The message is Model '{model}' not found.
406
Not Acceptable. The request specified an Accept header with an incompatible content type.
408
Request Timeout. The connection was closed after 30 seconds of inactivity (session timeout).
413
Payload Too Large. The request passed an audio file that exceeded the currently supported data limit.
415
Unsupported Media Type. The request specified an unacceptable media type.
500
Internal Server Error. The service experienced an internal error.
503
Service Unavailable. The service is currently unavailable.
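
A client typically branches on these status codes. The sketch below records the codes listed above and sorts them into three outcomes. The grouping is an assumption made for illustration, not something the service prescribes: it treats 408, 500, and 503 as worth retrying and every other documented error as a problem with the request itself.

```python
# Status codes documented for the recognize method.
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad Request",
    404: "Not Found",
    406: "Not Acceptable",
    408: "Request Timeout",
    413: "Payload Too Large",
    415: "Unsupported Media Type",
    500: "Internal Server Error",
    503: "Service Unavailable",
}

# Assumption for this sketch: timeouts and server-side failures may succeed
# on a later attempt, so they are grouped as retryable.
RETRYABLE = frozenset({408, 500, 503})


def classify_status(code):
    """Return 'success', 'retry', 'fix_request', or 'unknown' for a status code."""
    if code == 200:
        return "success"
    if code in RETRYABLE:
        return "retry"
    if code in STATUS_MEANINGS:
        return "fix_request"
    return "unknown"


if __name__ == "__main__":
    for code in (200, 400, 503):
        print(code, STATUS_MEANINGS[code], "->", classify_status(code))
```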

Example responses

Status 200

{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.15,
          "alternatives": [
            {
              "confidence": 1,
              "word": "a"
            }
          ],
          "end_time": 0.3
        },
        {
          "start_time": 0.3,
          "alternatives": [
            {
              "confidence": 1,
              "word": "line"
            }
          ],
          "end_time": 0.64
        },
        {
          "start_time": 0.64,
          "alternatives": [
            {
              "confidence": 1,
              "word": "of"
            }
          ],
          "end_time": 0.73
        },
        {
          "start_time": 0.73,
          "alternatives": [
            {
              "confidence": 1,
              "word": "severe"
            }
          ],
          "end_time": 1.08
        },
        {
          "start_time": 1.08,
          "alternatives": [
            {
              "confidence": 1,
              "word": "thunderstorms"
            }
          ],
          "end_time": 1.85
        },
        {
          "start_time": 1.85,
          "alternatives": [
            {
              "confidence": 1,
              "word": "with"
            }
          ],
          "end_time": 2
        },
        {
          "start_time": 2,
          "alternatives": [
            {
              "confidence": 1,
              "word": "several"
            }
          ],
          "end_time": 2.52
        },
        {
          "start_time": 2.52,
          "alternatives": [
            {
              "confidence": 1,
              "word": "possible"
            }
          ],
          "end_time": 3.03
        },
        {
          "start_time": 3.03,
          "alternatives": [
            {
              "confidence": 1,
              "word": "tornadoes"
            }
          ],
          "end_time": 3.85
        },
        {
          "start_time": 3.95,
          "alternatives": [
            {
              "confidence": 1,
              "word": "is"
            }
          ],
          "end_time": 4.13
        },
        {
          "start_time": 4.13,
          "alternatives": [
            {
              "confidence": 1,
              "word": "approaching"
            }
          ],
          "end_time": 4.58
        },
        {
          "start_time": 4.58,
          "alternatives": [
            {
              "confidence": 0.96,
              "word": "Colorado"
            }
          ],
          "end_time": 5.16
        },
        {
          "start_time": 5.16,
          "alternatives": [
            {
              "confidence": 0.95,
              "word": "on"
            }
          ],
          "end_time": 5.32
        },
        {
          "start_time": 5.32,
          "alternatives": [
            {
              "confidence": 0.98,
              "word": "Sunday"
            }
          ],
          "end_time": 6.04
        }
      ],
      "keywords_result": {
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.58,
            "confidence": 0.96,
            "end_time": 5.16
          }
        ],
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 3.03,
            "confidence": 1,
            "end_time": 3.85
          }
        ]
      },
      "alternatives": [
        {
          "confidence": 1,
          "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
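
A response of this shape can be reduced to a transcript and a list of keyword matches with a few lines of code. The following sketch uses a shortened copy of the example response; the function name summarize_results is hypothetical, and the code reads only fields that appear in the example (results, final, alternatives, transcript, keywords_result).

```python
import json

# Shortened copy of the example response; word_alternatives is omitted.
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "keywords_result": {
        "colorado": [
          {"normalized_text": "Colorado", "start_time": 4.58,
           "confidence": 0.96, "end_time": 5.16}
        ],
        "tornadoes": [
          {"normalized_text": "tornadoes", "start_time": 3.03,
           "confidence": 1, "end_time": 3.85}
        ]
      },
      "alternatives": [
        {"confidence": 1,
         "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "}
      ],
      "final": true
    }
  ],
  "result_index": 0
}
"""


def summarize_results(response):
    """Return (transcript, keyword_hits) built from the final results only."""
    transcripts = []
    keyword_hits = {}
    for result in response.get("results", []):
        if not result.get("final"):
            continue  # skip interim results
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is used; transcripts end with a space.
            transcripts.append(alternatives[0]["transcript"].strip())
        for keyword, hits in result.get("keywords_result", {}).items():
            spans = [(h["start_time"], h["end_time"], h["confidence"]) for h in hits]
            keyword_hits.setdefault(keyword, []).extend(spans)
    return " ".join(transcripts), keyword_hits


if __name__ == "__main__":
    transcript, hits = summarize_results(json.loads(SAMPLE_RESPONSE))
    print(transcript)
    for keyword, spans in hits.items():
        print(keyword, spans)
```

The keyword "tornado" was requested but does not appear in keywords_result, so it is absent from the summary as well.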

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or allowlist, the callback URL if it is not already registered by sending a GET request to the callback URL. The service passes a random alphanumeric challenge string via the challenge_string parameter of the request. The request includes an Accept header that specifies text/plain as the required response type.

To be registered successfully, the callback URL must respond to the GET request from the service. The response must send status code 200 and must include the challenge string in its body. Set the Content-Type response header to text/plain. Upon receiving this response, the service responds to the original registration request with response code 201.

The service sends only a single GET request to the callback URL. If the service does not receive a reply with a response code of 200 and a body that echoes the challenge string sent by the service within five seconds, it does not allowlist the URL; it instead sends status code 400 in response to the request to register a callback. If the requested callback URL is already allowlisted, the service responds to the initial registration request with response code 200.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of the challenge string in its response to the POST request. It sends this signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

After you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs within a one-hour span.

See also: Registering a callback URL.
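
The verification exchange described above can be sketched as a pure function that a web handler at the callback URL might call. This is an illustration under stated assumptions, not the service's reference implementation: the function names are hypothetical, the 403 status for a failed signature check is this sketch's choice (the service only requires that a successful reply use 200), and the encoding of the X-Callback-Signature value is not stated in this section, so the sketch assumes Base64 and also offers hexadecimal.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs

PLAIN_TEXT = {"Content-Type": "text/plain"}


def compute_signature(user_secret, payload, encoding="base64"):
    """HMAC-SHA1 of payload (bytes) keyed with user_secret.

    The header encoding is an assumption; confirm it against the service.
    """
    digest = hmac.new(user_secret.encode("utf-8"), payload, hashlib.sha1).digest()
    if encoding == "hex":
        return digest.hex()
    return base64.b64encode(digest).decode("ascii")


def handle_verification(query_string, signature_header=None, user_secret=None):
    """Answer the service's GET request; returns (status, headers, body)."""
    values = parse_qs(query_string).get("challenge_string")
    if not values:
        return 400, PLAIN_TEXT, ""
    challenge = values[0]
    if user_secret is not None:
        # Optionally check that the request really came from the service.
        expected = compute_signature(user_secret, challenge.encode("utf-8"))
        supplied = (signature_header or "").encode("utf-8")
        if not hmac.compare_digest(expected.encode("utf-8"), supplied):
            return 403, PLAIN_TEXT, ""
    # Echo the challenge string as text/plain with status 200.
    return 200, PLAIN_TEXT, challenge


if __name__ == "__main__":
    signature = compute_signature("ThisIsMySecret", b"abc123")
    print(handle_verification("challenge_string=abc123", signature, "ThisIsMySecret"))
```

Because the service sends only one GET request and waits five seconds for the echo, the handler should do no slow work before it replies.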

POST /v1/register_callback

RegisterCallback(string callbackUrl, string userSecret = null)

ServiceCall<RegisterStatus> registerCallback(RegisterCallbackOptions registerCallbackOptions)

registerCallback(params)

register_callback(
        self,
        callback_url: str,
        *,
        user_secret: str = None,
        **kwargs,
    ) -> DetailedResponse

Request

Use the RegisterCallbackOptions.Builder to create a RegisterCallbackOptions object that contains the parameter values for the registerCallback method.

Query Parameters

callback_url
Required*
string
An HTTP or HTTPS URL to which callback notifications are to be sent. To be allowlisted, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
user_secret
string
A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.
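
Both query parameters are passed in the request URL, so percent-encoding them keeps a callback URL that contains its own reserved characters intact. The sketch below is a hypothetical helper, not an SDK function; the host names in the usage example are placeholders.

```python
from urllib.parse import urlencode, urlsplit


def build_register_callback_url(service_url, callback_url, user_secret=None):
    """Return the full POST URL for /v1/register_callback.

    Hypothetical helper for illustration; it is not part of any IBM SDK.
    """
    # The parameter description allows only HTTP or HTTPS callback URLs.
    if urlsplit(callback_url).scheme not in ("http", "https"):
        raise ValueError("callback_url must be an HTTP or HTTPS URL")
    params = {"callback_url": callback_url}
    if user_secret is not None:
        # Without user_secret the service does not send X-Callback-Signature.
        params["user_secret"] = user_secret
    return f"{service_url.rstrip('/')}/v1/register_callback?{urlencode(params)}"


if __name__ == "__main__":
    print(build_register_callback_url(
        "https://api.example.test/instances/123",   # placeholder service URL
        "https://example.test/job_results",         # placeholder callback URL
        "ThisIsMySecret",
    ))
```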

parameters

callbackUrl
Required*
string
An HTTP or HTTPS URL to which callback notifications are to be sent. To be allowlisted, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
userSecret
string
A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

RegisterCallbackOptions

The registerCallback options.

parameters

callbackUrl
Required*
string
An HTTP or HTTPS URL to which callback notifications are to be sent. To be allowlisted, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
userSecret
string
A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

parameters

callback_url
Required*
str
An HTTP or HTTPS URL to which callback notifications are to be sent. To be allowlisted, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
user_secret
str
A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

Example request for IBM Cloud

curl -X POST -u "apikey:{apikey}" "{url}/v1/register_callback?callback_url=http://{user_callback_path}/job_results&user_secret=ThisIsMySecret"

Example request for IBM Cloud Pak for Data

curl -X POST --header "Authorization: Bearer {token}" "{url}/v1/register_callback?callback_url=http://{user_callback_path}/job_results&user_secret=ThisIsMySecret"

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.RegisterCallback(
    callbackUrl: "http://{user_callback_path}/job_results",
    userSecret: "ThisIsMySecret"
    );

Console.WriteLine(result.Response);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.RegisterCallback(
    callbackUrl: "http://{user_callback_path}/job_results",
    userSecret: "ThisIsMySecret"
    );

Console.WriteLine(result.Response);

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

RegisterCallbackOptions registerCallbackOptions = new RegisterCallbackOptions.Builder()
  .callbackUrl("http://{user_callback_path}/job_results")
  .userSecret("ThisIsMySecret")
  .build();

RegisterStatus registerStatus =
  speechToText.registerCallback(registerCallbackOptions).execute().getResult();
System.out.println(registerStatus);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

RegisterCallbackOptions registerCallbackOptions = new RegisterCallbackOptions.Builder()
  .callbackUrl("http://{user_callback_path}/job_results")
  .userSecret("ThisIsMySecret")
  .build();

RegisterStatus registerStatus =
  speechToText.registerCallback(registerCallbackOptions).execute().getResult();
System.out.println(registerStatus);

Example request for IBM Cloud

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const registerCallbackParams = {
  callbackUrl: 'http://{user_callback_path}/job_results',
  userSecret: 'ThisIsMySecret',
};

speechToText.registerCallback(registerCallbackParams)
  .then(registerStatus => {
    console.log(JSON.stringify(registerStatus, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud Pak for Data

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
  }),
  serviceUrl: '{url}',
});

const registerCallbackParams = {
  callbackUrl: 'http://{user_callback_path}/job_results',
  userSecret: 'ThisIsMySecret',
};

speechToText.registerCallback(registerCallbackParams)
  .then(registerStatus => {
    console.log(JSON.stringify(registerStatus, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

register_status = speech_to_text.register_callback(
    'http://{user_callback_path}/job_results',
    user_secret='ThisIsMySecret'
).get_result()
print(json.dumps(register_status, indent=2))

Example request for IBM Cloud Pak for Data

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

register_status = speech_to_text.register_callback(
    'http://{user_callback_path}/job_results',
    user_secret='ThisIsMySecret'
).get_result()
print(json.dumps(register_status, indent=2))

Response

Response Body

RegisterStatus

Information about a request to register a callback for asynchronous speech recognition.

Status Code

200
OK. The callback was already registered (allowlisted). The status included in the response is "already created".
201
Created. The callback was successfully registered (allowlisted). The status included in the response is "created".
400
Bad Request. The callback registration failed. The request was missing a required parameter or specified an invalid argument; the client sent an invalid response to the service's GET request during the registration process; or the client failed to respond to the server's request before the five-second timeout.
500
Internal Server Error. The service experienced an internal error.
503
Service Unavailable. The service is currently unavailable.
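
The 400 case above refers to the GET request that the service sends to the callback URL during registration. A handler for that request might look like the sketch below. It assumes that the challenge arrives in a query parameter named challenge_string and that the service expects it echoed back as plain text with a 200 status; both details, and the function name, are assumptions that are not stated in this section.

```python
from urllib.parse import parse_qs


def verification_response(query_string: str):
    """Build (status, headers, body) for the service's verification GET request.

    Assumption: the challenge is passed as the challenge_string query parameter.
    """
    params = parse_qs(query_string)
    challenge = params.get("challenge_string", [None])[0]
    if challenge is None:
        # Not a verification request that this sketch understands.
        return 400, {"Content-Type": "text/plain"}, b"missing challenge_string"
    # Echo the challenge unchanged so that the URL can be allowlisted.
    return 200, {"Content-Type": "text/plain"}, challenge.encode("utf-8")
```

Because the service waits only five seconds for the response, keep the handler free of slow work such as database lookups.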

Example responses

Status 200

{
  "status": "already created",
  "url": "http://{user_callback_path}/job_results"
}

Status 201

{
  "status": "created",
  "url": "http://{user_callback_path}/job_results"
}

Unregisters a callback URL that was previously allowlisted with a Register a callback request for use with the asynchronous interface. Once unregistered, the URL can no longer be used with asynchronous recognition requests.

See also: Unregistering a callback URL.

POST /v1/unregister_callback

UnregisterCallback(string callbackUrl)

ServiceCall<Void> unregisterCallback(UnregisterCallbackOptions unregisterCallbackOptions)

unregisterCallback(params)

unregister_callback(
        self,
        callback_url: str,
        **kwargs,
    ) -> DetailedResponse

Request

Use the UnregisterCallbackOptions.Builder to create an UnregisterCallbackOptions object that contains the parameter values for the unregisterCallback method.

Query Parameters

callback_url
Required*
string
The callback URL that is to be unregistered.

parameters

callbackUrl
Required*
string
The callback URL that is to be unregistered.

UnregisterCallbackOptions

The unregisterCallback options.

parameters

callbackUrl
Required*
string
The callback URL that is to be unregistered.

parameters

callback_url
Required*
str
The callback URL that is to be unregistered.
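
Because the callback URL travels as a query parameter, a client that builds the request by hand may want to percent-encode it. The sketch below shows one way to do that with the Python standard library; the function name and the example hosts are hypothetical.

```python
from urllib.parse import urlencode


def unregister_callback_url(service_url: str, callback_url: str) -> str:
    """Return the full request URL for POST /v1/unregister_callback."""
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    query = urlencode({"callback_url": callback_url})
    return f"{service_url.rstrip('/')}/v1/unregister_callback?{query}"


# Example with placeholder hosts.
request_url = unregister_callback_url(
    "https://service.example", "http://host.example/job_results"
)
```

The SDK methods shown in this section accept the callback URL as an ordinary argument, so this step matters mainly for hand-built HTTP requests.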

Example request for IBM Cloud

curl -X POST -u "apikey:{apikey}" "{url}/v1/unregister_callback?callback_url=http://{user_callback_path}/job_results"

Example request for IBM Cloud Pak for Data

curl -X POST --header "Authorization: Bearer {token}" "{url}/v1/unregister_callback?callback_url=http://{user_callback_path}/job_results"

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.UnregisterCallback(
    callbackUrl: "http://{user_callback_path}/job_results"
    );

Console.WriteLine(result.StatusCode);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.UnregisterCallback(
    callbackUrl: "http://{user_callback_path}/job_results"
    );

Console.WriteLine(result.StatusCode);

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

UnregisterCallbackOptions unregisterCallbackOptions = new UnregisterCallbackOptions.Builder()
  .callbackUrl("http://{user_callback_path}/job_results")
  .build();

speechToText.unregisterCallback(unregisterCallbackOptions).execute();

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

UnregisterCallbackOptions unregisterCallbackOptions = new UnregisterCallbackOptions.Builder()
  .callbackUrl("http://{user_callback_path}/job_results")
  .build();

speechToText.unregisterCallback(unregisterCallbackOptions).execute().getResult();

Example request for IBM Cloud

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const unregisterCallbackParams = {
  callbackUrl: 'http://{user_callback_path}/job_results',
};

speechToText.unregisterCallback(unregisterCallbackParams)
  .then(result => {
    // Response is empty.
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud Pak for Data

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
  }),
  serviceUrl: '{url}',
});

const unregisterCallbackParams = {
  callbackUrl: 'http://{user_callback_path}/job_results',
};

speechToText.unregisterCallback(unregisterCallbackParams)
  .then(result => {
    // Response is empty.
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

speech_to_text.unregister_callback('http://{user_callback_path}/job_results')

Example request for IBM Cloud Pak for Data

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

speech_to_text.unregister_callback('http://{user_callback_path}/job_results')

Response

Response type: object

Status Code

200
OK. The callback URL was successfully unregistered.
400
Bad Request. The request failed because of a user input error (for example, because it failed to pass a callback URL).
404
Not Found. The specified callback URL was not found.
500
Internal Server Error. The service experienced an internal error.
503
Service Unavailable. The service is currently unavailable.

No Sample Response

This method does not specify any sample responses.

Creates a job for a new asynchronous recognition request. The job is owned by the instance of the service whose credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

By callback notification: Include the callback_url parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.
By polling the service: Omit the callback_url, events, and user_token parameters. You must then use the Check jobs or Check a job methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Using the HTTPS Check a job method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.
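
The two notification approaches differ only in which query parameters accompany the job creation request. The helper below is a hypothetical sketch of that choice. It assumes that events is sent as a comma-separated list, and rejecting events or user_token without a callback URL is a client-side decision made for this example, not documented service behavior.

```python
from typing import Dict, Optional, Sequence


def job_query_params(
    callback_url: Optional[str] = None,
    events: Optional[Sequence[str]] = None,
    user_token: Optional[str] = None,
    results_ttl: Optional[int] = None,
) -> Dict[str, str]:
    """Assemble the asynchronous-specific query parameters for a new job."""
    if callback_url is None and (events or user_token):
        # Polling mode: these parameters only make sense with a callback URL.
        raise ValueError("events and user_token apply only with callback_url")
    params: Dict[str, str] = {}
    if callback_url is not None:
        params["callback_url"] = callback_url
        if events:
            params["events"] = ",".join(events)  # assumed list encoding
        if user_token is not None:
            params["user_token"] = user_token
    if results_ttl is not None:
        # results_ttl is valid in both modes.
        params["results_ttl"] = str(results_ttl)
    return params
```

Calling job_query_params(results_ttl=60) yields a polling-style request, while passing a callback URL adds the notification parameters.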

The method supports the same basic parameters as other HTTP and WebSocket recognition requests. It also supports the following parameters specific to the asynchronous interface:

callback_url
events
user_token
results_ttl

You can pass a maximum of 1 GB and a minimum of 100 bytes of audio with a request. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The method returns only final results; to enable interim results, use the WebSocket API. (With the curl command, use the --data-binary option to upload the file for the request.)

See also: Creating a job.

Streaming mode

For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the service closes the connection (status code 408) if it does not receive at least 15 seconds of audio (including silence) in any 30-second period. The service also closes the connection (status code 400) if it detects no speech for inactivity_timeout seconds of streaming audio; use the inactivity_timeout parameter to change the default of 30 seconds.
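
One common way to produce a chunked request body is to hand the HTTP client an iterator instead of a complete byte string; clients such as the requests library send Transfer-Encoding: chunked when given a generator. The sketch below shows only the chunking side, using an in-memory stream as a stand-in for live audio. The function name and chunk size are arbitrary choices for illustration.

```python
import io
from typing import BinaryIO, Iterator


def audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Example: a 10-byte in-memory "recording" split into 4-byte chunks.
chunks = list(audio_chunks(io.BytesIO(b"0123456789"), chunk_size=4))
```

With a live source, the generator should keep audio flowing steadily so that the connection is not closed by the timeouts described above.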

Audio formats (content types)

The service accepts audio in the following formats (MIME types).

For formats that are labeled Required, you must use the Content-Type header with the request to specify the format of the audio.
For all other formats, you can omit the Content-Type header or specify application/octet-stream with the header to have the service automatically detect the format of the audio. (With the curl command, you can specify either "Content-Type:" or "Content-Type: application/octet-stream".)

Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.

audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
audio/basic (Required. Use only with narrowband models.)
audio/flac
audio/g729 (Use only with narrowband models.)
audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
audio/mp3
audio/mpeg
audio/mulaw (Required. Specify the sampling rate (rate) of the audio.)
audio/ogg (The service automatically detects the codec of the input audio.)
audio/ogg;codecs=opus
audio/ogg;codecs=vorbis
audio/wav (Provide audio with a maximum of nine channels.)
audio/webm (The service automatically detects the codec of the input audio.)
audio/webm;codecs=opus
audio/webm;codecs=vorbis
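
The rules in the list above can be captured in a small helper that builds the Content-Type value. This is a sketch under assumptions: the parameter names rate, channels, and endianness come from the list, but the exact separator syntax and the accepted endianness values are not given in this section, so verify them against the service before use. The function name is hypothetical.

```python
from typing import Optional

# Formats that the list above marks as requiring a sampling rate.
RATE_REQUIRED = {"audio/alaw", "audio/l16", "audio/mulaw"}


def content_type(
    fmt: str,
    rate: Optional[int] = None,
    channels: Optional[int] = None,
    endianness: Optional[str] = None,
) -> str:
    """Build a Content-Type header value for the given audio format."""
    if fmt in RATE_REQUIRED and rate is None:
        raise ValueError(f"{fmt} requires a sampling rate")
    if fmt not in RATE_REQUIRED and rate is not None:
        raise ValueError(f"{fmt} does not take a sampling rate")
    if fmt != "audio/l16" and (channels is not None or endianness is not None):
        raise ValueError("channels and endianness apply only to audio/l16")
    parts = [fmt]
    if rate is not None:
        parts.append(f"rate={rate}")
    if channels is not None:
        parts.append(f"channels={channels}")
    if endianness is not None:
        parts.append(f"endianness={endianness}")  # value format is an assumption
    return ";".join(parts)
```

For example, content_type("audio/l16", rate=16000, channels=2) produces audio/l16;rate=16000;channels=2, while content_type("audio/flac") returns the bare format.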

The sampling rate of the audio must match the sampling rate of the model for the recognition request: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the request fails.
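
The sampling-rate rule reduces to a comparison against the model's minimum rate. The sketch below encodes the two minimums stated above; the function name and the returned labels are hypothetical and serve only to illustrate the three outcomes.

```python
# Minimum sampling rates in Hz, as stated in the paragraph above.
MIN_RATE_HZ = {"broadband": 16000, "narrowband": 8000}


def sampling_rate_outcome(model_kind: str, audio_rate_hz: int) -> str:
    """Describe how the service treats audio at the given sampling rate."""
    minimum = MIN_RATE_HZ[model_kind]
    if audio_rate_hz < minimum:
        return "rejected"      # below the minimum: the request fails
    if audio_rate_hz > minimum:
        return "down-sampled"  # above the minimum: the service down-samples
    return "accepted"          # exactly the minimum rate
```

A pre-flight check like this lets a client catch audio recorded at too low a rate before it uploads a large file.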

See also: Supported audio formats.

Large speech models and next-generation models

The service supports large speech models and next-generation Multimedia (16 kHz) and Telephony (8 kHz) models for many languages. Large speech models and next-generation models have higher throughput than the service's previous generation of Broadband and Narrowband models. When you use large speech models and next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy.

You specify a large speech model or next-generation model by using the model query parameter, just as you do for a previous-generation model. Only next-generation models support the low_latency parameter; both large speech models and next-generation models support the character_insertion_bias parameter. Neither parameter is available with previous-generation models.

Large speech models and next-generation models do not support all of the speech recognition parameters that are available with previous-generation models. Next-generation models do not support the following parameters:

acoustic_customization_id
keywords and keywords_threshold
processing_metrics and processing_metrics_interval
word_alternatives_threshold
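
A client that switches from a previous-generation model to a next-generation model can scan its request parameters for entries from the list above before sending a request. The following is a minimal sketch; the function name is hypothetical, and the set contains only the parameters named in this section.

```python
from typing import Iterable, List

# Parameters that the list above says next-generation models do not support.
UNSUPPORTED_NEXT_GEN = {
    "acoustic_customization_id",
    "keywords",
    "keywords_threshold",
    "processing_metrics",
    "processing_metrics_interval",
    "word_alternatives_threshold",
}


def unsupported_params(param_names: Iterable[str]) -> List[str]:
    """Return the requested parameter names that next-generation models reject."""
    return sorted(UNSUPPORTED_NEXT_GEN.intersection(param_names))
```

An empty result means that none of the listed parameters is present in the request.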

Important: Effective 31 July 2023, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent large speech model or next-generation model by 31 July 2023. For more information, see Migrating to large speech models.

Next-generation models

The service supports next-generation Multimedia (16 kHz) and Telephony (8 kHz) models for many languages. Next-generation models have higher throughput than the service's previous generation of Broadband and Narrowband models. When you use next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy.

You specify a next-generation model by using the model query parameter, just as you do for a previous-generation model. Most next-generation models support the low_latency parameter, and all next-generation models support the character_insertion_bias parameter. Neither parameter is available with previous-generation models.

Next-generation models do not support all of the speech recognition parameters that are available with previous-generation models. They do not support the following parameters:

acoustic_customization_id
keywords and keywords_threshold
processing_metrics and processing_metrics_interval
word_alternatives_threshold

Important: Effective 31 July 2023, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent next-generation model by 31 July 2023. For more information, see Migrating to next-generation models.

Creates a job for a new asynchronous recognition request. The job is owned by the instance of the service whose credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

By callback notification: Include the callback_url parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.
By polling the service: Omit the callback_url, events, and user_token parameters. You must then use the Check jobs or Check a job methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Using the HTTPS Check a job method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as other HTTP and WebSocket recognition requests. It also supports the following parameters specific to the asynchronous interface:

callback_url
events
user_token
results_ttl

You can pass a maximum of 1 GB and a minimum of 100 bytes of audio with a request. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The method returns only final results; to enable interim results, use the WebSocket API. (With the curl command, use the --data-binary option to upload the file for the request.)
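A client can check the size limits before it uploads anything. The sketch below is illustrative only; it assumes that "1 GB" means 10**9 bytes, which this section does not confirm, and the function name is hypothetical.

```python
MIN_AUDIO_BYTES = 100
# Assumption: "1 GB" is read as 10**9 bytes. The section does not say whether
# the limit is decimal or binary, so adjust this constant if needed.
MAX_AUDIO_BYTES = 10**9


def check_audio_size(num_bytes: int) -> None:
    """Raise ValueError if the audio falls outside the documented size limits."""
    if num_bytes < MIN_AUDIO_BYTES:
        raise ValueError(f"audio is {num_bytes} bytes; the minimum is {MIN_AUDIO_BYTES}")
    if num_bytes > MAX_AUDIO_BYTES:
        raise ValueError(f"audio is {num_bytes} bytes; the maximum is {MAX_AUDIO_BYTES}")
```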

See also: Creating a job.

Streaming mode

For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the service closes the connection (status code 408) if it does not receive at least 15 seconds of audio (including silence) in any 30-second period. The service also closes the connection (status code 400) if it detects no speech for inactivity_timeout seconds of streaming audio; use the inactivity_timeout parameter to change the default of 30 seconds.
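In streaming mode the audio is typically handed to the HTTP client piece by piece. The following is a minimal sketch of a chunk generator, assuming a file-like audio source and an arbitrary chunk size of 4096 bytes. How the generator is attached to a request, and how the Transfer-Encoding header is set, depends on the HTTP client you use and is not shown.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a file-like object until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        yield chunk


# Usage: ten bytes read in chunks of four produce chunks of 4, 4, and 2 bytes.
chunks = list(iter_audio_chunks(io.BytesIO(b"a" * 10), chunk_size=4))
```

For a live source, the producer must keep audio flowing: the service closes the connection if it receives less than 15 seconds of audio in any 30-second period.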

Audio formats (content types)

The service accepts audio in the following formats (MIME types).

For formats that are labeled Required, you must use the Content-Type header with the request to specify the format of the audio.
For all other formats, you can omit the Content-Type header or specify application/octet-stream with the header to have the service automatically detect the format of the audio. (With the curl command, you can specify either "Content-Type:" or "Content-Type: application/octet-stream".)

Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.

audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
audio/basic (Required. Use only with narrowband models.)
audio/flac
audio/g729 (Use only with narrowband models.)
audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
audio/mp3
audio/mpeg
audio/mulaw (Required. Specify the sampling rate (rate) of the audio.)
audio/ogg (The service automatically detects the codec of the input audio.)
audio/ogg;codecs=opus
audio/ogg;codecs=vorbis
audio/wav (Provide audio with a maximum of nine channels.)
audio/webm (The service automatically detects the codec of the input audio.)
audio/webm;codecs=opus
audio/webm;codecs=vorbis
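The rules in the list above can be sketched as a helper that composes the Content-Type value. This is a hypothetical helper, not part of any SDK. It assumes that parameters are joined with semicolons, as in audio/ogg;codecs=opus, and it passes the endianness value through unchanged because the accepted values are not listed in this section.

```python
from typing import Optional

# Formats that the list above marks as requiring a sampling rate.
RATE_FORMATS = {"audio/alaw", "audio/l16", "audio/mulaw"}


def build_content_type(
    audio_format: str,
    rate: Optional[int] = None,
    channels: Optional[int] = None,
    endianness: Optional[str] = None,
) -> str:
    """Compose a Content-Type value such as 'audio/l16;rate=16000;channels=1'."""
    if audio_format in RATE_FORMATS:
        if rate is None:
            raise ValueError(f"{audio_format} requires a sampling rate")
    elif rate is not None:
        raise ValueError(f"{audio_format} does not take a rate in this sketch")
    if audio_format != "audio/l16" and (channels is not None or endianness is not None):
        raise ValueError("channels and endianness apply only to audio/l16")

    parts = [audio_format]
    if rate is not None:
        parts.append(f"rate={rate}")
    if channels is not None:
        parts.append(f"channels={channels}")
    if endianness is not None:
        # Passed through unchanged; the accepted values are not listed here.
        parts.append(f"endianness={endianness}")
    return ";".join(parts)
```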

The sampling rate of the audio must match the sampling rate of the model for the recognition request: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the request fails.
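The sampling-rate rule can be expressed as a short check. This is a sketch only: it infers the minimum rate from the model name (Broadband and Multimedia at 16 kHz, Narrowband and Telephony at 8 kHz), and it raises an error for any other name because this section states no rate for other models. The function names are hypothetical.

```python
def minimum_rate_hz(model: str) -> int:
    """Return the minimum sampling rate implied by the model name."""
    if "Broadband" in model or "Multimedia" in model:
        return 16000
    if "Narrowband" in model or "Telephony" in model:
        return 8000
    raise ValueError(f"no minimum rate known for model {model!r}")


def check_sampling_rate(model: str, audio_rate_hz: int) -> str:
    """Return what the service does with the audio, or raise if it is rejected."""
    minimum = minimum_rate_hz(model)
    if audio_rate_hz < minimum:
        raise ValueError(
            f"{audio_rate_hz} Hz is below the {minimum} Hz minimum for {model}"
        )
    # Audio above the minimum is down-sampled by the service; no client action needed.
    return "down-sampled" if audio_rate_hz > minimum else "used as-is"
```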

See also: Supported audio formats.

Next-generation models

The service supports next-generation Multimedia (16 kHz) and Telephony (8 kHz) models for many languages. Next-generation models have higher throughput than the service's previous generation of Broadband and Narrowband models. When you use next-generation models, the service can return transcriptions more quickly and also provide noticeably better transcription accuracy.

You specify a next-generation model by using the model query parameter, as you do a previous-generation model. Most next-generation models support the low_latency parameter, and all next-generation models support the character_insertion_bias parameter. These parameters are not available with previous-generation models.

Next-generation models do not support all of the speech recognition parameters that are available for use with previous-generation models. Next-generation models do not support the following parameters:

acoustic_customization_id
keywords and keywords_threshold
processing_metrics and processing_metrics_interval
word_alternatives_threshold

Important: Effective 31 July 2023, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. You must migrate to the equivalent next-generation model by 31 July 2023. For more information, see Migrating to next-generation models.



POST /v1/recognitions

CreateJob(System.IO.MemoryStream audio, string contentType = null, string model = null, string callbackUrl = null, string events = null, string userToken = null, long? resultsTtl = null, string languageCustomizationId = null, string acousticCustomizationId = null, string baseModelVersion = null, double? customizationWeight = null, long? inactivityTimeout = null, List<string> keywords = null, float? keywordsThreshold = null, long? maxAlternatives = null, float? wordAlternativesThreshold = null, bool? wordConfidence = null, bool? timestamps = null, bool? profanityFilter = null, bool? smartFormatting = null, bool? speakerLabels = null, string grammarName = null, bool? redaction = null, bool? processingMetrics = null, float? processingMetricsInterval = null, bool? audioMetrics = null, double? endOfPhraseSilenceTime = null, bool? splitTranscriptAtPhraseEnd = null, float? speechDetectorSensitivity = null, float? backgroundAudioSuppression = null, bool? lowLatency = null, float? characterInsertionBias = null)

ServiceCall<RecognitionJob> createJob(CreateJobOptions createJobOptions)

createJob(params)

create_job(
        self,
        audio: BinaryIO,
        *,
        content_type: str = None,
        model: str = None,
        callback_url: str = None,
        events: str = None,
        user_token: str = None,
        results_ttl: int = None,
        language_customization_id: str = None,
        acoustic_customization_id: str = None,
        base_model_version: str = None,
        customization_weight: float = None,
        inactivity_timeout: int = None,
        keywords: List[str] = None,
        keywords_threshold: float = None,
        max_alternatives: int = None,
        word_alternatives_threshold: float = None,
        word_confidence: bool = None,
        timestamps: bool = None,
        profanity_filter: bool = None,
        smart_formatting: bool = None,
        speaker_labels: bool = None,
        grammar_name: str = None,
        redaction: bool = None,
        processing_metrics: bool = None,
        processing_metrics_interval: float = None,
        audio_metrics: bool = None,
        end_of_phrase_silence_time: float = None,
        split_transcript_at_phrase_end: bool = None,
        speech_detector_sensitivity: float = None,
        background_audio_suppression: float = None,
        low_latency: bool = None,
        character_insertion_bias: float = None,
        **kwargs,
    ) -> DetailedResponse

Request

Use the CreateJobOptions.Builder to create a CreateJobOptions object that contains the parameter values for the createJob method.

Custom Headers

Transfer-Encoding
string
Set to chunked to send the audio in streaming mode. The data does not need to exist fully before being streamed to the service. See Audio transmission.

Allowable values: [chunked]
Content-Type
string
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream,audio/alaw,audio/basic,audio/flac,audio/g729,audio/l16,audio/mp3,audio/mpeg,audio/mulaw,audio/ogg,audio/ogg;codecs=opus,audio/ogg;codecs=vorbis,audio/wav,audio/webm,audio/webm;codecs=opus,audio/webm;codecs=vorbis]

Query Parameters

model
string
The model to use for speech recognition. If you omit the model parameter, the service uses the US English en-US_BroadbandModel by default.

For IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service.

See also:
- Using a model for speech recognition
- Using the default model
Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-IN,en-IN_Telephony,en-GB,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-US,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]
Default: en-US_BroadbandModel
callback_url
string
A URL to which callback notifications are to be sent. The URL must already be successfully allowlisted by using the Register a callback method. You can include the same callback URL with any number of job creation requests. Omit the parameter to poll the service for job completion and results.

Use the user_token parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
events
string
If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are:
- recognitions.started generates a callback notification when the service begins to process the job.
- recognitions.completed generates a callback notification when the job is complete. You must use the Check a job method to retrieve the results before they time out or are deleted.
- recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
- recognitions.failed generates a callback notification if the service experiences an error while processing the job.
The recognitions.completed and recognitions.completed_with_results events are incompatible. You can specify only one of the two events.

If the job includes a callback URL, omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. If the job does not include a callback URL, omit the parameter.
Allowable values: [recognitions.started,recognitions.completed,recognitions.completed_with_results,recognitions.failed]
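The event rules above can be checked on the client before a job is created. The helper below is a hypothetical sketch, not an SDK function; it returns the comma-separated value for the events parameter, or None when the parameter is to be omitted so that the service applies its default events.

```python
from typing import Optional, Sequence

ALLOWED_EVENTS = (
    "recognitions.started",
    "recognitions.completed",
    "recognitions.completed_with_results",
    "recognitions.failed",
)


def build_events_param(events: Optional[Sequence[str]] = None) -> Optional[str]:
    """Validate the requested events and join them, or return None to omit the parameter."""
    if not events:
        return None  # omitted: the service subscribes to its default events
    unknown = [event for event in events if event not in ALLOWED_EVENTS]
    if unknown:
        raise ValueError(f"unknown events: {unknown}")
    if (
        "recognitions.completed" in events
        and "recognitions.completed_with_results" in events
    ):
        raise ValueError(
            "recognitions.completed and recognitions.completed_with_results "
            "cannot be combined"
        )
    return ",".join(events)
```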
user_token
string
If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.
results_ttl
int32
The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.
language_customization_id
string
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.
acoustic_customization_id
string
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.
base_model_version
string
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
customization_weight
double
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.5 for large speech models
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
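The precedence of customization weights described above can be sketched as follows. The sketch rests on several assumptions: the model family is inferred from the model name, a name with no underscore is treated as a large speech model, and English and Japanese next-generation models are recognized by the en- and ja- prefixes. The function names are hypothetical.

```python
from typing import Optional


def default_customization_weight(model: str) -> float:
    """Return the documented default weight for a model, inferred from its name."""
    if "Broadband" in model or "Narrowband" in model:
        return 0.3  # previous-generation models
    if "Multimedia" in model or "Telephony" in model:
        # Assumption: English and Japanese models are identified by prefix.
        return 0.1 if model.startswith(("en-", "ja-")) else 0.2
    if "_" not in model:
        return 0.5  # assumption: a bare locale names a large speech model
    raise ValueError(f"cannot infer the model family of {model!r}")


def effective_customization_weight(
    model: str,
    request_weight: Optional[float] = None,
    trained_weight: Optional[float] = None,
) -> float:
    """A weight on the request overrides the trained weight, which overrides the default."""
    for weight in (request_weight, trained_weight):
        if weight is not None:
            if not 0.0 <= weight <= 1.0:
                raise ValueError("customization weight must be between 0.0 and 1.0")
            return weight
    return default_customization_weight(model)
```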
inactivity_timeout
int32
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
keywords
string[]
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywords_threshold
float
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
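The keyword constraints can be verified before a request is sent. This hypothetical helper is stricter than the service: it treats a lone keywords or keywords_threshold value as an error rather than silently skipping keyword spotting. It measures keyword length in characters.

```python
from typing import Optional, Sequence

MAX_KEYWORDS = 1000
MAX_KEYWORD_LENGTH = 1024


def validate_keywords(
    keywords: Optional[Sequence[str]], threshold: Optional[float]
) -> bool:
    """Return True if keyword spotting is requested, False if it is not."""
    keywords = list(keywords or [])
    if not keywords and threshold is None:
        return False  # neither parameter given: no keyword spotting
    if not keywords or threshold is None:
        raise ValueError("keywords and keywords_threshold must be specified together")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("keywords_threshold must be between 0.0 and 1.0")
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} keywords are allowed")
    too_long = [word for word in keywords if len(word) > MAX_KEYWORD_LENGTH]
    if too_long:
        raise ValueError(f"keywords longer than {MAX_KEYWORD_LENGTH} characters")
    return True
```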
max_alternatives
int32
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
word_alternatives_threshold
float
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
word_confidence
boolean
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

Default: false
timestamps
boolean
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanity_filter
boolean
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smart_formatting
boolean
Beta
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

See Smart formatting.

Default: false
smart_formatting_version
integer
The version of smart formatting to use. For large speech models and next-generation models, smart formatting versions are supported for US English, Brazilian Portuguese, French, German, Spanish, and French Canadian.

Default: 0
speaker_labels
boolean
Beta
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For large speech models and next-generation models, the parameter can be used with all available languages.
See Speaker labels.
Default: false
grammar_name
string
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID (GUID) of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

See Using a grammar for speech recognition.
redaction
boolean
Beta
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Note: The parameter can be used with US English, Japanese, and Korean transcription only.

See Numeric redaction.

Default: false
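The side effects of enabling redaction can be illustrated with a short sketch. Both functions are hypothetical. The first mirrors how the service overrides other parameters when redaction is true. The second applies the stated masking rule to plain text for illustration only; the service's own output also passes through smart formatting and may differ.

```python
import re
from typing import Any, Dict


def apply_redaction_rules(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return the parameters as the service effectively treats them."""
    effective = dict(params)  # leave the caller's dictionary untouched
    if effective.get("redaction"):
        effective["smart_formatting"] = True  # forced on by redaction
        effective["max_alternatives"] = 1  # only a single final transcript
        effective.pop("keywords", None)  # keyword spotting is disabled
        effective.pop("keywords_threshold", None)
    return effective


def mask_digits(text: str) -> str:
    """Replace each digit in runs of three or more consecutive digits with X."""
    return re.sub(r"\d{3,}", lambda match: "X" * len(match.group()), text)
```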
processing_metrics
boolean
If true, requests processing metrics about the service's transcription of the input audio. The service returns processing metrics at the interval specified by the processing_metrics_interval parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics.

See Processing metrics.

Default: false
processing_metrics_interval
float
Specifies the interval in real wall-clock seconds at which the service is to return processing metrics. The parameter is ignored unless the processing_metrics parameter is set to true.

The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125.

The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events.

See Processing metrics.

Default: 1
audio_metrics
boolean
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Default: false
end_of_phrase_silence_time
double
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See End of phrase silence time.
Default: 0.8
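The way the pause interval resolves can be sketched as below. The function name is hypothetical, and the sketch assumes that Chinese models are the ones whose names begin with zh-.

```python
from typing import Optional


def effective_pause_interval(model: str, value: Optional[float] = None) -> float:
    """Return the pause interval in seconds that applies to the request."""
    # Assumption: Chinese models are identified by the "zh-" prefix.
    default = 0.6 if model.startswith("zh-") else 0.8
    if value is None or value == 0:
        return default  # omitting the parameter and passing 0 are equivalent
    if not 0.0 < value <= 120.0:
        raise ValueError("end_of_phrase_silence_time must be in the range 0.0 to 120.0")
    return value
```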
split_transcript_at_phrase_end
boolean
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If the parameters are used together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speech_detector_sensitivity
float
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all large speech models and next-generation models, and with most previous-generation models. See Speech detector sensitivity and Language model support.
Default: 0.5
background_audio_suppression
float
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all large speech models and next-generation models, and with most previous-generation models. See Background audio suppression and Language model support.
Default: 0
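Both speech_detector_sensitivity and background_audio_suppression accept values from 0.0 to 1.0, so one range check covers them. The helper below is a hypothetical sketch that validates the values and returns only those that were supplied, leaving the rest to the service defaults.

```python
from typing import Dict, Optional


def detection_params(
    speech_detector_sensitivity: Optional[float] = None,
    background_audio_suppression: Optional[float] = None,
) -> Dict[str, float]:
    """Validate both values and return only the ones that were supplied."""
    params: Dict[str, float] = {}
    for name, value in (
        ("speech_detector_sensitivity", speech_detector_sensitivity),
        ("background_audio_suppression", background_audio_suppression),
    ):
        if value is None:
            continue  # omitted: the service applies its default (0.5 or 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        params[name] = value
    return params
```

Note that the two scales run in opposite directions: 0.0 suppresses all audio for speech_detector_sensitivity but disables suppression for background_audio_suppression.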
low_latency
boolean
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for large speech models and previous-generation Broadband and Narrowband models. It is available for most next-generation models.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency parameter, see Low latency.
Default: false
character_insertion_bias
float
Beta
For large speech models and next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

The parameter is not available for previous-generation models.

See Character insertion bias.
Default: 0
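The tuning advice above (start with small increments on either side of the default) can be sketched in Python as follows. The helper names, the default step of 0.05, and the default of two values per side are illustrative assumptions, not part of the service; only the allowable range of -1.0 to 1.0 comes from the description.

```python
from typing import List


def validate_bias(value: float) -> float:
    """Reject values outside the documented range of -1.0 to 1.0."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("character_insertion_bias must be between -1.0 and 1.0")
    return value


def bias_candidates(step: float = 0.05, count: int = 2) -> List[float]:
    """Return small trial values on both sides of the default bias of 0.0."""
    # Rounding removes floating-point noise from the multiplication.
    magnitudes = [round(k * step, 4) for k in range(1, count + 1)]
    values = [-m for m in reversed(magnitudes)] + magnitudes
    return [validate_bias(v) for v in values]
```

With the defaults, the helper yields the four starting values that the description suggests; each value would then be tried in a separate recognition request and the transcripts compared.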

Request Body

Required*

application/octet-stream
binary

The audio to transcribe.

parameters

audio
Required*
System.IO.MemoryStream
The audio to transcribe.
contentType
string
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream,audio/alaw,audio/basic,audio/flac,audio/g729,audio/l16,audio/mp3,audio/mpeg,audio/mulaw,audio/ogg,audio/ogg;codecs=opus,audio/ogg;codecs=vorbis,audio/wav,audio/webm,audio/webm;codecs=opus,audio/webm;codecs=vorbis]
model
string
The model to use for speech recognition. If you omit the model parameter, the service uses the US English en-US_BroadbandModel by default.

For IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service.

See also:
- Using a model for speech recognition
- Using the default model
Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-IN_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]
Default: en-US_BroadbandModel
callbackUrl
string
A URL to which callback notifications are to be sent. The URL must already be successfully allowlisted by using the Register a callback method. You can include the same callback URL with any number of job creation requests. Omit the parameter to poll the service for job completion and results.

Use the user_token parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
events
string
If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are:
- recognitions.started generates a callback notification when the service begins to process the job.
- recognitions.completed generates a callback notification when the job is complete. You must use the Check a job method to retrieve the results before they time out or are deleted.
- recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
- recognitions.failed generates a callback notification if the service experiences an error while processing the job.
The recognitions.completed and recognitions.completed_with_results events are incompatible. You can specify only one of the two events.

If the job includes a callback URL, omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. If the job does not include a callback URL, omit the parameter.
Allowable values: [recognitions.started,recognitions.completed,recognitions.completed_with_results,recognitions.failed]
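The rules for the events value can be checked on the client before a job is created. The Python sketch below is one way to do so; the function name events_param is hypothetical, while the event names and the incompatibility rule come from the description above.

```python
from typing import Iterable

VALID_EVENTS = (
    "recognitions.started",
    "recognitions.completed",
    "recognitions.completed_with_results",
    "recognitions.failed",
)


def events_param(events: Iterable[str]) -> str:
    """Return the comma-separated events value, or raise ValueError."""
    # dict.fromkeys drops duplicates while preserving the caller's order.
    unique = list(dict.fromkeys(events))
    unknown = [e for e in unique if e not in VALID_EVENTS]
    if unknown:
        raise ValueError("unknown events: " + ", ".join(unknown))
    if ("recognitions.completed" in unique
            and "recognitions.completed_with_results" in unique):
        raise ValueError(
            "recognitions.completed and recognitions.completed_with_results "
            "cannot be combined"
        )
    return ",".join(unique)
```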
userToken
string
If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.
resultsTtl
long?
The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.
languageCustomizationId
string
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.
acousticCustomizationId
string
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.
baseModelVersion
string
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
customizationWeight
double?
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
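The precedence described above (a weight on the request overrides a weight set at training time, which in turn overrides the default for the base model) can be sketched in Python. This is an illustration under stated assumptions: the function names are hypothetical, and English and Japanese models are identified by the locale prefixes en- and ja-, which the description does not state explicitly.

```python
from typing import Optional


def default_customization_weight(base_model: str) -> float:
    """Return the documented default weight for a base model name."""
    if "Broadband" in base_model or "Narrowband" in base_model:
        return 0.3  # previous-generation models
    if "Multimedia" in base_model or "Telephony" in base_model:
        # Assumption: the locale prefix identifies English and Japanese models.
        if base_model.startswith(("en-", "ja-")):
            return 0.1
        return 0.2
    raise ValueError("no default is documented here for " + base_model)


def effective_customization_weight(
    base_model: str,
    trained_weight: Optional[float] = None,
    request_weight: Optional[float] = None,
) -> float:
    """Resolve the weight: request first, then training time, then default."""
    for weight in (request_weight, trained_weight):
        if weight is not None:
            if not 0.0 <= weight <= 1.0:
                raise ValueError("customization weight must be between 0.0 and 1.0")
            return weight
    return default_customization_weight(base_model)
```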
inactivityTimeout
long?
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
keywords
List<string>
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywordsThreshold
float?
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
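The constraints on keywords and keywords_threshold lend themselves to a client-side check before the request is sent. The Python sketch below is a minimal example; the function name keyword_params and the shape of the returned dictionary are assumptions, while the limits (1000 keywords, 1024 characters, a threshold between 0.0 and 1.0, both parameters required together) come from the two descriptions above.

```python
from typing import List, Optional

MAX_KEYWORDS = 1000
MAX_KEYWORD_LENGTH = 1024


def keyword_params(keywords: List[str], threshold: Optional[float]) -> dict:
    """Validate the keyword-spotting parameters and return them together."""
    if not keywords and threshold is None:
        return {}  # keyword spotting is not requested
    if not keywords or threshold is None:
        raise ValueError("specify both keywords and keywords_threshold")
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError("at most 1000 keywords are allowed per request")
    too_long = [k for k in keywords if len(k) > MAX_KEYWORD_LENGTH]
    if too_long:
        raise ValueError("a keyword can have at most 1024 characters")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("keywords_threshold must be between 0.0 and 1.0")
    return {"keywords": list(keywords), "keywords_threshold": threshold}
```

The length check counts characters; as noted above, the effective limit for double-byte languages might be shorter, which this sketch does not model.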
maxAlternatives
long?
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
wordAlternativesThreshold
float?
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
wordConfidence
bool?
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

Default: false
timestamps
bool?
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanityFilter
bool?
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smartFormatting
bool?
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

See Smart formatting.

Default: false
speakerLabels
bool?
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For next-generation models, the parameter can be used with Czech, English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.
See Speaker labels.
Default: false
grammarName
string
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID (GUID) of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

See Using a grammar for speech recognition.
redaction
bool?
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Note: The parameter can be used with US English, Japanese, and Korean transcription only.

See Numeric redaction.

Default: false
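To make the side effects of redaction concrete, the Python sketch below computes the settings that would effectively apply when redaction is enabled, and imitates the masking rule locally. Both functions are hypothetical illustrations: the service performs the actual redaction, and the regular expression only approximates the stated rule (three or more consecutive digits, each replaced with an X) on plain text.

```python
import re


def apply_redaction_overrides(params: dict) -> dict:
    """Return a copy of params with the overrides that redaction implies."""
    effective = dict(params)
    if effective.get("redaction"):
        effective["smart_formatting"] = True         # enabled automatically
        effective.pop("keywords", None)              # keyword spotting is disabled
        effective.pop("keywords_threshold", None)
        effective["max_alternatives"] = 1            # single final transcript
    return effective


def mask_digit_runs(text: str) -> str:
    """Replace each digit in runs of three or more digits with an X."""
    return re.sub(r"\d{3,}", lambda m: "X" * len(m.group()), text)
```

Runs of one or two digits are left unchanged by the local masking function, in line with the three-digit rule.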
processingMetrics
bool?
If true, requests processing metrics about the service's transcription of the input audio. The service returns processing metrics at the interval specified by the processing_metrics_interval parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics.

See Processing metrics.

Default: false
processingMetricsInterval
float?
Specifies the interval in real wall-clock seconds at which the service is to return processing metrics. The parameter is ignored unless the processing_metrics parameter is set to true.

The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125.

The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events.

See Processing metrics.

Default: 1.0
audioMetrics
bool?
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Default: false
endOfPhraseSilenceTime
double?
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See End of phrase silence time.
Default: 0.8
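The way the pause interval resolves can be sketched in Python. The function name is hypothetical, and identifying Chinese models by the zh- locale prefix is an assumption; the range of 0.0 to 120.0, the meaning of 0, and the defaults of 0.8 and 0.6 seconds come from the description above.

```python
from typing import Optional


def effective_pause_interval(model: str, value: Optional[float] = None) -> float:
    """Return the pause interval, in seconds, that a request would use."""
    # Assumption: Chinese models are the ones whose names start with "zh-".
    default = 0.6 if model.startswith("zh-") else 0.8
    # Omitting the parameter and passing 0 both select the default interval.
    if value is None or value == 0:
        return default
    if not 0.0 < value <= 120.0:
        raise ValueError("end_of_phrase_silence_time must be between 0.0 and 120.0")
    return float(value)
```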
splitTranscriptAtPhraseEnd
bool?
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If the parameters are used together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speechDetectorSensitivity
float?
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Speech detector sensitivity and Language model support.
Default: 0.5
backgroundAudioSuppression
float?
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Background audio suppression and Language model support.
Default: 0.0
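The speech_detector_sensitivity and background_audio_suppression parameters share the range of 0.0 to 1.0 but run in opposite directions: 0.0 suppresses all audio for the former and none for the latter. The Python sketch below validates both values and rounds them to two decimal places, which the descriptions call more than sufficient. The helper names are hypothetical.

```python
def unit_interval(name: str, value: float, places: int = 2) -> float:
    """Check that value lies in 0.0 to 1.0 and round it to a few decimals."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(name + " must be between 0.0 and 1.0")
    return round(value, places)


def detection_params(
    speech_detector_sensitivity: float = 0.5,
    background_audio_suppression: float = 0.0,
) -> dict:
    """Return both parameters, using the documented defaults when omitted."""
    return {
        # 0.0 suppresses all audio; 1.0 suppresses none.
        "speech_detector_sensitivity": unit_interval(
            "speech_detector_sensitivity", speech_detector_sensitivity
        ),
        # 0.0 suppresses nothing; 1.0 suppresses all audio.
        "background_audio_suppression": unit_interval(
            "background_audio_suppression", background_audio_suppression
        ),
    }
```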
lowLatency
bool?
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for previous-generation Broadband and Narrowband models. It is available for most next-generation models.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency parameter, see Low latency.
Default: false
characterInsertionBias
float?
For next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

The parameter is not available for previous-generation models.

See Character insertion bias.
Default: 0.0

CreateJobOptions

The createJob options.

parameters

audio
Required*
NodeJS.ReadableStream
The audio to transcribe.
contentType
string
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream,audio/alaw,audio/basic,audio/flac,audio/g729,audio/l16,audio/mp3,audio/mpeg,audio/mulaw,audio/ogg,audio/ogg;codecs=opus,audio/ogg;codecs=vorbis,audio/wav,audio/webm,audio/webm;codecs=opus,audio/webm;codecs=vorbis]
model
string
The model to use for speech recognition. If you omit the model parameter, the service uses the US English en-US_BroadbandModel by default.

For IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service.

See also:
- Using a model for speech recognition
- Using the default model
Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-IN_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]
Default: en-US_BroadbandModel
callbackUrl
string
A URL to which callback notifications are to be sent. The URL must already be successfully allowlisted by using the Register a callback method. You can include the same callback URL with any number of job creation requests. Omit the parameter to poll the service for job completion and results.

Use the user_token parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
events
string
If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are:
- recognitions.started generates a callback notification when the service begins to process the job.
- recognitions.completed generates a callback notification when the job is complete. You must use the Check a job method to retrieve the results before they time out or are deleted.
- recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
- recognitions.failed generates a callback notification if the service experiences an error while processing the job.
The recognitions.completed and recognitions.completed_with_results events are incompatible. You can specify only one of the two events.

If the job includes a callback URL, omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. If the job does not include a callback URL, omit the parameter.
Allowable values: [recognitions.started,recognitions.completed,recognitions.completed_with_results,recognitions.failed]
userToken
string
If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.
resultsTtl
number
The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.
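Because the time to live is expressed in minutes, converting from a duration is a common small step. The Python sketch below is illustrative only; the function name is hypothetical, and returning None stands for omitting the parameter so that the one-week default applies.

```python
from datetime import timedelta
from typing import Optional

# One week, the default time to live, expressed in minutes.
ONE_WEEK_MINUTES = 7 * 24 * 60


def results_ttl_minutes(ttl: Optional[timedelta]) -> Optional[int]:
    """Convert a duration to whole minutes; None means omit the parameter."""
    if ttl is None:
        return None
    minutes = int(ttl.total_seconds() // 60)
    if minutes <= 0:
        raise ValueError("the time to live must be at least one minute")
    return minutes
```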
languageCustomizationId
string
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.
acousticCustomizationId
string
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.
baseModelVersion
string
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
customizationWeight
number
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
inactivityTimeout
number
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
keywords
string[]
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywordsThreshold
number
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
maxAlternatives
number
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
wordAlternativesThreshold
number
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
wordConfidence
boolean
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

Default: false
timestamps
boolean
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanityFilter
boolean
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smartFormatting
boolean
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

See Smart formatting.

Default: false
speakerLabels
boolean
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For next-generation models, the parameter can be used with Czech, English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.
See Speaker labels.
Default: false
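The language restrictions and the forced timestamps setting can be expressed as a short Python sketch. It is a rough screen under stated assumptions: the helper names are hypothetical, languages are matched by the locale prefixes that appear in the model names listed earlier, and a result of None means that the description above does not cover the model (for example, a large speech model).

```python
from typing import Optional

# Locale prefixes for the languages named in the description (an assumption).
PREVIOUS_GEN_LOCALES = ("en-AU", "en-US", "de-DE", "ja-JP", "ko-KR", "es-")
NEXT_GEN_LOCALES = (
    "cs-CZ", "en-AU", "en-IN", "en-GB", "en-US", "de-DE", "ja-JP", "ko-KR", "es-",
)


def speaker_labels_supported(model: str) -> Optional[bool]:
    """Return True or False by model name, or None when not covered."""
    if "Broadband" in model or "Narrowband" in model:
        if model.startswith("en-GB"):
            return "Narrowband" in model  # UK English: narrowband model only
        return model.startswith(PREVIOUS_GEN_LOCALES)
    if "Multimedia" in model or "Telephony" in model:
        return model.startswith(NEXT_GEN_LOCALES)
    return None


def effective_flags(speaker_labels: bool, timestamps: bool = False) -> dict:
    """Show that speaker labels force timestamps on, whatever was requested."""
    return {
        "speaker_labels": speaker_labels,
        "timestamps": timestamps or speaker_labels,
    }
```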
grammarName
string
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID (GUID) of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

See Using a grammar for speech recognition.
redaction
boolean
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Note: The parameter can be used with US English, Japanese, and Korean transcription only.

See Numeric redaction.

Default: false
processingMetrics
boolean
If true, requests processing metrics about the service's transcription of the input audio. The service returns processing metrics at the interval specified by the processing_metrics_interval parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics.

See Processing metrics.

Default: false
processingMetricsInterval
number
Specifies the interval in real wall-clock seconds at which the service is to return processing metrics. The parameter is ignored unless the processing_metrics parameter is set to true.

The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125.

The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events.

See Processing metrics.

Default: 1.0
audioMetrics
boolean
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Default: false
endOfPhraseSilenceTime
number
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See End of phrase silence time.
Default: 0.8
splitTranscriptAtPhraseEnd
boolean
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If you use end_of_phrase_silence_time and split_transcript_at_phrase_end together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speechDetectorSensitivity
number
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Speech detector sensitivity and Language model support.
Default: 0.5
backgroundAudioSuppression
number
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Background audio suppression and Language model support.
Default: 0.0
lowLatency
boolean
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for previous-generation Broadband and Narrowband models. It is available for most next-generation models.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency parameter, see Low latency.
Default: false
characterInsertionBias
number
For next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

The parameter is not available for previous-generation models.

See Character insertion bias.
Default: 0.0

parameters

audio
Required*
BinaryIO
The audio to transcribe.
content_type
str
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

Allowable values: [application/octet-stream,audio/alaw,audio/basic,audio/flac,audio/g729,audio/l16,audio/mp3,audio/mpeg,audio/mulaw,audio/ogg,audio/ogg;codecs=opus,audio/ogg;codecs=vorbis,audio/wav,audio/webm,audio/webm;codecs=opus,audio/webm;codecs=vorbis]
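As a minimal sketch of choosing a content_type value, the helper below maps a file extension to one of the allowable values listed above. The function name guess_content_type and the choice of extensions are assumptions made for illustration; they are not part of the SDK.

```python
from pathlib import Path

# Extension-to-MIME mapping. Every value appears in the allowable values
# above; the set of extensions is an assumption made for illustration.
CONTENT_TYPES = {
    ".flac": "audio/flac",
    ".mp3": "audio/mp3",
    ".ogg": "audio/ogg",
    ".wav": "audio/wav",
    ".webm": "audio/webm",
}


def guess_content_type(filename: str) -> str:
    """Return a content_type value for a file name (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    # Fall back to the generic type, which is also an allowable value.
    return CONTENT_TYPES.get(suffix, "application/octet-stream")


print(guess_content_type("audio-file.flac"))
```

The result can be passed as the content_type argument of a job creation request.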
model
str
The model to use for speech recognition. If you omit the model parameter, the service uses the US English en-US_BroadbandModel by default.

For IBM Cloud Pak for Data, if you do not install the en-US_BroadbandModel, you must either specify a model with the request or specify a new default model for your installation of the service.

See also:
- Using a model for speech recognition
- Using the default model
Allowable values: [ar-MS_BroadbandModel,ar-MS_Telephony,cs-CZ_Telephony,de-DE_BroadbandModel,de-DE_Multimedia,de-DE_NarrowbandModel,de-DE_Telephony,en-AU_BroadbandModel,en-AU_Multimedia,en-AU_NarrowbandModel,en-AU_Telephony,en-IN_Telephony,en-GB_BroadbandModel,en-GB_Multimedia,en-GB_NarrowbandModel,en-GB_Telephony,en-US_BroadbandModel,en-US_Multimedia,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,en-US_Telephony,en-WW_Medical_Telephony,es-AR_BroadbandModel,es-AR_NarrowbandModel,es-CL_BroadbandModel,es-CL_NarrowbandModel,es-CO_BroadbandModel,es-CO_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,es-ES_Multimedia,es-ES_Telephony,es-LA_Telephony,es-MX_BroadbandModel,es-MX_NarrowbandModel,es-PE_BroadbandModel,es-PE_NarrowbandModel,fr-CA_BroadbandModel,fr-CA_Multimedia,fr-CA_NarrowbandModel,fr-CA_Telephony,fr-FR_BroadbandModel,fr-FR_Multimedia,fr-FR_NarrowbandModel,fr-FR_Telephony,hi-IN_Telephony,it-IT_BroadbandModel,it-IT_NarrowbandModel,it-IT_Multimedia,it-IT_Telephony,ja-JP_BroadbandModel,ja-JP_Multimedia,ja-JP_NarrowbandModel,ja-JP_Telephony,ko-KR_BroadbandModel,ko-KR_Multimedia,ko-KR_NarrowbandModel,ko-KR_Telephony,nl-BE_Telephony,nl-NL_BroadbandModel,nl-NL_Multimedia,nl-NL_NarrowbandModel,nl-NL_Telephony,pt-BR_BroadbandModel,pt-BR_Multimedia,pt-BR_NarrowbandModel,pt-BR_Telephony,sv-SE_Telephony,zh-CN_BroadbandModel,zh-CN_NarrowbandModel,zh-CN_Telephony]
Default: en-US_BroadbandModel
callback_url
str
A URL to which callback notifications are to be sent. The URL must already be successfully allowlisted by using the Register a callback method. You can include the same callback URL with any number of job creation requests. Omit the parameter to poll the service for job completion and results.

Use the user_token parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
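When you omit callback_url, you poll for completion yourself. The following is a rough sketch of such a loop, not the SDK's own mechanism. The helper wait_for_job, the job ID, and the status names "completed" and "failed" are assumptions. The check_job argument is any callable that returns the job as a dict; with the Python SDK it might wrap the Check a job method.

```python
import time
from typing import Callable, Dict


def wait_for_job(check_job: Callable[[str], Dict], job_id: str,
                 interval: float = 5.0, max_attempts: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the job reports a terminal status (hypothetical helper).

    The status names checked here are assumptions about the job object.
    """
    for _ in range(max_attempts):
        job = check_job(job_id)
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} checks")


# Stand-in for the service so that the sketch runs offline.
_responses = iter([{"status": "waiting"},
                   {"status": "processing"},
                   {"status": "completed", "results": []}])
job = wait_for_job(lambda _id: next(_responses), "example-job-id",
                   sleep=lambda seconds: None)
print(job["status"])
```

Injecting the sleep function keeps the loop easy to exercise without real delays.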
events
str
If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are:
- recognitions.started generates a callback notification when the service begins to process the job.
- recognitions.completed generates a callback notification when the job is complete. You must use the Check a job method to retrieve the results before they time out or are deleted.
- recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
- recognitions.failed generates a callback notification if the service experiences an error while processing the job.
The recognitions.completed and recognitions.completed_with_results events are incompatible. You can specify only one of the two events.

If the job includes a callback URL, omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. If the job does not include a callback URL, omit the parameter.
Allowable values: [recognitions.started,recognitions.completed,recognitions.completed_with_results,recognitions.failed]
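The rules for the events parameter can be checked on the client before a request is sent. The sketch below builds the comma-separated string and rejects the incompatible pair; build_events is a hypothetical helper, not an SDK function.

```python
from typing import Iterable, Optional

VALID_EVENTS = {
    "recognitions.started",
    "recognitions.completed",
    "recognitions.completed_with_results",
    "recognitions.failed",
}


def build_events(events: Optional[Iterable[str]]) -> Optional[str]:
    """Return the comma-separated events value, or None to use the defaults."""
    chosen = list(dict.fromkeys(events or []))  # drop duplicates, keep order
    if not chosen:
        return None  # omit the parameter; the service applies its defaults
    unknown = [event for event in chosen if event not in VALID_EVENTS]
    if unknown:
        raise ValueError(f"unknown events: {unknown}")
    if ("recognitions.completed" in chosen
            and "recognitions.completed_with_results" in chosen):
        raise ValueError("specify only one of recognitions.completed and "
                         "recognitions.completed_with_results")
    return ",".join(chosen)


print(build_events(["recognitions.started",
                    "recognitions.completed_with_results"]))
```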
user_token
str
If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.
results_ttl
int
The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.
language_customization_id
str
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Using a custom language model for speech recognition.

Note: Use this parameter instead of the deprecated customization_id parameter.
acoustic_customization_id
str
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Using a custom acoustic model for speech recognition.
base_model_version
str
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Making speech recognition requests with upgraded custom models.
customization_weight
float
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when the model was trained, the default value is:
- 0.3 for previous-generation models
- 0.2 for most next-generation models
- 0.1 for next-generation English and Japanese models
A customization weight that you specify overrides a weight that was specified when the custom model was trained. The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of out-of-vocabulary (OOV) words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

See Using customization weight.
inactivity_timeout
int
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.

Default: 30
keywords
List[str]
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

See Keyword spotting.
keywords_threshold
float
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
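The keyword limits described above can be verified before a request is made. This is a sketch under the assumption that you prefer to fail early on the client; check_keywords is a hypothetical helper and is stricter than the service, which simply performs no keyword spotting if either parameter is omitted.

```python
from typing import List, Optional

MAX_KEYWORDS = 1000         # maximum keywords per request
MAX_KEYWORD_LENGTH = 1024   # maximum characters per keyword


def check_keywords(keywords: Optional[List[str]],
                   threshold: Optional[float]) -> bool:
    """Return True if keyword spotting is requested, False if it is off."""
    if not keywords and threshold is None:
        return False
    if not keywords or threshold is None:
        raise ValueError("specify keywords and keywords_threshold together")
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(f"at most {MAX_KEYWORDS} keywords are allowed")
    too_long = [k for k in keywords if len(k) > MAX_KEYWORD_LENGTH]
    if too_long:
        raise ValueError(f"{len(too_long)} keyword(s) exceed "
                         f"{MAX_KEYWORD_LENGTH} characters")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("keywords_threshold must be between 0.0 and 1.0")
    return True


print(check_keywords(["colorado", "tornado"], 0.5))
```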
max_alternatives
int
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.

Default: 1
word_alternatives_threshold
float
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
word_confidence
bool
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.

Default: false
timestamps
bool
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

Default: false
profanity_filter
bool
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring.

Note: The parameter can be used with US English and Japanese transcription only. See Profanity filtering.

Default: true
smart_formatting
bool
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

See Smart formatting.

Default: false
speaker_labels
bool
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
- For previous-generation models, the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
- For next-generation models, the parameter can be used with Czech, English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.
See Speaker labels.
Default: false
grammar_name
str
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the customization ID of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource.

See Using a grammar for speech recognition.
redaction
bool
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

Note: The parameter can be used with US English, Japanese, and Korean transcription only.

See Numeric redaction.

Default: false
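To make the redaction rules concrete, the sketch below approximates them locally. It is not the service's implementation: mask_numbers handles only contiguous ASCII digits, and effective_params is a hypothetical helper that mirrors the overrides described above when redaction is enabled.

```python
import re


def mask_numbers(transcript: str) -> str:
    """Replace each digit of any run of three or more digits with 'X'."""
    return re.sub(r"[0-9]{3,}",
                  lambda match: "X" * len(match.group()),
                  transcript)


def effective_params(params: dict) -> dict:
    """Return a copy of params with the redaction overrides applied."""
    resolved = dict(params)
    if resolved.get("redaction"):
        resolved["smart_formatting"] = True    # always enabled with redaction
        resolved["max_alternatives"] = 1       # single final transcript
        resolved.pop("keywords", None)         # keyword spotting is ignored
        resolved.pop("keywords_threshold", None)
    return resolved


print(mask_numbers("my pin is 4821 and I am 42"))
print(effective_params({"redaction": True, "max_alternatives": 3,
                        "keywords": ["pin"], "keywords_threshold": 0.5}))
```

Two-digit numbers are left untouched because the rule applies only to runs of three or more digits.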
processing_metrics
bool
If true, requests processing metrics about the service's transcription of the input audio. The service returns processing metrics at the interval specified by the processing_metrics_interval parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics.

See Processing metrics.

Default: false
processing_metrics_interval
float
Specifies the interval in real wall-clock seconds at which the service is to return processing metrics. The parameter is ignored unless the processing_metrics parameter is set to true.

The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125.

The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events.

See Processing metrics.

Default: 1.0
audio_metrics
bool
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

See Audio metrics.

Default: false
end_of_phrase_silence_time
float
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

See End of phrase silence time.
Default: 0.8
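The way a value of 0 falls back to the default can be sketched as follows. The helper effective_pause_interval is hypothetical, and identifying Chinese models by the "zh-" prefix of the model name is an assumption based on the allowable model values.

```python
from typing import Optional


def effective_pause_interval(value: Optional[float], model: str) -> float:
    """Return the pause interval, in seconds, that would apply to a request."""
    if value is not None and not 0.0 <= value <= 120.0:
        raise ValueError("end_of_phrase_silence_time must be in 0.0 to 120.0")
    if value is None or value == 0:
        # Omitting the parameter and passing 0 both select the default.
        # Assumption: Chinese models are those whose names start with "zh-".
        return 0.6 if model.startswith("zh-") else 0.8
    return float(value)


print(effective_pause_interval(0, "zh-CN_Telephony"))
print(effective_pause_interval(1.5, "en-US_Telephony"))
```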
split_transcript_at_phrase_end
bool
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

By default, the service splits transcripts based solely on the pause interval. If you use end_of_phrase_silence_time and split_transcript_at_phrase_end together on the same request, end_of_phrase_silence_time has precedence over split_transcript_at_phrase_end.

See Split transcript at phrase end.

Default: false
speech_detector_sensitivity
float
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Speech detector sensitivity and Language model support.
Default: 0.5
background_audio_suppression
float
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. Specifying one or two decimal places of precision (for example, 0.55) is typically more than sufficient.

The parameter is supported with all next-generation models and with most previous-generation models. See Background audio suppression and Language model support.
Default: 0.0
low_latency
bool
If true for next-generation Multimedia and Telephony models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The low_latency parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

The parameter is not available for previous-generation Broadband and Narrowband models. It is available for most next-generation models.
- For a list of next-generation models that support low latency, see Supported next-generation language models.
- For more information about the low_latency parameter, see Low latency.
Default: false
character_insertion_bias
float
For next-generation models, an indication of whether the service is biased to recognize shorter or longer strings of characters when developing transcription hypotheses. By default, the service is optimized to produce the best balance of strings of different lengths.

The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
- Negative values bias the service to favor hypotheses with shorter strings of characters.
- Positive values bias the service to favor hypotheses with longer strings of characters.
As the value approaches -1.0 or 1.0, the impact of the parameter becomes more pronounced. To determine the most effective value for your scenario, start by setting the value of the parameter to a small increment, such as -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the transcription results. Then experiment with different values as necessary, adjusting the value by small increments.

The parameter is not available for previous-generation models.

See Character insertion bias.
Default: 0.0
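Several of the numeric parameters above have documented ranges. A minimal sketch of a client-side range check follows; the RANGES table copies the limits from the descriptions, and out_of_range is a hypothetical helper, not part of the SDK.

```python
import math

# (minimum, maximum) taken from the parameter descriptions above;
# math.inf marks a parameter for which no maximum is imposed.
RANGES = {
    "processing_metrics_interval": (0.1, math.inf),
    "end_of_phrase_silence_time": (0.0, 120.0),
    "speech_detector_sensitivity": (0.0, 1.0),
    "background_audio_suppression": (0.0, 1.0),
    "character_insertion_bias": (-1.0, 1.0),
}


def out_of_range(params: dict) -> list:
    """Return one message for each parameter that is outside its range."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


print(out_of_range({"speech_detector_sensitivity": 0.55,
                    "character_insertion_bias": 1.5}))
```

An empty list means that every supplied value is within its documented range; parameters that are absent are not checked.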

Example request for IBM Cloud

curl -X POST -u "apikey:{apikey}" --header "Content-Type: audio/flac" --data-binary @audio-file.flac "{url}/v1/recognitions?callback_url=http://{user_callback_path}/job_results&user_token=job25&timestamps=true"

Download sample file audio-file.flac

Example request for IBM Cloud Pak for Data

curl -X POST --header "Authorization: Bearer {token}" --header "Content-Type: audio/flac" --data-binary @audio-file.flac "{url}/v1/recognitions?callback_url=http://{user_callback_path}/job_results&user_token=job25&timestamps=true"

Download sample file audio-file.flac

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.CreateJob(
    callbackUrl: "http://{user_callback_path}/job_results",
    userToken: "job25",
    audio: new MemoryStream(File.ReadAllBytes("audio-file.flac")),
    contentType: "audio/flac",
    timestamps: true
    );

Console.WriteLine(result.Response);

Download sample file audio-file.flac

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.CreateJob(
    callbackUrl: "http://{user_callback_path}/job_results",
    userToken: "job25",
    audio: new MemoryStream(File.ReadAllBytes("audio-file.flac")),
    contentType: "audio/flac",
    timestamps: true
    );

Console.WriteLine(result.Response);

Download sample file audio-file.flac

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

try {
  CreateJobOptions createJobOptions = new CreateJobOptions.Builder()
    .callbackUrl("http://{user_callback_path}/job_results")
    .userToken("job25")
    .audio(new File("audio-file.flac"))
    .contentType("audio/flac")
    .timestamps(true)
    .build();

  RecognitionJob recognitionJob =
    speechToText.createJob(createJobOptions).execute().getResult();
  System.out.println(recognitionJob);
} catch (FileNotFoundException e) {
  e.printStackTrace();
}

Download sample file audio-file.flac

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

try {
  CreateJobOptions createJobOptions = new CreateJobOptions.Builder()
    .callbackUrl("http://{user_callback_path}/job_results")
    .userToken("job25")
    .audio(new File("audio-file.flac"))
    .contentType("audio/flac")
    .timestamps(true)
    .build();

  RecognitionJob recognitionJob =
    speechToText.createJob(createJobOptions).execute().getResult();
  System.out.println(recognitionJob);
} catch (FileNotFoundException e) {
  e.printStackTrace();
}

Download sample file audio-file.flac

Example request for IBM Cloud

const fs = require('fs');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const createJobParams = {
  callbackUrl: 'http://{user_callback_path}/job_results',
  userToken: 'job25',
  audio: fs.createReadStream('./audio-file.flac'),
  contentType: 'audio/flac',
  timestamps: true,
};

speechToText.createJob(createJobParams)
  .then(recognitionJob => {
    console.log(JSON.stringify(recognitionJob, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Download sample file audio-file.flac

Example request for IBM Cloud Pak for Data

const fs = require('fs');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
  }),
  serviceUrl: '{url}',
});

const createJobParams = {
  callbackUrl: 'http://{user_callback_path}/job_results',
  userToken: 'job25',
  audio: fs.createReadStream('audio-file.flac'),
  contentType: 'audio/flac',
  timestamps: true,
};

speechToText.createJob(createJobParams)
  .then(recognitionJob => {
    console.log(JSON.stringify(recognitionJob, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Download sample file audio-file.flac

Example request for IBM Cloud

from os.path import join, dirname
import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

with open(join(dirname(__file__), 'audio-file.flac'), 'rb') as audio_file:
    recognition_job = speech_to_text.create_job(
        audio_file,
        content_type='audio/flac',
        callback_url='http://{user_callback_path}/job_results',
        user_token='job25',
        timestamps=True
    ).get_result()
print(json.dumps(recognition_job, indent=2))

Download sample file audio-file.flac

Example request for IBM Cloud Pak for Data

from os.path import join, dirname
import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

with open(join(dirname(__file__), 'audio-file.flac'), 'rb') as audio_file:
    recognition_job = speech_to_text.create_job(
        audio_file,
        content_type='audio/flac',
        callback_url='http://{user_callback_path}/job_results',
        user_token='job25',
        timestamps=True
    ).get_result()
print(json.dumps(recognition_job, indent=2))

Download sample file audio-file.flac

Response

Response Body

RecognitionJob

Information about a current asynchronous speech recognition job.

Status Code

201
Created. The job was successfully created.
400
Bad Request. The request failed because of a user input error. For example, the request passed audio that does not match the indicated format or failed to specify a required audio format; specified a custom language or custom acoustic model that is not in the available state; or specified both the recognitions.completed and recognitions.completed_with_results events. Specific messages include:
- Model {model} not found
- Requested model is not available
- This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
- speaker_labels is not a supported feature for model {model}
- keywords_threshold value must be between zero and one (inclusive)
- word_alternatives_threshold value must be between zero and one (inclusive)
- You cannot specify both 'customization_id' and 'language_customization_id' parameter!
- No speech detected for 30s
- Unable to transcode data stream application/octet-stream -> audio/l16
- Stream was {number} bytes but needs to be at least 100 bytes.
- keyword {keyword} length exceeds the maximum length 1024
- low_latency is not a supported feature for model {model}
- Character insertion bias must be a value between -1 and 1.
404
Not Found. The specified model does not exist or, for IBM Cloud Pak for Data, the model parameter was not specified but the default model is not installed. The message is Model '{model}' not found.
500
Internal Server Error. The service experienced an internal error.
503
Service Unavailable. The service is currently unavailable.
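
Several of the 400 conditions listed above can be caught on the client before a job is submitted. The following is a minimal sketch, not part of the SDK: the function name validate_job_params is hypothetical, and it assumes that events are passed as a comma-separated string and that the insertion bias is passed as character_insertion_bias. It checks only the value ranges and the event conflict described above.

```python
def validate_job_params(events=None, keywords_threshold=None,
                        word_alternatives_threshold=None,
                        character_insertion_bias=None):
    """Return a list of problems that would likely cause a 400 response.

    Hypothetical helper: parameter names mirror the messages listed for
    status 400; `events` is assumed to be a comma-separated string.
    """
    problems = []

    # The two completion events are mutually exclusive.
    requested = {e.strip() for e in (events or "").split(",") if e.strip()}
    conflicting = {"recognitions.completed",
                   "recognitions.completed_with_results"}
    if conflicting <= requested:
        problems.append("specify only one of recognitions.completed and "
                        "recognitions.completed_with_results")

    # Both thresholds must lie in the closed interval [0, 1].
    for name, value in (("keywords_threshold", keywords_threshold),
                        ("word_alternatives_threshold",
                         word_alternatives_threshold)):
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(
                f"{name} value must be between zero and one (inclusive)")

    # The insertion bias must lie in the closed interval [-1, 1].
    if (character_insertion_bias is not None
            and not -1.0 <= character_insertion_bias <= 1.0):
        problems.append(
            "character insertion bias must be a value between -1 and 1")

    return problems
```

An empty list means that none of these particular checks failed; the service can still reject the request for other reasons, such as an unknown model or unreadable audio.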

Example responses

Status 201

{
  "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
  "status": "waiting",
  "created": "2016-08-17T19:15:17.926Z",
  "url": "{url}/v1/recognitions/4bd734c0-e575-21f3-de03-f932aa0468a0"
}

Returns the ID and status of the latest 100 outstanding jobs associated with the credentials with which it is called. The method also returns the creation and update times of each job, and, if a job was created with a callback URL and a user token, the user token for the job. To obtain the results for a job whose status is completed, or for a job that is not one of the latest 100 outstanding jobs, use the Check a job method. A job and its results remain available until you delete them with the Delete a job method or until the job's time to live expires, whichever comes first.

GET /v1/recognitions

CheckJobs()

ServiceCall<RecognitionJobs> checkJobs()

checkJobs(params)

check_jobs(
        self,
        **kwargs,
    ) -> DetailedResponse

Request

No Request Parameters

This method does not accept any request parameters.

Example request for IBM Cloud

curl -X GET -u "apikey:{apikey}" "{url}/v1/recognitions"

Example request for IBM Cloud Pak for Data

curl -X GET --header "Authorization: Bearer {token}" "{url}/v1/recognitions"

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.CheckJobs();

Console.WriteLine(result.Response);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator(
    url: "https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize",
    username: "{username}",
    password: "{password}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.CheckJobs();

Console.WriteLine(result.Response);

Example request for IBM Cloud

IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

RecognitionJobs recognitionJobs = speechToText.checkJobs().execute().getResult();
System.out.println(recognitionJobs);

Example request for IBM Cloud Pak for Data

CloudPakForDataAuthenticator authenticator = new CloudPakForDataAuthenticator("https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize", "{username}", "{password}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");

RecognitionJobs recognitionJobs = speechToText.checkJobs().execute().getResult();
System.out.println(recognitionJobs);

Example request for IBM Cloud

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

speechToText.checkJobs()
  .then(recognitionJobs => {
    console.log(JSON.stringify(recognitionJobs, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud Pak for Data

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { CloudPakForDataAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new CloudPakForDataAuthenticator({
    username: '{username}',
    password: '{password}',
    url: 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize',
  }),
  serviceUrl: '{url}',
});

speechToText.checkJobs()
  .then(recognitionJobs => {
    console.log(JSON.stringify(recognitionJobs, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });

Example request for IBM Cloud

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

recognition_jobs = speech_to_text.check_jobs().get_result()
print(json.dumps(recognition_jobs, indent=2))

Example request for IBM Cloud Pak for Data

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
    '{username}',
    '{password}',
    'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize'
)

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

recognition_jobs = speech_to_text.check_jobs().get_result()
print(json.dumps(recognition_jobs, indent=2))

Response

Response Body

RecognitionJobs

Information about current asynchronous speech recognition jobs.

Status Code

200
OK. The request succeeded.
500
Internal Server Error. The service experienced an internal error.
503
Service Unavailable. The service is currently unavailable.

Example responses

Status 200

{
  "recognitions": [
    {
      "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
      "created": "2016-08-17T19:15:17.926Z",
      "updated": "2016-08-17T19:15:17.926Z",
      "status": "waiting",
      "user_token": "job25"
    },
    {
      "id": "4bb1dca0-f6b1-11e5-80bc-71fb7b058b20",
      "created": "2016-08-17T19:13:23.622Z",
      "updated": "2016-08-17T19:13:24.434Z",
      "status": "processing"
    },
    {
      "id": "398fcd80-330a-22ba-93ce-1a73f454dd98",
      "created": "2016-08-17T19:11:04.298Z",
      "updated": "2016-08-17T19:11:16.003Z",
      "status": "completed"
    }
  ]
}
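
Because the listing does not include recognition results, a common next step is to sort the returned jobs by status and then request the completed ones individually. The following is a minimal sketch that works on the parsed JSON body shown above; the function name group_job_ids_by_status is hypothetical and not part of the SDK.

```python
from collections import defaultdict


def group_job_ids_by_status(recognition_jobs):
    """Map each job status to the list of job IDs that have that status.

    `recognition_jobs` is the parsed response body of GET /v1/recognitions,
    that is, a dict with a "recognitions" list.
    """
    grouped = defaultdict(list)
    for job in recognition_jobs.get("recognitions", []):
        grouped[job["status"]].append(job["id"])
    return dict(grouped)


# Sample data copied from the example response above.
sample = {
    "recognitions": [
        {"id": "4bd734c0-e575-21f3-de03-f932aa0468a0", "status": "waiting",
         "user_token": "job25"},
        {"id": "4bb1dca0-f6b1-11e5-80bc-71fb7b058b20", "status": "processing"},
        {"id": "398fcd80-330a-22ba-93ce-1a73f454dd98", "status": "completed"},
    ]
}

completed_ids = group_job_ids_by_status(sample).get("completed", [])
print(completed_ids)
```

Each ID in completed_ids can then be passed to the Check a job method to retrieve its results.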

Returns information about the specified job. The response always includes the status of the job and its creation and update times. If the status is completed, the response includes the results of the recognition request. You must use credentials for the instance of the service that owns a job to list information about it.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available. Use the Check jobs method to request information about the most recent jobs associated with the calling credentials.
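
When a job is submitted without a callback URL, one way to obtain its results is to poll this method until the job reaches a final status. The following is a minimal sketch under stated assumptions: the helper wait_for_job and its parameters are hypothetical, the caller supplies fetch_job (any function that returns the parsed job for an ID), and failed is assumed to be a final status alongside completed.

```python
import time


def wait_for_job(fetch_job, job_id, interval_seconds=5.0, max_attempts=60,
                 sleep=time.sleep):
    """Poll `fetch_job(job_id)` until the job is completed or failed.

    `fetch_job` is supplied by the caller and must return a dict with a
    "status" key. Raises TimeoutError if no final status is seen within
    `max_attempts` calls. "failed" as a final status is an assumption.
    """
    for attempt in range(max_attempts):
        job = fetch_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        # Do not sleep after the last attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(
        f"job {job_id} did not finish after {max_attempts} checks")
```

With the Python SDK, fetch_job could be a small wrapper such as lambda job_id: speech_to_text.check_job(job_id).get_result(). Choose an interval long enough to avoid unnecessary requests; for long audio, a callback URL may be a better fit than polling.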