Introduction
The IBM Watson™ Speech to Text service provides APIs that use IBM's speech-recognition capabilities to produce transcripts of spoken audio. The service can transcribe speech from various languages and audio formats. In addition to basic transcription, the service can produce detailed information about many different aspects of the audio. For most languages, the service supports two sampling rates, broadband and narrowband. It returns all JSON response content in the UTF-8 character set.
For speech recognition, the service supports synchronous and asynchronous HTTP Representational State Transfer (REST) interfaces. It also supports a WebSocket interface that provides a full-duplex, low-latency communication channel: Clients send requests and audio to the service and receive results over a single connection asynchronously.
The service also offers two customization interfaces. Use language model customization to expand the vocabulary of a base model with domain-specific terminology. Use acoustic model customization to adapt a base model for the acoustic characteristics of your audio. For language model customization, the service also supports grammars. A grammar is a formal language specification that lets you restrict the phrases that the service can recognize.
Language model customization and acoustic model customization are generally available for production use with all language models that are generally available. Grammars are beta functionality for all language models that support language model customization.
This documentation describes Java SDK major version 9. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Node SDK major version 6. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Python SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Ruby SDK major version 2. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes .NET Standard SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Go SDK major version 2. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Swift SDK major version 4. For more information about how to update your code from the previous version, see the migration guide.
This documentation describes Unity SDK major version 5. For more information about how to update your code from the previous version, see the migration guide.
The IBM Watson Unity SDK has the following requirements.
- The SDK requires Unity version 2018.2 or later to support Transport Layer Security (TLS) 1.2.
  - Set the project settings for both the Scripting Runtime Version and the Api Compatibility Level to .NET 4.x Equivalent.
  - For more information, see TLS 1.0 support.
- The SDK doesn't support WebGL projects. Change your build settings to any platform except WebGL.
For more information about how to install and configure the SDK and SDK Core, see https://github.com/watson-developer-cloud/unity-sdk.
The code examples on this tab use the client library that is provided for Java.
Maven
<dependency>
<groupId>com.ibm.watson</groupId>
<artifactId>ibm-watson</artifactId>
<version>9.0.1</version>
</dependency>
Gradle
compile 'com.ibm.watson:ibm-watson:9.0.1'
GitHub
The code examples on this tab use the client library that is provided for Node.js.
Installation
npm install ibm-watson@^6.0.0
GitHub
The code examples on this tab use the client library that is provided for Python.
Installation
pip install --upgrade "ibm-watson>=5.0.0"
GitHub
The code examples on this tab use the client library that is provided for Ruby.
Installation
gem install ibm_watson
GitHub
The code examples on this tab use the client library that is provided for Go.
go get -u github.com/watson-developer-cloud/go-sdk@v2.0.0
GitHub
The code examples on this tab use the client library that is provided for Swift.
Cocoapods
pod 'IBMWatsonSpeechToTextV1', '~> 4.0.0'
Carthage
github "watson-developer-cloud/swift-sdk" ~> 4.0.0
Swift Package Manager
.package(url: "https://github.com/watson-developer-cloud/swift-sdk", from: "4.0.0")
GitHub
The code examples on this tab use the client library that is provided for .NET Standard.
Package Manager
Install-Package IBM.Watson.SpeechToText.v1 -Version 5.0.0
.NET CLI
dotnet add package IBM.Watson.SpeechToText.v1 --version 5.0.0
PackageReference
<PackageReference Include="IBM.Watson.SpeechToText.v1" Version="5.0.0" />
GitHub
The code examples on this tab use the client library that is provided for Unity.
GitHub
Authentication
You authenticate to the API by using IBM Cloud Identity and Access Management (IAM).
You can pass either a bearer token in an authorization header or an API key. Tokens support authenticated requests without embedding service credentials in every call. API keys use basic authentication. For more information, see Authenticating to Watson services.
- For testing and development, you can pass an API key directly.
- For production use, unless you use the Watson SDKs, use an IAM token.
If you pass in an API key, use apikey
for the username and the value of the API key as the password. For example, if the API key is f5sAznhrKQyvBFFaZbtF60m5tzLbqWhyALQawBg5TjRI
in the service credentials, include the credentials in your call like this:
curl -u "apikey:f5sAznhrKQyvBFFaZbtF60m5tzLbqWhyALQawBg5TjRI"
For IBM Cloud instances, the SDK provides initialization methods for each form of authentication.
- Use the API key to have the SDK manage the lifecycle of the access token. The SDK requests an access token, ensures that the access token is valid, and refreshes it if necessary.
- Use the access token to manage the lifecycle yourself. You must periodically refresh the token.
For more information, see IAM authentication with the SDK.
Replace {apikey}
and {url}
with your service credentials.
curl -X {request_method} -u "apikey:{apikey}" "{url}/v1/{method}"
SDK managing the IAM token. Replace {apikey}
and {url}
.
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");
SDK managing the IAM token. Replace {apikey}
and {url}
.
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const speechToText = new SpeechToTextV1({
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: '{url}',
});
SDK managing the IAM token. Replace {apikey}
and {url}
.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
authenticator=authenticator
)
speech_to_text.set_service_url('{url}')
SDK managing the IAM token. Replace {apikey}
and {url}
.
require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
apikey: "{apikey}"
)
speech_to_text = SpeechToTextV1.new(
authenticator: authenticator
)
speech_to_text.service_url = "{url}"
SDK managing the IAM token. Replace {apikey}
and {url}
.
import (
"github.com/IBM/go-sdk-core/core"
"github.com/watson-developer-cloud/go-sdk/speechtotextv1"
)
func main() {
authenticator := &core.IamAuthenticator{
ApiKey: "{apikey}",
}
options := &speechtotextv1.SpeechToTextV1Options{
Authenticator: authenticator,
}
speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)
if speechToTextErr != nil {
panic(speechToTextErr)
}
speechToText.SetServiceURL("{url}")
}
SDK managing the IAM token. Replace {apikey}
and {url}
.
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "{url}"
SDK managing the IAM token. Replace {apikey}
and {url}
.
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");
SDK managing the IAM token. Replace {apikey}
and {url}
.
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");
Access between services
Your application might use more than one Watson service. You can grant access between services and you can grant access to more than one service for your applications.
For IBM Cloud services, the method to grant access between Watson services varies depending on the type of API key. For more information, see IAM access.
- To grant access between IBM Cloud services, create an authorization between the services. For more information, see Granting access between services.
- To grant access to your services by applications without using user credentials, create a service ID, add an API key, and assign access policies. For more information, see Creating and working with service IDs.
Make sure that you use an endpoint URL that includes the service instance ID (for example, https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2
). You can find the instance ID in two places:
- By clicking the service instance row in the Resource list. The instance ID is the GUID in the details pane.
- By clicking the name of the service instance in the list and looking at the credentials URL.
If you don't see the instance ID in the URL, you can add new credentials from the Service credentials page.
IBM Cloud URLs
The base URLs come from the service instance. To find the URL, view the service credentials by clicking the name of the service in the Resource list. Use the value of the URL. Add the method to form the complete API endpoint for your request.
The following example URL represents a Speech to Text instance that is hosted in Washington, DC:
https://api.us-east.speech-to-text.watson.cloud.ibm.com/instances/6bbda3b3-d572-45e1-8c54-22d6ed9e52c2
The following URLs represent the base URLs for Speech to Text. When you call the API, use the URL that corresponds to the location of your service instance.
- Dallas:
https://api.us-south.speech-to-text.watson.cloud.ibm.com
- Washington, DC:
https://api.us-east.speech-to-text.watson.cloud.ibm.com
- Frankfurt:
https://api.eu-de.speech-to-text.watson.cloud.ibm.com
- Sydney:
https://api.au-syd.speech-to-text.watson.cloud.ibm.com
- Tokyo:
https://api.jp-tok.speech-to-text.watson.cloud.ibm.com
- London:
https://api.eu-gb.speech-to-text.watson.cloud.ibm.com
- Seoul:
https://api.kr-seo.speech-to-text.watson.cloud.ibm.com
Set the correct service URL by calling the setServiceUrl()
method of the service instance.
Set the correct service URL by specifying the serviceUrl
parameter when you create the service instance.
Set the correct service URL by calling the set_service_url()
method of the service instance.
Set the correct service URL by specifying the service_url
property of the service instance.
Set the correct service URL by calling the SetServiceURL()
method of the service instance.
Set the correct service URL by setting the serviceURL
property of the service instance.
Set the correct service URL by calling the SetServiceUrl()
method of the service instance.
Set the correct service URL by calling the SetServiceUrl()
method of the service instance.
Dallas API endpoint example for services managed on IBM Cloud
curl -X {request_method} -u "apikey:{apikey}" "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}"
Your service instance might not use this URL
Default URL
https://api.us-south.speech-to-text.watson.cloud.ibm.com
Example for the Washington, DC location
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("https://api.us-east.speech-to-text.watson.cloud.ibm.com");
Default URL
https://api.us-south.speech-to-text.watson.cloud.ibm.com
Example for the Washington, DC location
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const speechToText = new SpeechToTextV1({
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: 'https://api.us-east.speech-to-text.watson.cloud.ibm.com',
});
Default URL
https://api.us-south.speech-to-text.watson.cloud.ibm.com
Example for the Washington, DC location
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
authenticator=authenticator
)
speech_to_text.set_service_url('https://api.us-east.speech-to-text.watson.cloud.ibm.com')
Default URL
https://api.us-south.speech-to-text.watson.cloud.ibm.com
Example for the Washington, DC location
require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
apikey: "{apikey}"
)
speech_to_text = SpeechToTextV1.new(
authenticator: authenticator
)
speech_to_text.service_url = "https://api.us-east.speech-to-text.watson.cloud.ibm.com"
Default URL
https://api.us-south.speech-to-text.watson.cloud.ibm.com
Example for the Washington, DC location
speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)
if speechToTextErr != nil {
panic(speechToTextErr)
}
speechToText.SetServiceURL("https://api.us-east.speech-to-text.watson.cloud.ibm.com")
Default URL
https://api.us-south.speech-to-text.watson.cloud.ibm.com
Example for the Washington, DC location
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "https://api.us-east.speech-to-text.watson.cloud.ibm.com"
Default URL
https://api.us-south.speech-to-text.watson.cloud.ibm.com
Example for the Washington, DC location
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("https://api.us-east.speech-to-text.watson.cloud.ibm.com");
Default URL
https://api.us-south.speech-to-text.watson.cloud.ibm.com
Example for the Washington, DC location
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("https://api.us-east.speech-to-text.watson.cloud.ibm.com");
Disabling SSL verification
All Watson services use Secure Sockets Layer (SSL) (or Transport Layer Security (TLS)) for secure connections between the client and server. The connection is verified against the local certificate store to ensure authentication, integrity, and confidentiality.
If you use a self-signed certificate, you need to disable SSL verification to make a successful connection.
Enabling SSL verification is highly recommended. Disabling SSL jeopardizes the security of the connection and data. Disable SSL only if necessary, and take steps to enable SSL as soon as possible.
To disable SSL verification for a curl request, use the --insecure
(-k
) option with the request.
To disable SSL verification, create an HttpConfigOptions
object and set the disableSslVerification
property to true
. Then, pass the object to the service instance by using the configureClient
method.
To disable SSL verification, set the disableSslVerification
parameter to true
when you create the service instance.
To disable SSL verification, pass True to the set_disable_ssl_verification
method of the service instance.
To disable SSL verification, set the disable_ssl_verification
parameter to true
in the configure_http_client()
method for the service instance.
To disable SSL verification, call the DisableSSLVerification
method on the service instance.
To disable SSL verification, call the disableSSLVerification()
method on the service instance. You cannot disable SSL verification on Linux.
To disable SSL verification, call the DisableSslVerification
method on the service instance and pass true.
To disable SSL verification, set the DisableSslVerification
property of the service instance to true.
Example to disable SSL verification. Replace {apikey}
and {url}
with your service credentials.
curl -k -X {request_method} -u "apikey:{apikey}" "{url}/{method}"
Example to disable SSL verification
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");
HttpConfigOptions configOptions = new HttpConfigOptions.Builder()
.disableSslVerification(true)
.build();
speechToText.configureClient(configOptions);
Example to disable SSL verification
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const speechToText = new SpeechToTextV1({
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: '{url}',
disableSslVerification: true,
});
Example to disable SSL verification
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
authenticator=authenticator
)
speech_to_text.set_service_url('{url}')
speech_to_text.set_disable_ssl_verification(True)
Example to disable SSL verification
require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
apikey: "{apikey}"
)
speech_to_text = SpeechToTextV1.new(
authenticator: authenticator
)
speech_to_text.service_url = "{url}"
speech_to_text.configure_http_client(disable_ssl_verification: true)
Example to disable SSL verification
speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)
if speechToTextErr != nil {
panic(speechToTextErr)
}
speechToText.SetServiceURL("{url}")
speechToText.DisableSSLVerification()
Example to disable SSL verification
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "{url}"
speechToText.disableSSLVerification()
Example to disable SSL verification
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");
speechToText.DisableSslVerification(true);
Example to disable SSL verification
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");
speechToText.DisableSslVerification = true;
Error handling
Speech to Text uses standard HTTP response codes to indicate whether a method completed successfully. HTTP response codes in the 2xx range indicate success. A response code in the 4xx range indicates a failure in the request, and a response code in the 5xx range usually indicates an internal system error that cannot be resolved by the user. Response codes are listed with the method.
ErrorResponse
Name | Description |
---|---|
error (string) | Description of the problem. |
code (integer) | HTTP response code. |
code_description (string) | Response message. |
warnings (string) | Warnings associated with the error. |
The Java SDK generates an exception for any unsuccessful method invocation. All methods that accept an argument can also throw an IllegalArgumentException
.
Exception | Description |
---|---|
IllegalArgumentException | An invalid argument was passed to the method. |
When the Java SDK receives an error response from the Speech to Text service, it generates an exception from the com.ibm.watson.developer_cloud.service.exception
package. All service exceptions contain the following fields.
Field | Description |
---|---|
statusCode | The HTTP response code that is returned. |
message | A message that describes the error. |
When the Node SDK receives an error response from the Speech to Text service, it creates an Error
object with information that describes the error that occurred. This error object is passed as the first parameter to the callback function for the method. The contents of the error object are as shown in the following table.
Error
Field | Description |
---|---|
code | The HTTP response code that is returned. |
message | A message that describes the error. |
The Python SDK generates an exception for any unsuccessful method invocation. When the Python SDK receives an error response from the Speech to Text service, it generates an ApiException
with the following fields.
Field | Description |
---|---|
code | The HTTP response code that is returned. |
message | A message that describes the error. |
info | A dictionary of additional information about the error. |
When the Ruby SDK receives an error response from the Speech to Text service, it generates an ApiException
with the following fields.
Field | Description |
---|---|
code | The HTTP response code that is returned. |
message | A message that describes the error. |
info | A dictionary of additional information about the error. |
The Go SDK generates an error for any unsuccessful service instantiation and method invocation. You can check for the error immediately. The contents of the error object are as shown in the following table.
Error
Field | Description |
---|---|
code | The HTTP response code that is returned. |
message | A message that describes the error. |
The Swift SDK returns a WatsonError
in the completionHandler
of any unsuccessful method invocation. This error type is an enum that conforms to LocalizedError
and contains an errorDescription
property that returns an error message. Some of the WatsonError
cases contain associated values that reveal more information about the error.
Field | Description |
---|---|
errorDescription | A message that describes the error. |
When the .NET Standard SDK receives an error response from the Speech to Text service, it generates a ServiceResponseException
with the following fields.
Field | Description |
---|---|
Message | A message that describes the error. |
CodeDescription | The HTTP response code that is returned. |
When the Unity SDK receives an error response from the Speech to Text service, it generates an IBMError
with the following fields.
Field | Description |
---|---|
Url | The URL that generated the error. |
StatusCode | The HTTP response code returned. |
ErrorMessage | A message that describes the error. |
Response | The contents of the response from the server. |
ResponseHeaders | A dictionary of headers returned by the request. |
Example error handling
try {
// Invoke a method
} catch (NotFoundException e) {
// Handle Not Found (404) exception
} catch (RequestTooLargeException e) {
// Handle Request Too Large (413) exception
} catch (ServiceResponseException e) {
// Base class for all exceptions caused by error responses from the service
System.out.println("Service returned status code "
+ e.getStatusCode() + ": " + e.getMessage());
}
Example error handling
speechToText.method(params)
.catch(err => {
console.log('error:', err);
});
Example error handling
from ibm_watson import ApiException
try:
# Invoke a method
except ApiException as ex:
print("Method failed with status code " + str(ex.code) + ": " + ex.message)
Example error handling
require "ibm_watson"
begin
# Invoke a method
rescue IBMWatson::ApiException => ex
print "Method failed with status code #{ex.code}: #{ex.error}"
end
Example error handling
import "github.com/watson-developer-cloud/go-sdk/speechtotextv1"
// Instantiate a service
speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)
// Check for errors
if speechToTextErr != nil {
panic(speechToTextErr)
}
// Call a method
result, response, responseErr := speechToText.MethodName(&methodOptions)
// Check for errors
if responseErr != nil {
panic(responseErr)
}
Example error handling
speechToText.method() {
response, error in
if let error = error {
switch error {
case let .http(statusCode, message, metadata):
switch statusCode {
case .some(404):
// Handle Not Found (404) exception
print("Not found")
case .some(413):
// Handle Request Too Large (413) exception
print("Payload too large")
default:
if let statusCode = statusCode {
print("Error - code: \(statusCode), \(message ?? "")")
}
}
default:
print(error.localizedDescription)
}
return
}
guard let result = response?.result else {
print(error?.localizedDescription ?? "unknown error")
return
}
print(result)
}
Example error handling
try
{
// Invoke a method
}
catch(ServiceResponseException e)
{
Console.WriteLine("Error: " + e.Message);
}
catch (Exception e)
{
Console.WriteLine("Error: " + e.Message);
}
Example error handling
// Invoke a method
speechToText.MethodName(Callback, Parameters);
// Check for errors
private void Callback(DetailedResponse<ExampleResponse> response, IBMError error)
{
if (error == null)
{
Log.Debug("ExampleCallback", "Response received: {0}", response.Response);
}
else
{
Log.Debug("ExampleCallback", "Error received: {0}, {1}, {2}", error.StatusCode, error.ErrorMessage, error.Response);
}
}
Additional headers
Some Watson services accept special parameters in headers that are passed with the request.
You can pass request header parameters in all requests or in a single request to the service.
To pass a request header, use the --header
(-H
) option with a curl request.
To pass header parameters with every request, use the setDefaultHeaders
method of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, use the addHeader
method as a modifier on the request before you execute it.
To pass header parameters with every request, specify the headers
parameter when you create the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, include a headers
property in the parameters object that you pass to the request method.
To pass header parameters with every request, specify the set_default_headers
method of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, include headers
as a dict
in the request.
To pass header parameters with every request, specify the add_default_headers
method of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, specify the headers
method as a chainable method in the request.
To pass header parameters with every request, specify the SetDefaultHeaders
method of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, specify the Headers
as a map
in the request.
To pass header parameters with every request, add them to the defaultHeaders
property of the service object. See Data collection for an example use of this method.
To pass header parameters in a single request, pass the headers
parameter to the request method.
To pass header parameters in a single request, use the WithHeader()
method as a modifier on the request before you execute it. See Data collection for an example use of this method.
To pass header parameters in a single request, use the WithHeader()
method as a modifier on the request before you execute it.
Example header parameter in a request
curl -X {request_method} -H "Request-Header: {header_value}" "{url}/v1/{method}"
Example header parameter in a request
ReturnType returnValue = speechToText.methodName(parameters)
.addHeader("Custom-Header", "{header_value}")
.execute();
Example header parameter in a request
const parameters = {
  {parameters},
  headers: {
    'Custom-Header': '{header_value}'
  }
};

speechToText.methodName(parameters)
  .then(result => {
    console.log(result);
  })
  .catch(err => {
    console.log('error:', err);
  });
Example header parameter in a request
response = speech_to_text.methodName(
parameters,
headers = {
'Custom-Header': '{header_value}'
})
Example header parameter in a request
response = speech_to_text.headers(
"Custom-Header" => "{header_value}"
).methodName(parameters)
Example header parameter in a request
result, response, responseErr := speechToText.MethodName(
&methodOptions{
Headers: map[string]string{
"Accept": "application/json",
},
},
)
Example header parameter in a request
let customHeader: [String: String] = ["Custom-Header": "{header_value}"]
speechToText.methodName(parameters, headers: customHeader) {
response, error in
}
Example header parameter in a request
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");
speechToText.WithHeader("Custom-Header", "header_value");
Example header parameter in a request
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");
speechToText.WithHeader("Custom-Header", "header_value");
Response details
The Speech to Text service might return information to the application in response headers.
To access all response headers that the service returns, include the --include
(-i
) option with a curl request. To see detailed response data for the request, including request headers, response headers, and extra debugging information, include the --verbose
(-v
) option with the request.
Example request to access response headers
curl -X {request_method} {authentication_method} --include "{url}/v1/{method}"
To access information in the response headers, use one of the request methods that returns details with the response: executeWithDetails()
, enqueueWithDetails()
, or rxWithDetails()
. These methods return a Response<T>
object, where T
is the expected response model. Use the getResult()
method to access the response object for the method, and use the getHeaders()
method to access information in response headers.
Example request to access response headers
Response<ReturnType> response = speechToText.methodName(parameters)
.executeWithDetails();
// Access response from methodName
ReturnType returnValue = response.getResult();
// Access information in response headers
Headers responseHeaders = response.getHeaders();
All response data is available in the Response<T>
object that is returned by each method. To access information in the response
object, use the following properties.
Property | Description |
---|---|
result | Returns the response for the service-specific method. |
headers | Returns the response header information. |
status | Returns the HTTP status code. |
Example request to access response headers
speechToText.methodName(parameters)
.then(response => {
console.log(response.headers);
})
.catch(err => {
console.log('error:', err);
});
The return value from all service methods is a DetailedResponse
object. To access information in the result object or response headers, use the following methods.
DetailedResponse
Method | Description |
---|---|
get_result() | Returns the response for the service-specific method. |
get_headers() | Returns the response header information. |
get_status_code() | Returns the HTTP status code. |
Example request to access response headers
speech_to_text.set_detailed_response(True)
response = speech_to_text.methodName(parameters)
# Access response from methodName
print(json.dumps(response.get_result(), indent=2))
# Access information in response headers
print(response.get_headers())
# Access HTTP response status
print(response.get_status_code())
The return value from all service methods is a DetailedResponse
object. To access information in the response
object, use the following properties.
DetailedResponse
Property | Description |
---|---|
result | Returns the response for the service-specific method. |
headers | Returns the response header information. |
status | Returns the HTTP status code. |
Example request to access response headers
response = speech_to_text.methodName(parameters)
# Access response from methodName
print response.result
# Access information in response headers
print response.headers
# Access HTTP response status
print response.status
The return value from all service methods is a DetailedResponse
object. To access information in the response
object or response headers, use the following methods.
DetailedResponse
Method | Description |
---|---|
GetResult() | Returns the response for the service-specific method. |
GetHeaders() | Returns the response header information. |
GetStatusCode() | Returns the HTTP status code. |
Example request to access response headers
import (
"github.com/IBM/go-sdk-core/core"
"github.com/watson-developer-cloud/go-sdk/speechtotextv1"
)
result, response, responseErr := speechToText.MethodName(
&methodOptions{})
// Access result
core.PrettyPrint(response.GetResult(), "Result ")
// Access response headers
core.PrettyPrint(response.GetHeaders(), "Headers ")
// Access status code
core.PrettyPrint(response.GetStatusCode(), "Status Code ")
All response data is available in the WatsonResponse<T>
object that is returned in each method's completionHandler
.
Example request to access response headers
speechToText.methodName(parameters) {
response, error in
guard let result = response?.result else {
print(error?.localizedDescription ?? "unknown error")
return
}
print(result) // The data returned by the service
print(response?.statusCode)
print(response?.headers)
}
The response contains fields for response headers, response JSON, and the status code.
DetailedResponse
Property | Description |
---|---|
Result | Returns the result for the service-specific method. |
Response | Returns the raw JSON response for the service-specific method. |
Headers | Returns the response header information. |
StatusCode | Returns the HTTP status code. |
Example request to access response headers
var results = speechToText.MethodName(parameters);
var result = results.Result; // The result object
var responseHeaders = results.Headers; // The response headers
var responseJson = results.Response; // The raw response JSON
var statusCode = results.StatusCode; // The response status code
The response contains fields for response headers, response JSON, and the status code.
DetailedResponse
Property | Description |
---|---|
Result | Returns the result for the service-specific method. |
Response | Returns the raw JSON response for the service-specific method. |
Headers | Returns the response header information. |
StatusCode | Returns the HTTP status code. |
Example request to access response headers
private void Example()
{
speechToText.MethodName(Callback, Parameters);
}
private void Callback(DetailedResponse<ResponseType> response, IBMError error)
{
var result = response.Result; // The result object
var responseHeaders = response.Headers; // The response headers
var responseJson = response.Response; // The raw response JSON
var statusCode = response.StatusCode; // The response status code
}
Data labels
You can remove customer data if you associate the customer and the data when you send the information to a service. First, you label the data with a customer ID, and then you can delete the data by the ID.
- Use the X-Watson-Metadata header to associate a customer ID with the data. By adding a customer ID to a request, you indicate that it contains data that belongs to that customer. Specify a random or generic string for the customer ID. Do not include personal data, such as an email address. Pass the string customer_id={id} as the argument of the header.
- Use the Delete labeled data method to remove data that is associated with a customer ID.
Labeling data is used only by methods that accept customer data. For more information about Speech to Text and labeling data, see Information security.
For more information about how to pass headers, see Additional headers.
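For example, here is a minimal Python sketch of this flow. It assumes the Python SDK's set_default_headers and delete_user_data methods, and the customer ID my_customer_ID is a placeholder.

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(authenticator=authenticator)
speech_to_text.set_service_url('{url}')

# Label all subsequent requests with the customer ID "my_customer_ID"
speech_to_text.set_default_headers({'X-Watson-Metadata': 'customer_id=my_customer_ID'})

# ... send recognition requests that contain the customer's data ...

# Later, delete all data that was labeled with that customer ID
speech_to_text.delete_user_data(customer_id='my_customer_ID')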
Data collection
By default, Speech to Text service instances that are not part of Premium plans log requests and their results. Logging is done only to improve the services for future users. The logged data is not shared or made public. Logging is disabled for services that are part of Premium plans.
To prevent IBM usage of your data for an API request, set the X-Watson-Learning-Opt-Out header parameter to true
.
You must set the header on each request that you do not want IBM to access for general service improvements.
You can set the header by using the setDefaultHeaders
method of the service object.
You can set the header by using the headers
parameter when you create the service object.
You can set the header by using the set_default_headers
method of the service object.
You can set the header by using the add_default_headers
method of the service object.
You can set the header by using the SetDefaultHeaders
method of the service object.
You can set the header by adding it to the defaultHeaders
property of the service object.
You can set the header by using the WithHeader()
method of the service object.
Example request
curl -u "apikey:{apikey}" -H "X-Watson-Learning-Opt-Out: true" "{url}/{method}"
Example request
Map<String, String> headers = new HashMap<String, String>();
headers.put("X-Watson-Learning-Opt-Out", "true");
speechToText.setDefaultHeaders(headers);
Example request
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const speechToText = new SpeechToTextV1({
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: '{url}',
headers: {
'X-Watson-Learning-Opt-Out': 'true'
}
});
Example request
speech_to_text.set_default_headers({'x-watson-learning-opt-out': "true"})
Example request
speech_to_text.add_default_headers(headers: {"x-watson-learning-opt-out" => "true"})
Example request
import "net/http"
headers := http.Header{}
headers.Add("x-watson-learning-opt-out", "true")
speechToText.SetDefaultHeaders(headers)
Example request
speechToText.defaultHeaders["X-Watson-Learning-Opt-Out"] = "true"
Example request
IamAuthenticator authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");
speechToText.WithHeader("X-Watson-Learning-Opt-Out", "true");
Example request
var authenticator = new IamAuthenticator(
apikey: "{apikey}"
);
while (!authenticator.CanAuthenticate())
yield return null;
var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");
speechToText.WithHeader("X-Watson-Learning-Opt-Out", "true");
Synchronous and asynchronous requests
The Java SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of service methods. All service methods implement the ServiceCall interface.
- To call a method synchronously, use the execute method of the ServiceCall interface. You can call the execute method directly from an instance of the service.
- To call a method asynchronously, use the enqueue method of the ServiceCall interface to receive a callback when the response arrives. The ServiceCallback interface of the method's argument provides onResponse and onFailure methods that you override to handle the callback.
The Ruby SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of service methods. All service methods implement the Concurrent::Async module. When you use the synchronous or asynchronous methods, an IVar object is returned. You access the DetailedResponse
object by calling ivar_object.value
.
For more information about the Ivar object, see the IVar class docs.
- To call a method synchronously, either call the method directly or use the .await chainable method of the Concurrent::Async module. Calling a method directly (without .await) returns a DetailedResponse object.
- To call a method asynchronously, use the .async chainable method of the Concurrent::Async module.
You can call the .await
and .async
methods directly from an instance of the service.
Example synchronous request
ReturnType returnValue = speechToText.method(parameters).execute();
Example asynchronous request
speechToText.method(parameters).enqueue(new ServiceCallback<ReturnType>() {
@Override public void onResponse(ReturnType response) {
. . .
}
@Override public void onFailure(Exception e) {
. . .
}
});
Example synchronous request
response = speech_to_text.method_name(parameters)
or
response = speech_to_text.await.method_name(parameters)
Example asynchronous request
response = speech_to_text.async.method_name(parameters)
Recognize audio (WebSockets)
Sends audio and returns transcription results for recognition requests over a WebSocket connection. Requests and responses are enabled over a single TCP connection that abstracts much of the complexity of the request to offer efficient implementation, low latency, high throughput, and an asynchronous response.
The endpoint for the WebSocket API is
wss://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}/v1/recognize
{location} indicates where your application is hosted:
- us-south for Dallas
- us-east for Washington, DC
- eu-de for Frankfurt
- au-syd for Sydney
- jp-tok for Tokyo
- eu-gb for London
- kr-seo for Seoul
{instance_id}
indicates the unique identifier of the service instance. For more information about how to find the instance ID, see Access between services.
The examples in the documentation abbreviate wss://api.{location}.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}
to {ws_url}
. So all WebSocket examples call the method as {ws_url}/v1/recognize
.
You can pass a maximum of 100 MB and a minimum of 100 bytes of audio per utterance (per recognition request). You can send multiple utterances over a single WebSocket connection. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
By default, the service returns only final results for any request. To enable interim results, set the interim_results
interimResults
parameter to true
.
See also: The WebSocket interface.
The WebSocket interface cannot be called from curl. Use a client-side scripting language to call the interface. The example request uses JavaScript to invoke the WebSocket recognize
method.
The createRecognizeStream
method is deprecated. Use the equivalent recognizeUsingWebSocket
method instead.
The recognize_with_websocket
method is deprecated. Use the equivalent recognize_using_websocket
method instead.
Audio formats (content types)
The service accepts audio in the following formats (MIME types).
- For formats that are labeled Required, you must use the content-type / contentType / content_type parameter with the request to specify the format of the audio.
- For all other formats, you can omit the content-type / contentType / content_type parameter or specify application/octet-stream with the parameter to have the service automatically detect the format of the audio.
Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.
- application/octet-stream
- audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
- audio/basic (Required. Use only with narrowband models.)
- audio/flac
- audio/g729 (Use only with narrowband models.)
- audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
- audio/mp3
- audio/mpeg
- audio/mulaw (Required. Specify the sampling rate (rate) of the audio.)
- audio/ogg (The service automatically detects the codec of the input audio.)
- audio/ogg;codecs=opus
- audio/ogg;codecs=vorbis
- audio/wav (Provide audio with a maximum of nine channels.)
- audio/webm (The service automatically detects the codec of the input audio.)
- audio/webm;codecs=opus
- audio/webm;codecs=vorbis
See also: Audio formats.
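As an illustration, the following Python sketch passes a format that requires the sampling rate. It uses the synchronous recognize method for brevity, and the file name and rate are placeholder values.

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(authenticator=authenticator)
speech_to_text.set_service_url('{url}')

# audio/l16 is a Required format: the sampling rate must be part of the content type
with open('audio-file.l16', 'rb') as audio_file:
    response = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/l16;rate=16000;channels=1'
    ).get_result()

print(json.dumps(response, indent=2))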
The Python recognize_using_websocket
method requires the content_type
parameter.
URI /v1/recognize
okhttp3.WebSocket recognizeUsingWebSocket(RecognizeOptions options,
RecognizeCallback callback)
RecognizeStream recognizeUsingWebSocket(params)
dict recognize_using_websocket(audio, content_type,
recognize_callback, model=None,
language_customization_id=None, acoustic_customization_id=None,
customization_weight=None, base_model_version=None,
inactivity_timeout=None, interim_results=None,
keywords=None, keywords_threshold=None,
max_alternatives=None, word_alternatives_threshold=None,
word_confidence=None, timestamps=None, profanity_filter=None,
smart_formatting=None, speaker_labels=None, http_proxy_host=None,
http_proxy_port=None, customization_id=None, grammar_name=None,
redaction=None, processing_metrics=None, processing_metrics_interval=None,
audio_metrics=None, end_of_phrase_silence_time=None,
split_transcript_at_phrase_end=None, speech_detector_sensitivity=None,
background_audio_suppression=None, **kwargs)
WebSocketClient recognize_using_websocket(content_type:,
recognize_callback:, audio: nil, chunk_data: false, model: nil,
language_customization_id: nil, acoustic_customization_id: nil,
customization_weight: nil, base_model_version: nil,
inactivity_timeout: nil, interim_results: nil,
keywords: nil, keywords_threshold: nil,
max_alternatives: nil, word_alternatives_threshold: nil,
word_confidence: nil, timestamps: nil, profanity_filter: nil,
smart_formatting: nil, speaker_labels: nil, customization_id: nil,
grammar_name: nil, redaction: nil, processing_metrics: nil,
processing_metrics_interval: nil, audio_metrics: nil,
end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil,
speech_detector_sensitivity: nil, background_audio_suppression: nil)
Request
The client calls the recognize
method to obtain a string that contains the URI for the WebSocket interface. The call to the recognize
method sets basic parameters for the connection and for all recognition requests that are sent over it. See the Parameters of recognize method table.
The client then establishes a connection with the service by passing the URI to the WebSocket constructor, which returns a WebSocket
connection object. The client initiates and manages recognition requests by sending JSON-formatted text messages to the service over the connection. The text messages can include all other parameters of the recognition request. The required action
parameter tells the service which action is to be performed. See the Parameters of WebSocket text messages table.
After sending the text message to initiate a request, the client sends the audio data to be transcribed as a binary message (blob) over the connection.
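To make this message flow concrete, the following rough Python sketch drives the interface directly with the third-party websocket-client package instead of a Watson SDK. The access_token query parameter name, the file name, and the parameter values are assumptions for illustration; the start and stop text messages follow the parameter tables below.

import json
from websocket import create_connection  # pip install websocket-client

# Open an authenticated connection; the model is set on the recognize URI
ws = create_connection(
    '{ws_url}/v1/recognize?access_token={iam_access_token}&model=en-US_BroadbandModel')

# Text message that starts a recognition request and sets its parameters
ws.send(json.dumps({
    'action': 'start',
    'content-type': 'audio/flac',
    'interim_results': True
}))

# Binary message that contains the audio to be transcribed
with open('audio-file.flac', 'rb') as audio_file:
    ws.send_binary(audio_file.read())

# Text message that tells the service all audio for the request has been sent
ws.send(json.dumps({'action': 'stop'}))

print(ws.recv())  # JSON recognition results
ws.close()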
Parameters of recognize method
-
Pass a valid IAM access token to establish an authenticated connection with the service. You pass an IAM access token instead of passing an API key with the call. You must establish the connection before the access token expires. For more information about obtaining an access token, see Authenticating with IAM tokens.
You pass an access token only to establish an authenticated connection. After you establish a connection, you can keep it alive indefinitely. You remain authenticated while you keep the connection open. You do not need to refresh the access token for an active connection that lasts beyond the token's expiration time.
-
The identifier of the model that is to be used for all recognition requests sent over the connection. (Note: The model
ar-AR_BroadbandModel
is deprecated; use ar-MS_BroadbandModel
instead.) See Languages and models. Allowable values: [
ar-AR_BroadbandModel
,ar-MS_BroadbandModel
,de-DE_BroadbandModel
,de-DE_NarrowbandModel
,en-AU_BroadbandModel
,en-AU_NarrowbandModel
,en-GB_BroadbandModel
,en-GB_NarrowbandModel
,en-US_BroadbandModel
,en-US_NarrowbandModel
,en-US_ShortForm_NarrowbandModel
,es-AR_BroadbandModel
,es-AR_NarrowbandModel
,es-ES_BroadbandModel
,es-ES_NarrowbandModel
,es-CL_BroadbandModel
,es-CL_NarrowbandModel
,es-CO_BroadbandModel
,es-CO_NarrowbandModel
,es-MX_BroadbandModel
,es-MX_NarrowbandModel
,es-PE_BroadbandModel
,es-PE_NarrowbandModel
,fr-CA_BroadbandModel
,fr-CA_NarrowbandModel
,fr-FR_BroadbandModel
,fr-FR_NarrowbandModel
,it-IT_BroadbandModel
,it-IT_NarrowbandModel
,ja-JP_BroadbandModel
,ja-JP_NarrowbandModel
,ko-KR_BroadbandModel
,ko-KR_NarrowbandModel
,nl-NL_BroadbandModel
,nl-NL_NarrowbandModel
,pt-BR_BroadbandModel
,pt-BR_NarrowbandModel
,zh-CN_BroadbandModel
,zh-CN_NarrowbandModel
]
Default:
en-US_BroadbandModel
-
The customization ID (GUID) of a custom language model that is to be used for all requests sent over the connection. The base model of the specified custom language model must match the model that is specified with the
model
parameter. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom language model. See Custom models.Note: Use this parameter instead of the deprecated
customization_id
parameter. -
The customization ID (GUID) of a custom acoustic model that is to be used for the request. The base model of the specified custom acoustic model must match the model that is specified with the
model
parameter. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom acoustic model. See Custom models. -
The version of the specified base model that is to be used for all requests sent over the connection. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Base model version.
-
Indicates whether IBM can use data that is sent over the connection to improve the service for future users. Specify
true
to prevent IBM from accessing the logged data. See Data collection.Default:
false
-
Associates a customer ID with all data that is passed over the connection. The parameter accepts the argument
customer_id={id}
, where{id}
is a random or generic string that is to be associated with the data. URL-encode the argument to the parameter, for examplecustomer_id%3dmy_ID
. By default, no customer ID is associated with the data. See Data labels. -
Deprecated. Use the
language_customization_id
parameter to specify the customization ID (GUID) of a custom language model that is to be used with all requests sent over the connection. Do not specify both parameters with a request.
Call the recognizeUsingWebSocket
method to initiate a recognition request. Use the recognizeOptions
argument to pass a RecognizeOptions
object that provides the parameters for the request, including the audio. Use the callback
argument to pass a Java BaseRecognizeCallback
object to handle events from the WebSocket connection.
Call the recognizeUsingWebSocket
method to initiate a recognition request. The method returns a RecognizeStream
object to which you pipe the audio that is to be transcribed. You also use the object's on
method to define event handlers for the request. You pass all other parameters of the request as arguments of the method.
Call the recognize_using_websocket
method to initiate a recognition request. Pass the audio and all parameters of the request, including the RecognizeCallback
and AudioSource
objects, as arguments of the method.
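For example, a minimal sketch of that Python call might look like the following; the callback class, file name, and parameter values are illustrative.

from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

class MyRecognizeCallback(RecognizeCallback):
    def on_data(self, data):
        # Called with SpeechRecognitionResults JSON from the service
        print(data)

    def on_error(self, error):
        print('Error received: {}'.format(error))

    def on_inactivity_timeout(self, error):
        print('Inactivity timeout: {}'.format(error))

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(authenticator=authenticator)
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:
    speech_to_text.recognize_using_websocket(
        audio=AudioSource(audio_file),
        content_type='audio/flac',
        recognize_callback=MyRecognizeCallback(),
        model='en-US_BroadbandModel',
        interim_results=True)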
Call the recognize_using_websocket
method to create a WebSocketClient
object. Pass the audio and all parameters of the request, including the RecognizeCallback
object, as arguments of the method.
Parameters of WebSocket text messages
Parameters
-
The action that is to be performed.
Allowable values:
-
start
initiates a recognition request. The message can also include any other optional parameters that are described in this table. After sending this text message, the client sends the data as a binary message (blob).Between recognition requests, the client can send new
start
messages to modify the parameters that are to be used for subsequent requests. By default, the service continues to use the parameters that were specified with the previousstart
message. -
stop
indicates that all audio data for the request has been sent to the service. The client can send additional requests with the same or different parameters.
-
-
Indicates how the
data
event handler is to return the response from the service:-
If
false
, the event handler returns only a string with the final transcription of the recognition results, regardless of the parameters that you pass with the request. You must set the encoding for your instance of theRecognizeStream
object to UTF-8 by including a call that is similar to the following line of code in your application:recognizeStream.setEncoding('utf8');
Do not include this call if you set the
objectMode
parameter totrue
. -
If
true
, the event handler returns the recognition results exactly as it receives them from the service: as one or more instances of aSpeechRecognitionResults
object.
For more information, see the Example request for the method.
-
-
The audio that is to be transcribed.
An
AudioSource
object that provides the audio that is to be transcribed. -
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.
Allowable values: [
application/octet-stream
,audio/alaw
,audio/basic
,audio/flac
,audio/g729
,audio/l16
,audio/mp3
,audio/mpeg
,audio/mulaw
,audio/ogg
,audio/ogg;codecs=opus
,audio/ogg;codecs=vorbis
,audio/wav
,audio/webm
,audio/webm;codecs=opus
,audio/webm;codecs=vorbis
] -
A
BaseRecognizeCallback
object that implements theRecognizeCallback
interface to handle events from the WebSocket connection. Override the definitions of the object's default methods to respond to events as needed by your application.A
RecognizeCallback
object that defines methods to handle events from the WebSocket connection. Override the definitions of the object's default methods to respond to events as needed by your application. -
The audio that is to be transcribed.
-
If
true
, theWebSocketClient
expects to receive data in chunks rather than as a single audio file. See Audio transmission.Default:
false
-
The identifier of the model that is to be used for the recognition request. (Note: The model
ar-AR_BroadbandModel
is deprecated; use ar-MS_BroadbandModel
instead.) See Languages and models. Allowable values: [
ar-AR_BroadbandModel
,ar-MS_BroadbandModel
,de-DE_BroadbandModel
,de-DE_NarrowbandModel
,en-AU_BroadbandModel
,en-AU_NarrowbandModel
,en-GB_BroadbandModel
,en-GB_NarrowbandModel
,en-US_BroadbandModel
,en-US_NarrowbandModel
,en-US_ShortForm_NarrowbandModel
,es-AR_BroadbandModel
,es-AR_NarrowbandModel
,es-ES_BroadbandModel
,es-ES_NarrowbandModel
,es-CL_BroadbandModel
,es-CL_NarrowbandModel
,es-CO_BroadbandModel
,es-CO_NarrowbandModel
,es-MX_BroadbandModel
,es-MX_NarrowbandModel
,es-PE_BroadbandModel
,es-PE_NarrowbandModel
,fr-CA_BroadbandModel
,fr-CA_NarrowbandModel
,fr-FR_BroadbandModel
,fr-FR_NarrowbandModel
,it-IT_BroadbandModel
,it-IT_NarrowbandModel
,ja-JP_BroadbandModel
,ja-JP_NarrowbandModel
,ko-KR_BroadbandModel
,ko-KR_NarrowbandModel
,nl-NL_BroadbandModel
,nl-NL_NarrowbandModel
,pt-BR_BroadbandModel
,pt-BR_NarrowbandModel
,zh-CN_BroadbandModel
,zh-CN_NarrowbandModel
]
Default:
en-US_BroadbandModel
-
The customization ID (GUID) of a custom language model that is to be used for the request. The base model of the specified custom language model must match the model that is specified with the
model
parameter. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom language model. See Custom models.Note: Use this parameter instead of the deprecated
customization_id
customizationId
parameter. -
The customization ID (GUID) of a custom acoustic model that is to be used for the request. The base model of the specified custom acoustic model must match the model that is specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom acoustic model. See Custom models. -
If you specify a customization ID, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the current request.
Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.
The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when you set the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
See Custom models.
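For example, a request that applies a custom language model with a heavier-than-default weight might look like the following Python sketch. The customization ID is a placeholder, and it is assumed that the SDK passes the language_customization_id and customization_weight keyword arguments through to the service unchanged.
# Sketch only: assumes language_customization_id and customization_weight are
# forwarded to the WebSocket recognition request as named above.
from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

class PrintCallback(RecognizeCallback):
    def on_data(self, data):
        print(data)  # SpeechRecognitionResults as a dictionary

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:
    speech_to_text.recognize_using_websocket(
        audio=AudioSource(audio_file),
        content_type='audio/flac',
        recognize_callback=PrintCallback(),
        model='en-US_BroadbandModel',
        language_customization_id='{customization_id}',
        customization_weight=0.5)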
-
The version of the specified base model that is to be used for the request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Base model version.
-
The time in seconds after which, if only silence (no speech) is detected in the audio, the connection is closed. The default is 30 seconds. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.
Default: 30
-
If true, the service returns interim results as a stream of JSON SpeechRecognitionResults objects. If false, the service returns a single SpeechRecognitionResults object with final results only. See Interim results. (See the objectMode parameter for information about controlling the response from the method.)
Default: false
-
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.
You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.
See Keyword spotting.
-
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords. See Keyword spotting.
-
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.
Default: 1
-
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
-
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, no word confidence measures are returned. See Word confidence.
Default: false
-
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.
Default: false
-
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only. See Profanity filtering.
Default: true
-
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, no smart formatting is performed. Applies to US English, Japanese, and Spanish transcription only. See Smart formatting.
Default: false
-
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, no speaker labels are returned. Specifying true forces the timestamps parameter to be true, regardless of whether you specify false for that parameter.
Applies to US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only. See Speaker labels.
Default: false
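As a minimal illustration, the WebSocket start message below requests speaker labels directly; the speaker_labels and interim_results field names mirror the parameters described above, and the content-type field is an assumption about the start message rather than something shown in this section.
import json

# Sketch of a WebSocket 'start' message that requests speaker labels.
# Setting speaker_labels to true also forces timestamps on.
start_message = {
    'action': 'start',
    'content-type': 'audio/flac',
    'interim_results': True,
    'speaker_labels': True,
}
# websocket.send(json.dumps(start_message))  # send before the binary audio
print(json.dumps(start_message, indent=2))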
-
If you are passing requests through a proxy, specify the hostname of the proxy server. Use the http_proxy_port parameter to specify the port number at which the proxy listens. Omit both parameters if you are not using a proxy.
Default: None
-
If you are passing requests through a proxy, specify the port number at which the proxy server listens. Use the http_proxy_host parameter to specify the hostname of the proxy. Omit both parameters if you are not using a proxy.
Default: None
-
Deprecated. Use the
language_customization_id
languageCustomizationId
parameter to specify the customization ID (GUID) of a custom language model that is to be used with the request. Do not specify both parameters with a request. -
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the
language_customization_id
languageCustomizationId
parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource. See Grammars. -
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.
When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold (keywordsThreshold) parameters) and returns only a single final transcript (forces the max_alternatives (maxAlternatives) parameter to be 1).
Applies to US English, Japanese, and Korean transcription only. See Numeric redaction.
Default: false
-
If true, requests processing metrics about the service's transcription of the input audio. The service returns processing metrics at the interval that is specified by the processing_metrics_interval (processingMetricsInterval) parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics. See Processing metrics.
Default: false
-
Specifies the interval in seconds at which the service is to return processing metrics. The parameter is ignored unless the processing_metrics (processingMetrics) parameter is set to true.
The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125.
The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events.
See Processing metrics.
Default: 1.0
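A callback can separate periodic processing metrics from ordinary transcription results by inspecting the response payload. The sketch below assumes that the metrics arrive under a processing_metrics field of the JSON response; that field name is an assumption about the payload shape, not something quoted from this page.
from ibm_watson.websocket import RecognizeCallback

class MetricsAwareCallback(RecognizeCallback):
    # Sketch: split assumed processing-metrics payloads from transcription results.
    def on_data(self, data):
        if 'processing_metrics' in data:
            print('metrics:', data['processing_metrics'])
        else:
            print('results:', data.get('results', []))

# Pass an instance as recognize_callback together with processing_metrics=True
# and processing_metrics_interval=0.25 on the recognition request.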
-
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics. See Audio metrics.
Default: false
-
If true, specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.
Specify a value for the pause interval in the range of 0.0 to 120.0.
-
A value greater than 0 specifies the interval that the service is to use for speech recognition.
-
A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds. The default for Chinese is 0.6 seconds.
See End of phrase silence time.
Default:
0.8
-
-
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript. By default, the service splits transcripts based solely on the pause interval. The feature is generally available for US English and UK English only.
See Split transcript at phrase end.
Default:
false
-
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.
Specify a value between 0.0 and 1.0:
-
0.0 suppresses all audio (no speech is transcribed).
-
0.5 (the default) provides a reasonable compromise for the level of sensitivity.
-
1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. See Speech Activity Detection.
Default:
0.5
-
-
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.
Specify a value between 0.0 and 1.0:
-
0.0 (the default) provides no suppression (background audio suppression is disabled).
-
0.5 provides a reasonable level of audio suppression for general usage.
-
1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. See Speech Activity Detection.
Default:
0.0
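The pause-interval and speech activity detection settings can be combined in a single start message. The sketch below uses the names end_of_phrase_silence_time, split_transcript_at_phrase_end, speech_detector_sensitivity, and background_audio_suppression, which are assumed from the feature names above rather than quoted from this page.
import json

# Sketch of a 'start' message that tunes where transcripts are split and how
# aggressively non-speech audio is suppressed (parameter names are assumed).
start_message = {
    'action': 'start',
    'content-type': 'audio/flac',
    'end_of_phrase_silence_time': 1.5,
    'split_transcript_at_phrase_end': True,
    'speech_detector_sensitivity': 0.4,
    'background_audio_suppression': 0.5,
}
print(json.dumps(start_message, indent=2))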
-
Example request
var IAM_access_token = '{access_token}';
var wsURI = '{ws_url}/v1/recognize'
+ '?access_token=' + IAM_access_token
+ '&model=en-US_BroadbandModel';
var websocket = new WebSocket(wsURI);
websocket.onopen = function(evt) { onOpen(evt) };
websocket.onclose = function(evt) { onClose(evt) };
websocket.onmessage = function(evt) { onMessage(evt) };
websocket.onerror = function(evt) { onError(evt) };
function onOpen(evt) {
var message = {
action: 'start',
keywords: ['colorado', 'tornado', 'tornadoes'],
keywords_threshold: 0.5,
max_alternatives: 3
};
websocket.send(JSON.stringify(message));
// Prepare and send the audio file.
websocket.send(blob);
websocket.send(JSON.stringify({action: 'stop'}));
}
function onClose(evt) {
console.log(evt.data);
}
function onMessage(evt) {
console.log(evt.data);
}
function onError(evt) {
console.log(evt.data);
}
Example request
IamAuthenticator authenticator = new IamAuthenticator("{apikey}");
SpeechToText speechToText = new SpeechToText(authenticator);
speechToText.setServiceUrl("{url}");
try {
RecognizeOptions recognizeOptions = new RecognizeOptions.Builder()
.audio(new FileInputStream("audio-file.flac"))
.contentType("audio/flac")
.model("en-US_BroadbandModel")
.keywords(Arrays.asList("colorado", "tornado", "tornadoes"))
.keywordsThreshold((float) 0.5)
.maxAlternatives(3)
.build();
BaseRecognizeCallback baseRecognizeCallback =
new BaseRecognizeCallback() {
@Override
public void onTranscription
(SpeechRecognitionResults speechRecognitionResults) {
System.out.println(speechRecognitionResults);
}
@Override
public void onDisconnected() {
System.exit(0);
}
};
speechToText.recognizeUsingWebSocket(recognizeOptions,
baseRecognizeCallback);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
Example request
const fs = require('fs');
const { IamAuthenticator } = require('ibm-watson/auth');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const speechToText = new SpeechToTextV1({
authenticator: new IamAuthenticator({
apikey: '{apikey}',
}),
serviceUrl: '{url}',
});
const params = {
objectMode: true,
contentType: 'audio/flac',
model: 'en-US_BroadbandModel',
keywords: ['colorado', 'tornado', 'tornadoes'],
keywordsThreshold: 0.5,
maxAlternatives: 3,
};
// Create the stream.
const recognizeStream = speechToText.recognizeUsingWebSocket(params);
// Pipe in the audio.
fs.createReadStream('audio-file.flac').pipe(recognizeStream);
/*
* Uncomment the following two lines of code ONLY if `objectMode` is `false`.
*
* WHEN USED TOGETHER, the two lines pipe the final transcript to the named
* file and produce it on the console.
*
* WHEN USED ALONE, the following line pipes just the final transcript to
* the named file but produces numeric values rather than strings on the
* console.
*/
// recognizeStream.pipe(fs.createWriteStream('transcription.txt'));
/*
* WHEN USED ALONE, the following line produces just the final transcript
* on the console.
*/
// recognizeStream.setEncoding('utf8');
// Listen for events.
recognizeStream.on('data', function(event) { onEvent('Data:', event); });
recognizeStream.on('error', function(event) { onEvent('Error:', event); });
recognizeStream.on('close', function(event) { onEvent('Close:', event); });
// Display events on the console.
function onEvent(name, event) {
console.log(name, JSON.stringify(event, null, 2));
};
Example request
import json
from os.path import join, dirname
from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
authenticator=authenticator
)
speech_to_text.set_service_url('{url}')
class MyRecognizeCallback(RecognizeCallback):
def __init__(self):
RecognizeCallback.__init__(self)
def on_data(self, data):
print(json.dumps(data, indent=2))
def on_error(self, error):
print('Error received: {}'.format(error))
def on_inactivity_timeout(self, error):
print('Inactivity timeout: {}'.format(error))
myRecognizeCallback = MyRecognizeCallback()
with open(join(dirname(__file__), './.', 'audio-file.flac'),
'rb') as audio_file:
audio_source = AudioSource(audio_file)
speech_to_text.recognize_using_websocket(
audio=audio_source,
content_type='audio/flac',
recognize_callback=myRecognizeCallback,
model='en-US_BroadbandModel',
keywords=['colorado', 'tornado', 'tornadoes'],
keywords_threshold=0.5,
max_alternatives=3)
Example request
require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
require "ibm_watson/websocket/recognize_callback"
include IBMWatson
authenticator = Authenticators::IamAuthenticator.new(
apikey: "{apikey}"
)
speech_to_text = SpeechToTextV1.new(
authenticator: authenticator
)
speech_to_text.service_url = "{url}"
class MyRecognizeCallback < IBMWatson::RecognizeCallback
def initialize
super
end
def on_error(error:)
puts "Error received: #{error}"
end
def on_inactivity_timeout(error:)
puts "Inactivity timeout: #{error}"
end
def on_data(data:)
puts data.to_s
end
end
mycallback = MyRecognizeCallback.new
File.open(Dir.getwd + "/resources/speech.wav") do |audio_file|
speech_to_text.recognize_using_websocket(
audio: audio_file,
recognize_callback: mycallback,
content_type: "audio/wav"
).start
end
Response
Successful recognition returns one or more instances of a SpeechRecognitionResults
object. The contents of the response depend on the parameters you send with the recognition request, including the interim_results
interimResults
parameter. For more information, see the results for the Recognize audio method.
If the objectMode
parameter is true
, successful recognition returns one or more instances of a SpeechRecognitionResults
object. The contents of the response depend on the parameters you send with the recognition request, including the interimResults
parameter. For more information, see the results for the Recognize audio method.
If the objectMode
parameter is false
, successful recognition returns only a single string with the final transcription results.
Response handling
Response handling for the WebSocket interface is different from HTTP response handling. The WebSocket
constructor returns an instance of a WebSocket connection object. You assign application-specific calls to the following methods of the object to handle events that are associated with the connection. Each event handler must accept a single argument for an event from the connection. The event that it accepts causes it to execute.
Methods
-
The status of the connection's opening.
-
Response messages from the service, including the results of the request as one or more JSON
SpeechRecognitionResults
objects. -
Errors for the connection or request.
-
The status of the connection's closing.
The callback
parameter of the recognizeUsingWebSocket
method accepts a Java object of type BaseRecognizeCallback
, which implements the RecognizeCallback
interface to handle events from the WebSocket connection. You override the definitions of the following default empty methods of the object to handle events that are associated with the connection and the request. The methods are called when their associated events occur.
Methods
-
The WebSocket connection is established.
-
The service is listening for audio.
-
-
Final results for the request have been returned by the service.
-
An error occurs in the WebSocket connection.
-
An inactivity timeout occurs for the request.
-
The WebSocket connection is closed.
You handle events that are associated with the WebSocket connection and the request by defining event-handler methods on the RecognizeStream object that is returned by the recognizeUsingWebSocket
method. The methods are called when their associated events occur. You can define handlers for the following events by using the object's on
method. For more information about streams and events, see the Node.js documentation.
Events
-
Results for the request are received on the stream.
-
Data is available to be read from the stream.
-
-
The WebSocket connection is closed.
-
An error occurs in the WebSocket connection.
The recognize_callback
parameter of the recognize_using_websocket
method accepts an object of type RecognizeCallback
. The object defines the methods that handle events from the WebSocket connection. You can override the definitions of the following default empty methods of the object to handle events that are associated with the connection and the request. The methods are called when their associated events occur.
Methods
-
The WebSocket connection is established.
-
The service is listening for audio.
-
-
Returns interim results or maximum alternatives from the service when those responses are requested.
-
Returns final transcription results for the request from the service.
-
The service has returned final results for the request.
-
Reports an error in the WebSocket connection.
-
Reports an inactivity timeout for the request.
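A callback that overrides several of these methods might look like the following sketch. The on_data, on_error, and on_inactivity_timeout names follow the Example request above; the remaining names (on_connected, on_listening, on_hypothesis, on_transcription) are assumptions about the RecognizeCallback interface.
from ibm_watson.websocket import RecognizeCallback

class VerboseCallback(RecognizeCallback):
    # Sketch: log connection and transcription events as they occur.
    def on_connected(self):
        print('WebSocket connection established')
    def on_listening(self):
        print('Service is listening for audio')
    def on_hypothesis(self, hypothesis):
        print('Interim hypothesis:', hypothesis)
    def on_transcription(self, transcript):
        print('Final transcript:', transcript)
    def on_data(self, data):
        print('Full response:', data)
    def on_error(self, error):
        print('Error received:', error)
    def on_inactivity_timeout(self, error):
        print('Inactivity timeout:', error)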
The connection can produce the following return codes.
Return code
-
The connection closed normally.
-
The connection closed because the remote peer is leaving.
-
The connection closed due to a protocol error.
-
The connection closed because the service could not process the input from the client.
-
Reserved response code.
-
The connection closed for a reason other than those defined by the remaining return codes.
-
The connection closed abnormally.
-
The connection closed because the service received invalid data.
-
The connection closed due to a policy violation.
-
The connection closed because the frame size exceeded the 4 MB limit.
-
The connection closed because the client requested a required extension that is not available.
-
The connection closed because the service encountered an unexpected internal condition that prevents it from fulfilling the request.
-
The connection was not established due to a TLS handshake error.
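These descriptions correspond to the standard WebSocket close codes of RFC 6455 (1000 through 1011, plus 1015 for TLS failures). The sketch below maps a close code to a short reason under that assumption; the specific pairing with the list above is not quoted from this page.
# Sketch: map standard WebSocket close codes (RFC 6455) to short descriptions.
CLOSE_CODES = {
    1000: 'normal closure',
    1001: 'remote peer is leaving',
    1002: 'protocol error',
    1003: 'input could not be processed',
    1006: 'abnormal closure',
    1007: 'invalid data received',
    1008: 'policy violation',
    1009: 'frame exceeded the size limit',
    1010: 'required extension not available',
    1011: 'unexpected internal condition',
    1015: 'TLS handshake error',
}

def describe_close(code):
    return CLOSE_CODES.get(code, 'unknown close code {}'.format(code))

print(describe_close(1002))  # protocol error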
Example response
{
"results": [
{
"final": true,
"alternatives": [
{
"transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
"confidence": 0.89
},
{
"transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
},
{
"transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
}
],
"keywords_result": {
"tornadoes": [
{
"normalized_text": "tornadoes",
"start_time": 1.52,
"end_time": 2.15,
"confidence": 1.0
}
],
"colorado": [
{
"normalized_text": "Colorado",
"start_time": 4.95,
"end_time": 5.59,
"confidence": 0.98
}
]
}
}
],
"result_index": 0
}
Methods
List models
Lists all language models that are available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things. The ordering of the list of models can change from call to call; do not rely on an alphabetized or static list of models.
See also: Languages and models.
GET /v1/models
ListModels()
(speechToText *SpeechToTextV1) ListModels(listModelsOptions *ListModelsOptions) (result *SpeechModels, response *core.DetailedResponse, err error)
(speechToText *SpeechToTextV1) ListModelsWithContext(ctx context.Context, listModelsOptions *ListModelsOptions) (result *SpeechModels, response *core.DetailedResponse, err error)
ServiceCall<SpeechModels> listModels()
listModels(params)
list_models(self,
**kwargs
) -> DetailedResponse
list_models
func listModels(
headers: [String: String]? = nil,
completionHandler: @escaping (WatsonResponse<SpeechModels>?, WatsonError?) -> Void)
ListModels(Callback<SpeechModels> callback)
Request
No Request Parameters
No Request Parameters
WithContext method only
A context.Context instance that you can use to specify a timeout for the operation or to cancel an in-flight request.
curl -X GET -u "apikey:{apikey}" "{url}/v1/models"
IamAuthenticator authenticator = new IamAuthenticator( apikey: "{apikey}" ); SpeechToTextService speechToText = new SpeechToTextService(authenticator); speechToText.SetServiceUrl("{url}"); var result = speechToText.ListModels(); Console.WriteLine(result.Response);
package main import ( "encoding/json" "fmt" "github.com/IBM/go-sdk-core/core" "github.com/watson-developer-cloud/go-sdk/speechtotextv1" ) func main() { authenticator := &core.IamAuthenticator{ ApiKey: "{apikey}", } options := &speechtotextv1.SpeechToTextV1Options{ Authenticator: authenticator, } speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options) if speechToTextErr != nil { panic(speechToTextErr) } speechToText.SetServiceURL("{url}") result, response, responseErr := speechToText.ListModels( &speechtotextv1.ListModelsOptions{}, ) if responseErr != nil { panic(responseErr) } b, _ := json.MarshalIndent(result, "", " ") fmt.Println(string(b)) }
IamAuthenticator authenticator = new IamAuthenticator("{apikey}"); SpeechToText speechToText = new SpeechToText(authenticator); speechToText.setServiceUrl("{url}"); SpeechModels speechModels = speechToText.listModels().execute().getResult(); System.out.println(speechModels);
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

speechToText.listModels()
  .then(speechModels => {
    console.log(JSON.stringify(speechModels, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)
speech_to_text.set_service_url('{url}')

speech_models = speech_to_text.list_models().get_result()
print(json.dumps(speech_models, indent=2))
require "json" require "ibm_watson/authenticators" require "ibm_watson/speech_to_text_v1" include IBMWatson authenticator = Authenticators::IamAuthenticator.new( apikey: "{apikey}" ) speech_to_text = SpeechToTextV1.new( authenticator: authenticator ) speech_to_text.service_url = "{url}" speech_models = speech_to_text.list_models puts JSON.pretty_generate(speech_models.result)
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}") let speechToText = SpeechToText(authenticator: authenticator) speechToText.serviceURL = "{url}" speechToText.listModels() { response, error in guard let models = response?.result else { print(error?.localizedDescription ?? "unknown error") return } print(models) }
var authenticator = new IamAuthenticator( apikey: "{apikey}" ); while (!authenticator.CanAuthenticate()) yield return null; var speechToText = new SpeechToTextService(authenticator); speechToText.SetServiceUrl("{url}"); SpeechModels listModelsResponse = null; speechToText.ListModels( callback: (DetailedResponse<SpeechModels> response, IBMError error) => { Log.Debug("SpeechToTextServiceV1", "ListModels result: {0}", response.Response); listModelsResponse = response.Result; } ); while (listModelsResponse == null) { yield return null; }
Response
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
Indicates whether the customization interface can be used to create a custom language model based on the language model.
Indicates whether the
speaker_labels
parameter can be used with the language model.Note: The field returns
true
for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
SupportedFeatures
A brief description of the model.
Models
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
Indicates whether the customization interface can be used to create a custom language model based on the language model.
Indicates whether the
speaker_labels
parameter can be used with the language model.Note: The field returns
true
for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
SupportedFeatures
A brief description of the model.
Models
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
Indicates whether the customization interface can be used to create a custom language model based on the language model.
Indicates whether the
speaker_labels
parameter can be used with the language model.Note: The field returns
true
for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
supportedFeatures
A brief description of the model.
models
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
Indicates whether the customization interface can be used to create a custom language model based on the language model.
Indicates whether the
speaker_labels
parameter can be used with the language model.Note: The field returns
true
for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
supported_features
A brief description of the model.
models
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
Indicates whether the customization interface can be used to create a custom language model based on the language model.
Indicates whether the
speaker_labels
parameter can be used with the language model.Note: The field returns
true
for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
supported_features
A brief description of the model.
models
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
Indicates whether the customization interface can be used to create a custom language model based on the language model.
Indicates whether the
speaker_labels
parameter can be used with the language model.Note: The field returns
true
for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
supported_features
A brief description of the model.
models
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
Indicates whether the customization interface can be used to create a custom language model based on the language model.
Indicates whether the
speaker_labels
parameter can be used with the language model.Note: The field returns
true
for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
supportedFeatures
A brief description of the model.
models
Information about the available language models.
An array of
SpeechModel
objects that provides information about each available model.The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
Indicates whether the customization interface can be used to create a custom language model based on the language model.
Indicates whether the
speaker_labels
parameter can be used with the language model.Note: The field returns
true
for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
SupportedFeatures
A brief description of the model.
Models
Status Code
OK. The request succeeded.
Not Acceptable. The request specified an Accept header with an incompatible content type.
Unsupported Media Type. The request specified an unacceptable media type.
{ "models": [ { "name": "pt-BR_NarrowbandModel", "language": "pt-BR", "url": "{url}/v1/models/pt-BR_NarrowbandModel", "rate": 8000, "supported_features": { "custom_language_model": true, "speaker_labels": true }, "description": "Brazilian Portuguese narrowband model." }, { "name": "ko-KR_BroadbandModel", "language": "ko-KR", "url": "{url}/models/ko-KR_BroadbandModel", "rate": 16000, "supported_features": { "custom_language_model": true, "speaker_labels": true }, "description": "Korean broadband model." }, { "name": "fr-FR_BroadbandModel", "language": "fr-FR", "url": "{url}/v1/models/fr-FR_BroadbandModel", "rate": 16000, "supported_features": { "custom_language_model": true, "speaker_labels": true }, "description": "French broadband model." } ] }
{ "models": [ { "name": "pt-BR_NarrowbandModel", "language": "pt-BR", "url": "{url}/v1/models/pt-BR_NarrowbandModel", "rate": 8000, "supported_features": { "custom_language_model": true, "speaker_labels": true }, "description": "Brazilian Portuguese narrowband model." }, { "name": "ko-KR_BroadbandModel", "language": "ko-KR", "url": "{url}/models/ko-KR_BroadbandModel", "rate": 16000, "supported_features": { "custom_language_model": true, "speaker_labels": true }, "description": "Korean broadband model." }, { "name": "fr-FR_BroadbandModel", "language": "fr-FR", "url": "{url}/v1/models/fr-FR_BroadbandModel", "rate": 16000, "supported_features": { "custom_language_model": true, "speaker_labels": true }, "description": "French broadband model." } ] }
Get a model
Gets information for a single specified language model that is available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.
See also: Languages and models.
GET /v1/models/{model_id}
GetModel(string modelId)
(speechToText *SpeechToTextV1) GetModel(getModelOptions *GetModelOptions) (result *SpeechModel, response *core.DetailedResponse, err error)
(speechToText *SpeechToTextV1) GetModelWithContext(ctx context.Context, getModelOptions *GetModelOptions) (result *SpeechModel, response *core.DetailedResponse, err error)
ServiceCall<SpeechModel> getModel(GetModelOptions getModelOptions)
getModel(params)
get_model(self,
model_id: str,
**kwargs
) -> DetailedResponse
get_model(model_id:)
func getModel(
modelID: String,
headers: [String: String]? = nil,
completionHandler: @escaping (WatsonResponse<SpeechModel>?, WatsonError?) -> Void)
GetModel(Callback<SpeechModel> callback, string modelId)
Request
Instantiate the GetModelOptions
struct and set the fields to provide parameter values for the GetModel
method.
Use the GetModelOptions.Builder
to create a GetModelOptions
object that contains the parameter values for the getModel
method.
Path Parameters
The identifier of the model in the form of its name from the output of the Get a model method. (Note: The model ar-AR_BroadbandModel is deprecated; use ar-MS_BroadbandModel instead.)
Allowable values: [
ar-AR_BroadbandModel
,ar-MS_BroadbandModel
,de-DE_BroadbandModel
,de-DE_NarrowbandModel
,en-AU_BroadbandModel
,en-AU_NarrowbandModel
,en-GB_BroadbandModel
,en-GB_NarrowbandModel
,en-US_BroadbandModel
,en-US_NarrowbandModel
,en-US_ShortForm_NarrowbandModel
,es-AR_BroadbandModel
,es-AR_NarrowbandModel
,es-CL_BroadbandModel
,es-CL_NarrowbandModel
,es-CO_BroadbandModel
,es-CO_NarrowbandModel
,es-ES_BroadbandModel
,es-ES_NarrowbandModel
,es-MX_BroadbandModel
,es-MX_NarrowbandModel
,es-PE_BroadbandModel
,es-PE_NarrowbandModel
,fr-CA_BroadbandModel
,fr-CA_NarrowbandModel
,fr-FR_BroadbandModel
,fr-FR_NarrowbandModel
,it-IT_BroadbandModel
,it-IT_NarrowbandModel
,ja-JP_BroadbandModel
,ja-JP_NarrowbandModel
,ko-KR_BroadbandModel
,ko-KR_NarrowbandModel
,nl-NL_BroadbandModel
,nl-NL_NarrowbandModel
,pt-BR_BroadbandModel
,pt-BR_NarrowbandModel
,zh-CN_BroadbandModel
,zh-CN_NarrowbandModel
]
parameters
The identifier of the model in the form of its name from the output of the Get a model method. (Note: The model
ar-AR_BroadbandModel
is deprecated; usear-MS_BroadbandModel
instead.).Allowable values: [
ar-AR_BroadbandModel
,ar-MS_BroadbandModel
,de-DE_BroadbandModel
,de-DE_NarrowbandModel
,en-AU_BroadbandModel
,en-AU_NarrowbandModel
,en-GB_BroadbandModel
,en-GB_NarrowbandModel
,en-US_BroadbandModel
,en-US_NarrowbandModel
,en-US_ShortForm_NarrowbandModel
,es-AR_BroadbandModel
,es-AR_NarrowbandModel
,es-CL_BroadbandModel
,es-CL_NarrowbandModel
,es-CO_BroadbandModel
,es-CO_NarrowbandModel
,es-ES_BroadbandModel
,es-ES_NarrowbandModel
,es-MX_BroadbandModel
,es-MX_NarrowbandModel
,es-PE_BroadbandModel
,es-PE_NarrowbandModel
,fr-CA_BroadbandModel
,fr-CA_NarrowbandModel
,fr-FR_BroadbandModel
,fr-FR_NarrowbandModel
,it-IT_BroadbandModel
,it-IT_NarrowbandModel
,ja-JP_BroadbandModel
,ja-JP_NarrowbandModel
,ko-KR_BroadbandModel
,ko-KR_NarrowbandModel
,nl-NL_BroadbandModel
,nl-NL_NarrowbandModel
,pt-BR_BroadbandModel
,pt-BR_NarrowbandModel
,zh-CN_BroadbandModel
,zh-CN_NarrowbandModel
]
WithContext method only
A context.Context instance that you can use to specify a timeout for the operation or to cancel an in-flight request.
The GetModel options.
The identifier of the model in the form of its name from the output of the Get a model method. (Note: The model
ar-AR_BroadbandModel
is deprecated; usear-MS_BroadbandModel
instead.).Allowable values: [
ar-AR_BroadbandModel
,ar-MS_BroadbandModel
,de-DE_BroadbandModel
,de-DE_NarrowbandModel
,en-AU_BroadbandModel
,en-AU_NarrowbandModel
,en-GB_BroadbandModel
,en-GB_NarrowbandModel
,en-US_BroadbandModel
,en-US_NarrowbandModel
,en-US_ShortForm_NarrowbandModel
,es-AR_BroadbandModel
,es-AR_NarrowbandModel
,es-CL_BroadbandModel
,es-CL_NarrowbandModel
,es-CO_BroadbandModel
,es-CO_NarrowbandModel
,es-ES_BroadbandModel
,es-ES_NarrowbandModel
,es-MX_BroadbandModel
,es-MX_NarrowbandModel
,es-PE_BroadbandModel
,es-PE_NarrowbandModel
,fr-CA_BroadbandModel
,fr-CA_NarrowbandModel
,fr-FR_BroadbandModel
,fr-FR_NarrowbandModel
,it-IT_BroadbandModel
,it-IT_NarrowbandModel
,ja-JP_BroadbandModel
,ja-JP_NarrowbandModel
,ko-KR_BroadbandModel
,ko-KR_NarrowbandModel
,nl-NL_BroadbandModel
,nl-NL_NarrowbandModel
,pt-BR_BroadbandModel
,pt-BR_NarrowbandModel
,zh-CN_BroadbandModel
,zh-CN_NarrowbandModel
]
The getModel options.
The identifier of the model in the form of its name from the output of the Get a model method. (Note: The model
ar-AR_BroadbandModel
is deprecated; usear-MS_BroadbandModel
instead.).Allowable values: [
ar-AR_BroadbandModel
,ar-MS_BroadbandModel
,de-DE_BroadbandModel
,de-DE_NarrowbandModel
,en-AU_BroadbandModel
,en-AU_NarrowbandModel
,en-GB_BroadbandModel
,en-GB_NarrowbandModel
,en-US_BroadbandModel
,en-US_NarrowbandModel
,en-US_ShortForm_NarrowbandModel
,es-AR_BroadbandModel
,es-AR_NarrowbandModel
,es-CL_BroadbandModel
,es-CL_NarrowbandModel
,es-CO_BroadbandModel
,es-CO_NarrowbandModel
,es-ES_BroadbandModel
,es-ES_NarrowbandModel
,es-MX_BroadbandModel
,es-MX_NarrowbandModel
,es-PE_BroadbandModel
,es-PE_NarrowbandModel
,fr-CA_BroadbandModel
,fr-CA_NarrowbandModel
,fr-FR_BroadbandModel
,fr-FR_NarrowbandModel
,it-IT_BroadbandModel
,it-IT_NarrowbandModel
,ja-JP_BroadbandModel
,ja-JP_NarrowbandModel
,ko-KR_BroadbandModel
,ko-KR_NarrowbandModel
,nl-NL_BroadbandModel
,nl-NL_NarrowbandModel
,pt-BR_BroadbandModel
,pt-BR_NarrowbandModel
,zh-CN_BroadbandModel
,zh-CN_NarrowbandModel
]
parameters
The identifier of the model in the form of its name from the output of the Get a model method. (Note: The model
ar-AR_BroadbandModel
is deprecated; usear-MS_BroadbandModel
instead.).Allowable values: [
ar-AR_BroadbandModel
,ar-MS_BroadbandModel
,de-DE_BroadbandModel
,de-DE_NarrowbandModel
,en-AU_BroadbandModel
,en-AU_NarrowbandModel
,en-GB_BroadbandModel
,en-GB_NarrowbandModel
,en-US_BroadbandModel
,en-US_NarrowbandModel
,en-US_ShortForm_NarrowbandModel
,es-AR_BroadbandModel
,es-AR_NarrowbandModel
,es-CL_BroadbandModel
,es-CL_NarrowbandModel
,es-CO_BroadbandModel
,es-CO_NarrowbandModel
,es-ES_BroadbandModel
,es-ES_NarrowbandModel
,es-MX_BroadbandModel
,es-MX_NarrowbandModel
,es-PE_BroadbandModel
,es-PE_NarrowbandModel
,fr-CA_BroadbandModel
,fr-CA_NarrowbandModel
,fr-FR_BroadbandModel
,fr-FR_NarrowbandModel
,it-IT_BroadbandModel
,it-IT_NarrowbandModel
,ja-JP_BroadbandModel
,ja-JP_NarrowbandModel
,ko-KR_BroadbandModel
,ko-KR_NarrowbandModel
,nl-NL_BroadbandModel
,nl-NL_NarrowbandModel
,pt-BR_BroadbandModel
,pt-BR_NarrowbandModel
,zh-CN_BroadbandModel
,zh-CN_NarrowbandModel
]
curl -X GET -u "apikey:{apikey}" "{url}/v1/models/en-US_BroadbandModel"
IamAuthenticator authenticator = new IamAuthenticator( apikey: "{apikey}" ); SpeechToTextService speechToText = new SpeechToTextService(authenticator); speechToText.SetServiceUrl("{url}"); var result = speechToText.GetModel( modelId: "en-US_BroadbandModel" ); Console.WriteLine(result.Response);
package main import ( "encoding/json" "fmt" "github.com/IBM/go-sdk-core/core" "github.com/watson-developer-cloud/go-sdk/speechtotextv1" ) func main() { authenticator := &core.IamAuthenticator{ ApiKey: "{apikey}", } options := &speechtotextv1.SpeechToTextV1Options{ Authenticator: authenticator, } speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options) if speechToTextErr != nil { panic(speechToTextErr) } speechToText.SetServiceURL("{url}") result, response, responseErr := speechToText.GetModel( &speechtotextv1.GetModelOptions{ ModelID: core.StringPtr("en-US_BroadbandModel"), }, ) if responseErr != nil { panic(responseErr) } b, _ := json.MarshalIndent(result, "", " ") fmt.Println(string(b)) }
IamAuthenticator authenticator = new IamAuthenticator("{apikey}"); SpeechToText speechToText = new SpeechToText(authenticator); speechToText.setServiceUrl("{url}"); GetModelOptions getModelOptions = new GetModelOptions.Builder() .modelId("en-US_BroadbandModel") .build(); SpeechModel speechModel = speechToText.getModel(getModelOptions).execute().getResult(); System.out.println(speechModel);
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const getModelParams = {
  modelId: 'en-US_BroadbandModel',
};

speechToText.getModel(getModelParams)
  .then(speechModel => {
    console.log(JSON.stringify(speechModel, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)
speech_to_text.set_service_url('{url}')

speech_model = speech_to_text.get_model('en-US_BroadbandModel').get_result()
print(json.dumps(speech_model, indent=2))
require "json" require "ibm_watson/authenticators" require "ibm_watson/speech_to_text_v1" include IBMWatson authenticator = Authenticators::IamAuthenticator.new( apikey: "{apikey}" ) speech_to_text = SpeechToTextV1.new( authenticator: authenticator ) speech_to_text.service_url = "{url}" speech_model = speech_to_text.get_model( model_id: "en-US_BroadbandModel" ) puts JSON.pretty_generate(speech_model.result)
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}") let speechToText = SpeechToText(authenticator: authenticator) speechToText.serviceURL = "{url}" speechToText.getModel(modelID: "en-US_BroadbandModel") { response, error in guard let model = response?.result else { print(error?.localizedDescription ?? "unknown error") return } print(model) }
var authenticator = new IamAuthenticator( apikey: "{apikey}" ); while (!authenticator.CanAuthenticate()) yield return null; var speechToText = new SpeechToTextService(authenticator); speechToText.SetServiceUrl("{url}"); speechToText.GetModel( callback: (DetailedResponse<SpeechModel> response, IBMError error) => { Log.Debug("SpeechToTextServiceV1", "GetModel result: {0}", response.Response); getModelResponse = response.Result; }, modelId: "en-US_BroadbandModel" ); while (getModelResponse == null) { yield return null; }
Response
Information about an available language model.
The name of the model for use as an identifier in calls to the service (for example,
en-US_BroadbandModel
).The language identifier of the model (for example,
en-US
).The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model.
A brief description of the model.
Information about an available language model.
The name of the model for use as an identifier in calls to the service (for example, en-US_BroadbandModel).
The language identifier of the model (for example, en-US).
The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model (SupportedFeatures):
- Indicates whether the customization interface can be used to create a custom language model based on the language model.
- Indicates whether the speaker_labels parameter can be used with the language model. Note: The field returns true for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
A brief description of the model.
Information about an available language model.
The name of the model for use as an identifier in calls to the service (for example, en-US_BroadbandModel).
The language identifier of the model (for example, en-US).
The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model (supportedFeatures):
- Indicates whether the customization interface can be used to create a custom language model based on the language model.
- Indicates whether the speaker_labels parameter can be used with the language model. Note: The field returns true for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
A brief description of the model.
Information about an available language model.
The name of the model for use as an identifier in calls to the service (for example, en-US_BroadbandModel).
The language identifier of the model (for example, en-US).
The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
The URI for the model.
Additional service features that are supported with the model (supported_features):
- Indicates whether the customization interface can be used to create a custom language model based on the language model.
- Indicates whether the speaker_labels parameter can be used with the language model. Note: The field returns true for all models. However, speaker labels are supported only for US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model only). Speaker labels are not supported for any other models.
A brief description of the model.
Status Code
200 OK. The request succeeded.
404 Not Found. The specified model_id was not found.
406 Not Acceptable. The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type. The request specified an unacceptable media type.
{ "rate": 16000, "name": "en-US_BroadbandModel", "language": "en-US", "url": "{url}/v1/models/en-US_BroadbandModel", "supported_features": { "custom_language_model": true, "speaker_labels": true }, "description": "US English broadband model." }
{ "rate": 16000, "name": "en-US_BroadbandModel", "language": "en-US", "url": "{url}/v1/models/en-US_BroadbandModel", "supported_features": { "custom_language_model": true, "speaker_labels": true }, "description": "US English broadband model." }
Recognize audio
Sends audio and returns transcription results for a recognition request. You can pass a maximum of 100 MB and a minimum of 100 bytes of audio with a request. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The method returns only final results; to enable interim results, use the WebSocket API. (With the curl
command, use the --data-binary
option to upload the file for the request.)
See also: Making a basic HTTP request.
Streaming mode
For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding
header to chunked
to use streaming mode. In streaming mode, the service closes the connection (status code 408) if it does not receive at least 15 seconds of audio (including silence) in any 30-second period. The service also closes the connection (status code 400) if it detects no speech for inactivity_timeout
seconds of streaming audio; use the inactivity_timeout
parameter to change the default of 30 seconds.
See also:
Audio formats (content types)
The service accepts audio in the following formats (MIME types).
- For formats that are labeled Required, you must use the Content-Type header with the request to specify the format of the audio.
- For all other formats, you can omit the Content-Type header or specify application/octet-stream with the header to have the service automatically detect the format of the audio. (With the curl command, you can specify either "Content-Type:" or "Content-Type: application/octet-stream".)
Where indicated, the format that you specify must include the sampling rate and can optionally include the number of channels and the endianness of the audio.
- audio/alaw (Required. Specify the sampling rate (rate) of the audio.)
- audio/basic (Required. Use only with narrowband models.)
- audio/flac
- audio/g729 (Use only with narrowband models.)
- audio/l16 (Required. Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
- audio/mp3
- audio/mpeg
- audio/mulaw (Required. Specify the sampling rate (rate) of the audio.)
- audio/ogg (The service automatically detects the codec of the input audio.)
- audio/ogg;codecs=opus
- audio/ogg;codecs=vorbis
- audio/wav (Provide audio with a maximum of nine channels.)
- audio/webm (The service automatically detects the codec of the input audio.)
- audio/webm;codecs=opus
- audio/webm;codecs=vorbis
The sampling rate of the audio must match the sampling rate of the model for the recognition request: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the request fails.
See also: Audio formats.
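As an illustration, the following Python sketch (modeled on this document's Python examples) sends raw PCM audio with an explicit audio/l16 content type that includes the required sampling rate. The file name and the 16 kHz rate are assumptions for the example; substitute your own audio and a rate that matches your model.

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

# 'audio-file.l16' is a hypothetical raw PCM file sampled at 16 kHz.
# audio/l16 is one of the formats labeled Required, so the sampling rate
# must be included in the content type.
with open('audio-file.l16', 'rb') as audio_file:
    results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/l16;rate=16000',
        model='en-US_BroadbandModel'
    ).get_result()

print(json.dumps(results, indent=2))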
Multipart speech recognition
Note: The Watson SDKs do not support multipart speech recognition.
The HTTP POST
method of the service also supports multipart speech recognition. With multipart requests, you pass all audio data as multipart form data. You specify some parameters as request headers and query parameters, but you pass JSON metadata as form data to control most aspects of the transcription. You can use multipart recognition to pass multiple audio files with a single request.
Use the multipart approach with browsers for which JavaScript is disabled or when the parameters used with the request are greater than the 8 KB limit imposed by most HTTP servers and proxies. You can encounter this limit, for example, if you want to spot a very large number of keywords.
See also: Making a multipart HTTP request.
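Tying the preceding sections together, a minimal synchronous recognition request with the Python SDK might look like the following sketch. The FLAC file name is a placeholder; because FLAC is not a format labeled Required, the content type can be omitted and the service detects the format automatically.

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

# 'audio-file.flac' is a hypothetical file. The format is auto-detected,
# so content_type is omitted here.
with open('audio-file.flac', 'rb') as audio_file:
    results = speech_to_text.recognize(
        audio=audio_file,
        model='en-US_BroadbandModel'
    ).get_result()

print(json.dumps(results, indent=2))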
POST /v1/recognize
Recognize(System.IO.MemoryStream audio, string contentType = null, string model = null, string languageCustomizationId = null, string acousticCustomizationId = null, string baseModelVersion = null, double? customizationWeight = null, long? inactivityTimeout = null, List<string> keywords = null, float? keywordsThreshold = null, long? maxAlternatives = null, float? wordAlternativesThreshold = null, bool? wordConfidence = null, bool? timestamps = null, bool? profanityFilter = null, bool? smartFormatting = null, bool? speakerLabels = null, string customizationId = null, string grammarName = null, bool? redaction = null, bool? audioMetrics = null, double? endOfPhraseSilenceTime = null, bool? splitTranscriptAtPhraseEnd = null, float? speechDetectorSensitivity = null, float? backgroundAudioSuppression = null)
(speechToText *SpeechToTextV1) Recognize(recognizeOptions *RecognizeOptions) (result *SpeechRecognitionResults, response *core.DetailedResponse, err error)
(speechToText *SpeechToTextV1) RecognizeWithContext(ctx context.Context, recognizeOptions *RecognizeOptions) (result *SpeechRecognitionResults, response *core.DetailedResponse, err error)
ServiceCall<SpeechRecognitionResults> recognize(RecognizeOptions recognizeOptions)
recognize(params)
recognize(self,
audio: BinaryIO,
*,
content_type: str = None,
model: str = None,
language_customization_id: str = None,
acoustic_customization_id: str = None,
base_model_version: str = None,
customization_weight: float = None,
inactivity_timeout: int = None,
keywords: List[str] = None,
keywords_threshold: float = None,
max_alternatives: int = None,
word_alternatives_threshold: float = None,
word_confidence: bool = None,
timestamps: bool = None,
profanity_filter: bool = None,
smart_formatting: bool = None,
speaker_labels: bool = None,
customization_id: str = None,
grammar_name: str = None,
redaction: bool = None,
audio_metrics: bool = None,
end_of_phrase_silence_time: float = None,
split_transcript_at_phrase_end: bool = None,
speech_detector_sensitivity: float = None,
background_audio_suppression: float = None,
**kwargs
) -> DetailedResponse
recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil)
func recognize(
audio: Data,
contentType: String? = nil,
model: String? = nil,
languageCustomizationID: String? = nil,
acousticCustomizationID: String? = nil,
baseModelVersion: String? = nil,
customizationWeight: Double? = nil,
inactivityTimeout: Int? = nil,
keywords: [String]? = nil,
keywordsThreshold: Double? = nil,
maxAlternatives: Int? = nil,
wordAlternativesThreshold: Double? = nil,
wordConfidence: Bool? = nil,
timestamps: Bool? = nil,
profanityFilter: Bool? = nil,
smartFormatting: Bool? = nil,
speakerLabels: Bool? = nil,
customizationID: String? = nil,
grammarName: String? = nil,
redaction: Bool? = nil,
audioMetrics: Bool? = nil,
endOfPhraseSilenceTime: Double? = nil,
splitTranscriptAtPhraseEnd: Bool? = nil,
speechDetectorSensitivity: Double? = nil,
backgroundAudioSuppression: Double? = nil,
headers: [String: String]? = nil,
completionHandler: @escaping (WatsonResponse<SpeechRecognitionResults>?, WatsonError?) -> Void)
Recognize(Callback<SpeechRecognitionResults> callback, System.IO.MemoryStream audio, string contentType = null, string model = null, string languageCustomizationId = null, string acousticCustomizationId = null, string baseModelVersion = null, double? customizationWeight = null, long? inactivityTimeout = null, List<string> keywords = null, float? keywordsThreshold = null, long? maxAlternatives = null, float? wordAlternativesThreshold = null, bool? wordConfidence = null, bool? timestamps = null, bool? profanityFilter = null, bool? smartFormatting = null, bool? speakerLabels = null, string customizationId = null, string grammarName = null, bool? redaction = null, bool? audioMetrics = null, double? endOfPhraseSilenceTime = null, bool? splitTranscriptAtPhraseEnd = null, float? speechDetectorSensitivity = null, float? backgroundAudioSuppression = null)
Request
Instantiate the RecognizeOptions
struct and set the fields to provide parameter values for the Recognize
method.
Use the RecognizeOptions.Builder
to create a RecognizeOptions
object that contains the parameter values for the recognize
method.
Custom Headers
Set to chunked to send the audio in streaming mode. The data does not need to exist fully before being streamed to the service. See Audio transmission.
Allowable values: [chunked]
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.
Allowable values: [application/octet-stream, audio/alaw, audio/basic, audio/flac, audio/g729, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis]
Query Parameters
The identifier of the model that is to be used for the recognition request. (Note: The model ar-AR_BroadbandModel is deprecated; use ar-MS_BroadbandModel instead.) See Languages and models.
Allowable values: [ar-AR_BroadbandModel, ar-MS_BroadbandModel, de-DE_BroadbandModel, de-DE_NarrowbandModel, en-AU_BroadbandModel, en-AU_NarrowbandModel, en-GB_BroadbandModel, en-GB_NarrowbandModel, en-US_BroadbandModel, en-US_NarrowbandModel, en-US_ShortForm_NarrowbandModel, es-AR_BroadbandModel, es-AR_NarrowbandModel, es-CL_BroadbandModel, es-CL_NarrowbandModel, es-CO_BroadbandModel, es-CO_NarrowbandModel, es-ES_BroadbandModel, es-ES_NarrowbandModel, es-MX_BroadbandModel, es-MX_NarrowbandModel, es-PE_BroadbandModel, es-PE_NarrowbandModel, fr-CA_BroadbandModel, fr-CA_NarrowbandModel, fr-FR_BroadbandModel, fr-FR_NarrowbandModel, it-IT_BroadbandModel, it-IT_NarrowbandModel, ja-JP_BroadbandModel, ja-JP_NarrowbandModel, ko-KR_BroadbandModel, ko-KR_NarrowbandModel, nl-NL_BroadbandModel, nl-NL_NarrowbandModel, pt-BR_BroadbandModel, pt-BR_NarrowbandModel, zh-CN_BroadbandModel, zh-CN_NarrowbandModel]
Default: en-US_BroadbandModel
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Custom models.
Note: Use this parameter instead of the deprecated customization_id parameter.
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Custom models.
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Base model version.
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.
Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.
The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
See Custom models.
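For example, a request that applies a custom language model with an explicit customization weight might look like the following Python sketch. The customization ID is a placeholder for a custom model that you have trained, and 0.4 is an illustrative weight, not a recommended value.

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:
    results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        model='en-US_BroadbandModel',
        # Placeholder GUID; the custom model's base model must match 'model' above.
        language_customization_id='{customization_id}',
        # Illustrative weight; omit to use the weight set at training time (default 0.3).
        customization_weight=0.4
    ).get_result()

print(json.dumps(results, indent=2))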
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Inactivity timeout.
Default: 30
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.
You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.
See Keyword spotting.
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
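A keyword-spotting request might look like the following Python sketch; the keyword strings and the 0.5 threshold are illustrative assumptions.

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:
    results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        # Example keywords; both parameters must be provided for spotting to occur.
        keywords=['hello', 'watson', 'transcript'],
        keywords_threshold=0.5
    ).get_result()

print(json.dumps(results, indent=2))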
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of 0, the service uses the default value, 1. See Maximum alternatives.
Default: 1
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.
Default: false
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.
Default: false
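As a sketch, the following Python request asks for three alternative transcripts, word alternatives above a 0.9 confidence threshold, word confidence scores, and timestamps; all of the values are illustrative.

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:
    results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        max_alternatives=3,
        word_alternatives_threshold=0.9,
        word_confidence=True,
        timestamps=True
    ).get_result()

print(json.dumps(results, indent=2))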
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only. See Profanity filtering.
Default: true
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.
Note: Applies to US English, Japanese, and Spanish transcription only.
See Smart formatting.
Default: false
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
Note: Applies to US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
See Speaker labels.
Default: false
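A request for speaker labels might look like the following Python sketch; remember that enabling speaker_labels implicitly forces timestamps, and that the feature applies only to the languages listed above.

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:
    results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        model='en-US_BroadbandModel',
        # Timestamps are returned automatically when speaker labels are enabled.
        speaker_labels=True
    ).get_result()

print(json.dumps(results, indent=2))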
Deprecated. Use the language_customization_id parameter to specify the customization ID (GUID) of a custom language model that is to be used with the recognition request. Do not specify both parameters with a request.
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource. See Grammars.
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.
When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).
Note: Applies to US English, Japanese, and Korean transcription only.
See Numeric redaction.
Default: false
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.
See Audio metrics.
Default: false
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.
Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.
Default: 0.8
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript. By default, the service splits transcripts based solely on the pause interval.
Default: false
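The following hedged Python sketch combines the two transcript-splitting controls: a shortened 0.3-second pause interval (an illustrative value) and semantic splitting at phrase ends.

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:
    results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        # Illustrative interval; 0 (or omitting the parameter) uses the language default.
        end_of_phrase_silence_time=0.3,
        split_transcript_at_phrase_end=True
    ).get_result()

print(json.dumps(results, indent=2))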
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.
Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. See Speech Activity Detection.
Default: 0.5
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.
Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. See Speech Activity Detection.
Default: 0
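The two speech activity detection parameters work together. For instance (Python SDK; values, credentials, and file name are illustrative), a request can make speech detection slightly less sensitive while moderately suppressing background chatter:

```python
# Illustrative sketch: assumes the ibm-watson Python SDK and IAM credentials.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('noisy-room.wav', 'rb') as audio_file:  # hypothetical audio file
    response = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/wav',
        speech_detector_sensitivity=0.4,   # slightly less sensitive than the 0.5 default
        background_audio_suppression=0.5   # moderate suppression of background audio
    ).get_result()

print(response['results'][0]['alternatives'][0]['transcript'])
```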
parameters
The audio to transcribe.
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.
Allowable values: [application/octet-stream, audio/alaw, audio/basic, audio/flac, audio/g729, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis]
The identifier of the model that is to be used for the recognition request. (Note: The model ar-AR_BroadbandModel is deprecated; use ar-MS_BroadbandModel instead.) See Languages and models.
Allowable values: [ar-AR_BroadbandModel, ar-MS_BroadbandModel, de-DE_BroadbandModel, de-DE_NarrowbandModel, en-AU_BroadbandModel, en-AU_NarrowbandModel, en-GB_BroadbandModel, en-GB_NarrowbandModel, en-US_BroadbandModel, en-US_NarrowbandModel, en-US_ShortForm_NarrowbandModel, es-AR_BroadbandModel, es-AR_NarrowbandModel, es-CL_BroadbandModel, es-CL_NarrowbandModel, es-CO_BroadbandModel, es-CO_NarrowbandModel, es-ES_BroadbandModel, es-ES_NarrowbandModel, es-MX_BroadbandModel, es-MX_NarrowbandModel, es-PE_BroadbandModel, es-PE_NarrowbandModel, fr-CA_BroadbandModel, fr-CA_NarrowbandModel, fr-FR_BroadbandModel, fr-FR_NarrowbandModel, it-IT_BroadbandModel, it-IT_NarrowbandModel, ja-JP_BroadbandModel, ja-JP_NarrowbandModel, ko-KR_BroadbandModel, ko-KR_NarrowbandModel, nl-NL_BroadbandModel, nl-NL_NarrowbandModel, pt-BR_BroadbandModel, pt-BR_NarrowbandModel, zh-CN_BroadbandModel, zh-CN_NarrowbandModel]
Default: en-US_BroadbandModel
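A minimal recognition request needs only the audio, its format, and optionally a model. Sketched here with the Python SDK (credentials and file name are placeholders; the other SDKs expose the same parameters):

```python
# Illustrative sketch: assumes the ibm-watson Python SDK and IAM credentials.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:  # hypothetical audio file
    response = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',    # must match the actual audio format
        model='fr-FR_BroadbandModel'  # defaults to en-US_BroadbandModel if omitted
    ).get_result()

print(response['results'][0]['alternatives'][0]['transcript'])
```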
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the
model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Custom models.
Note: Use this parameter instead of the deprecated customization_id parameter.
The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Custom models.
The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Base model version.
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.
Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.
The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
See Custom models.
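As a sketch (Python SDK; the GUIDs, grammar name, credentials, and file name are placeholders), a request that layers a custom language model, a custom acoustic model, a custom weight, and a grammar looks like this:

```python
# Illustrative sketch: assumes the ibm-watson Python SDK and IAM credentials.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('support-call.mp3', 'rb') as audio_file:  # hypothetical audio file
    response = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/mp3',
        model='en-US_NarrowbandModel',                      # must be the custom models' base model
        language_customization_id='{language_model_guid}',  # placeholder GUID
        acoustic_customization_id='{acoustic_model_guid}',  # placeholder GUID
        customization_weight=0.5,          # favor domain terms more than the 0.3 default
        grammar_name='confirmation-grammar'  # hypothetical grammar defined in the custom model
    ).get_result()

print(response['results'][0]['alternatives'][0]['transcript'])
```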
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use
-1 for infinity. See Inactivity timeout.
An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.
You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.
See Keyword spotting.
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
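For example (Python SDK; the keyword list, threshold, credentials, and file name are illustrative), spotting a few keywords in the final results:

```python
# Illustrative sketch: assumes the ibm-watson Python SDK and IAM credentials.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:  # hypothetical audio file
    response = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        keywords=['colorado', 'tornado', 'tornadoes'],  # case-insensitive
        keywords_threshold=0.5                          # required whenever keywords are given
    ).get_result()

# Matches, if any, are reported per final result under keywords_result.
for result in response['results']:
    for keyword, matches in result.get('keywords_result', {}).items():
        for match in matches:
            print(keyword, match['start_time'], match['end_time'], match['confidence'])
```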
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of
0, the service uses the default value, 1. See Maximum alternatives.
A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.
Default: false
If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.
Default: false
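The following sketch (Python SDK; thresholds, credentials, and file name are illustrative) combines the word-level options above and prints the per-word detail that comes back:

```python
# Illustrative sketch: assumes the ibm-watson Python SDK and IAM credentials.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('audio-file.flac', 'rb') as audio_file:  # hypothetical audio file
    response = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        max_alternatives=3,               # up to three transcripts per result
        word_alternatives_threshold=0.9,  # only high-confidence alternative words
        word_confidence=True,
        timestamps=True
    ).get_result()

best = response['results'][0]['alternatives'][0]
for word, start, end in best.get('timestamps', []):
    print(f"{word}: {start:.2f}-{end:.2f} s")
for word, confidence in best.get('word_confidence', []):
    print(f"{word}: confidence {confidence:.2f}")
```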
If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only. See Profanity filtering.
Default: true
If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.
Note: Applies to US English, Japanese, and Spanish transcription only.
See Smart formatting.
Default: false
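For instance (Python SDK, US English; credentials and file name are illustrative), the request below turns smart formatting on and the profanity filter off:

```python
# Illustrative sketch: assumes the ibm-watson Python SDK and IAM credentials.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

speech_to_text = SpeechToTextV1(authenticator=IAMAuthenticator('{apikey}'))
speech_to_text.set_service_url('{url}')

with open('voicemail.mp3', 'rb') as audio_file:  # hypothetical audio file
    response = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/mp3',
        model='en-US_NarrowbandModel',
        smart_formatting=True,   # dates, times, numbers, currency, internet addresses
        profanity_filter=False   # return results with no censoring
    ).get_result()

print(response['results'][0]['alternatives'][0]['transcript'])
```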
If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.
Note: Applies to US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
See Speaker labels.
Default: false
Deprecated. Use the language_customization_id parameter to specify the customization ID (GUID) of a custom language model that is to be used with the recognition request. Do not specify both parameters with a request.
The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource. See Grammars.
If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.
When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).
Note: Applies to US English, Japanese, and Korean transcription only.
See Numeric redaction.
Default: false
If true, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.
See Audio metrics.
Default: false
Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.
Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.
If true, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript. By default, the service splits transcripts based solely on the pause interval.
Default: false
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.
Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. See Speech Activity Detection.
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.
Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. See Speech Activity Detection.
WithContext method only
A context.Context instance that you can use to specify a timeout for the operation or to cancel an in-flight request.
The Recognize options, which are identical to those described above.
parameters
The audio to transcribe.
The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.
Allowable values: [
application/octet-stream
,audio/alaw
,audio/basic
,audio/flac
,audio/g729
,audio/l16
,audio/mp3
,audio/mpeg
,audio/mulaw
,audio/ogg
,audio/ogg;codecs=opus
,audio/ogg;codecs=vorbis
,audio/wav
,audio/webm
,audio/webm;codecs=opus
,audio/webm;codecs=vorbis
]The identifier of the model that is to be used for the recognition request. (Note: The model
ar-AR_BroadbandModel
is deprecated; usear-MS_BroadbandModel
instead.) See Languages and models.Allowable values: [
ar-AR_BroadbandModel
,ar-MS_BroadbandModel
,de-DE_BroadbandModel
,de-DE_NarrowbandModel
,en-AU_BroadbandModel
,en-AU_NarrowbandModel
,en-GB_BroadbandModel
,en-GB_NarrowbandModel
,en-US_BroadbandModel
,en-US_NarrowbandModel
,en-US_ShortForm_NarrowbandModel
,es-AR_BroadbandModel
,es-AR_NarrowbandModel
,es-CL_BroadbandModel
,es-CL_NarrowbandModel
,es-CO_BroadbandModel
,es-CO_NarrowbandModel
,es-ES_BroadbandModel
,es-ES_NarrowbandModel
,es-MX_BroadbandModel
,es-MX_NarrowbandModel
,es-PE_BroadbandModel
,es-PE_NarrowbandModel
,fr-CA_BroadbandModel
,fr-CA_NarrowbandModel
,fr-FR_BroadbandModel
,fr-FR_NarrowbandModel
,it-IT_BroadbandModel
,it-IT_NarrowbandModel
,ja-JP_BroadbandModel
,ja-JP_NarrowbandModel
,ko-KR_BroadbandModel
,ko-KR_NarrowbandModel
,nl-NL_BroadbandModel
,nl-NL_NarrowbandModel
,pt-BR_BroadbandModel
,pt-BR_NarrowbandModel
,zh-CN_BroadbandModel
,zh-CN_NarrowbandModel
]Default:
en-US_BroadbandModel
The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the
model
parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See Custom models.Note: Use this parameter instead of the deprecated
customization_id
parameter.The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the
model
parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See Custom models.The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Base model version.
If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.
Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.
The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
See Custom models.
The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use
-1
for infinity. See Inactivity timeout.An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.
You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.
See Keyword spotting.
A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See Keyword spotting.
The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of
0
, the service uses the default value,1
. See Maximum alternatives.A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See Word alternatives.
If
true
, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See Word confidence.Default:
false
If
true
, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.Default:
false
If
true
, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter tofalse
to return results with no censoring. Applies to US English transcription only. See Profanity filtering.Default:
true
If
true
, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.Note: Applies to US English, Japanese, and Spanish transcription only.
See Smart formatting.
Default:
false
If
true
, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Settingspeaker_labels
totrue
forces thetimestamps
parameter to betrue
, regardless of whether you specifyfalse
for the parameter.Note: Applies to US English, Australian English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.
See Speaker labels.
Default:
false
Deprecated. Use the
language_customization_id
parameter to specify the customization ID (GUID) of a custom language model that is to be used with the recognition request. Do not specify both parameters with a request.The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the
language_customization_id
parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource. See Grammars.If
true
, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with anX
character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the
keywords
andkeywords_threshold
parameters) and returns only a single final transcript (forces themax_alternatives
parameter to be1
).Note: Applies to US English, Japanese, and Korean transcription only.
See Numeric redaction.
Default:
false
If
true
, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.See Audio metrics.
Default:
false
If
true
, specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.Specify a value for the pause interval in the range of 0.0 to 120.0.
- A value greater than 0 specifies the interval that the service is to use for speech recognition.
- A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.
The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.
If
true
, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript. By default, the service splits transcripts based solely on the pause interval.Default:
false
The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.
Specify a value between 0.0 and 1.0:
- 0.0 suppresses all audio (no speech is transcribed).
- 0.5 (the default) provides a reasonable compromise for the level of sensitivity.
- 1.0 suppresses no audio (speech detection sensitivity is disabled).
The values increase on a monotonic curve. See Speech Activity Detection.
The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.
Specify a value in the range of 0.0 to 1.0:
- 0.0 (the default) provides no suppression (background audio suppression is disabled).
- 0.5 provides a reasonable level of audio suppression for general usage.
- 1.0 suppresses all audio (no audio is transcribed).
The values increase on a monotonic curve. See Speech Activity Detection.
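Similarly, a sketch of the two speech activity detection parameters, again reusing the Python client from the first sketch; the values shown are illustrative, not recommendations.

# Sketch: lower speech-detection sensitivity and apply moderate background suppression.
# speech_to_text is the client constructed in the earlier sketch; 0.4 and 0.5 are example values.
with open('audio-file2.flac', 'rb') as audio_file:
    results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        speech_detector_sensitivity=0.4,     # bias away from coughs, music, and other non-speech
        background_audio_suppression=0.5     # moderately suppress low-volume background audio
    ).get_result()
print(results)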
curl -X POST -u "apikey:{apikey}" --header "Content-Type: audio/flac" --data-binary @audio-file2.flac "{url}/v1/recognize?word_alternatives_threshold=0.9&keywords=colorado%2Ctornado%2Ctornadoes&keywords_threshold=0.5"
Download sample file audio-file2.flac
IamAuthenticator authenticator = new IamAuthenticator(
    apikey: "{apikey}"
    );

SpeechToTextService speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

var result = speechToText.Recognize(
    audio: File.ReadAllBytes("audio-file2.flac"),
    contentType: "audio/flac",
    wordAlternativesThreshold: 0.9f,
    keywords: new List<string>() { "colorado", "tornado", "tornadoes" },
    keywordsThreshold: 0.5f
    );

Console.WriteLine(result.Response);
Download sample file audio-file2.flac
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"

	"github.com/IBM/go-sdk-core/core"
	"github.com/watson-developer-cloud/go-sdk/speechtotextv1"
)

func main() {
	authenticator := &core.IamAuthenticator{
		ApiKey: "{apikey}",
	}
	options := &speechtotextv1.SpeechToTextV1Options{
		Authenticator: authenticator,
	}
	speechToText, speechToTextErr := speechtotextv1.NewSpeechToTextV1(options)
	if speechToTextErr != nil {
		panic(speechToTextErr)
	}
	speechToText.SetServiceURL("{url}")

	files := [2]string{"audio-file1.flac", "audio-file2.flac"}
	for _, file := range files {
		var audioFile io.ReadCloser
		var audioFileErr error
		audioFile, audioFileErr = os.Open("./" + file)
		if audioFileErr != nil {
			panic(audioFileErr)
		}

		result, _, responseErr := speechToText.Recognize(
			&speechtotextv1.RecognizeOptions{
				Audio:                     audioFile,
				ContentType:               core.StringPtr("audio/flac"),
				Timestamps:                core.BoolPtr(true),
				WordAlternativesThreshold: core.Float32Ptr(0.9),
				Keywords:                  []string{"colorado", "tornado", "tornadoes"},
				KeywordsThreshold:         core.Float32Ptr(0.5),
			},
		)
		if responseErr != nil {
			panic(responseErr)
		}
		b, _ := json.MarshalIndent(result, "", "  ")
		fmt.Println(string(b))
	}
}
Download sample file audio-file2.flac
IamAuthenticator authenticator = new IamAuthenticator("{apikey}"); SpeechToText speechToText = new SpeechToText(authenticator); speechToText.setServiceUrl("{url}"); try { RecognizeOptions recognizeOptions = new RecognizeOptions.Builder() .audio(new FileInputStream("audio-file2.flac")) .contentType("audio/flac") .wordAlternativesThreshold((float) 0.9) .keywords(Arrays.asList("colorado", "tornado", "tornadoes")) .keywordsThreshold((float) 0.5) .build(); SpeechRecognitionResults speechRecognitionResults = speechToText.recognize(recognizeOptions).execute().getResult(); System.out.println(speechRecognitionResults); } catch (FileNotFoundException e) { e.printStackTrace(); } }
Download sample file audio-file2.flac
const fs = require('fs');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({
    apikey: '{apikey}',
  }),
  serviceUrl: '{url}',
});

const recognizeParams = {
  audio: fs.createReadStream('audio-file2.flac'),
  contentType: 'audio/flac',
  wordAlternativesThreshold: 0.9,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  keywordsThreshold: 0.5,
};

speechToText.recognize(recognizeParams)
  .then(speechRecognitionResults => {
    console.log(JSON.stringify(speechRecognitionResults, null, 2));
  })
  .catch(err => {
    console.log('error:', err);
  });
Download sample file audio-file2.flac
import json
from os.path import join, dirname
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('{url}')

with open(join(dirname(__file__), './.', 'audio-file2.flac'),
          'rb') as audio_file:
    speech_recognition_results = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/flac',
        word_alternatives_threshold=0.9,
        keywords=['colorado', 'tornado', 'tornadoes'],
        keywords_threshold=0.5
    ).get_result()
print(json.dumps(speech_recognition_results, indent=2))
Download sample file audio-file2.flac
require "json" require "ibm_watson/authenticators" require "ibm_watson/speech_to_text_v1" include IBMWatson authenticator = Authenticators::IamAuthenticator.new( apikey: "{apikey}" ) speech_to_text = SpeechToTextV1.new( authenticator: authenticator ) speech_to_text.service_url = "{url}" File.open("audio-file2.flac") do |audio_file| speech_recognition_results = speech_to_text.recognize( audio: audio_file, content_type: "audio/flac", word_alternatives_threshold: 0.9, keywords: ["colorado", "tornado", "tornadoes"], keywords_threshold: 0.5 ) puts JSON.pretty_generate(speech_recognition_results.result) end
Download sample file audio-file2.flac
let authenticator = WatsonIAMAuthenticator(apiKey: "{apikey}")
let speechToText = SpeechToText(authenticator: authenticator)
speechToText.serviceURL = "{url}"

let url = Bundle.main.url(forResource: "audio-file2", withExtension: "flac")
var audio = try! Data(contentsOf: url!)

speechToText.recognize(
  audio: audio,
  keywords: ["colorado", "tornado", "tornadoes"],
  keywordsThreshold: 0.5,
  wordAlternativesThreshold: 0.90,
  contentType: "audio/flac") { response, error in

  guard let results = response?.result else {
    print(error?.localizedDescription ?? "unknown error")
    return
  }
  print(results)
}
Download sample file audio-file2.flac
var authenticator = new IamAuthenticator(
    apikey: "{apikey}"
);

while (!authenticator.CanAuthenticate())
    yield return null;

var speechToText = new SpeechToTextService(authenticator);
speechToText.SetServiceUrl("{url}");

SpeechRecognitionResults recognizeResponse = null;
speechToText.Recognize(
    callback: (DetailedResponse<SpeechRecognitionResults> response, IBMError error) =>
    {
        Log.Debug("SpeechToTextServiceV1", "Recognize result: {0}", response.Response);
        recognizeResponse = response.Result;
    },
    audio: File.ReadAllBytes("audio-file2.flac"),
    contentType: "audio/flac",
    wordAlternativesThreshold: 0.9f,
    keywords: new List<string>() { "colorado", "tornado", "tornadoes" },
    keywordsThreshold: 0.5f
);

while (recognizeResponse == null)
{
    yield return null;
}
Download sample file audio-file2.flac
Response
The complete results for a speech recognition request.
An array of SpeechRecognitionResult objects that can include interim and final results (interim results are returned only if supported by the method). Final results are guaranteed not to change; interim results might be replaced by further interim results and final results. The service periodically sends updates to the results list; the result_index is set to the lowest index in the array that has changed; it is incremented for new results.

An index that indicates a change point in the results array. The service increments the index only for additional results that it sends for new audio for the same request.

An array of SpeakerLabelsResult objects that identifies which words were spoken by which speakers in a multi-person exchange. The array is returned only if the speaker_labels parameter is true. When interim results are also requested for methods that support them, it is possible for a SpeechRecognitionResults object to include only the speaker_labels field.

If processing metrics are requested, information about the service's processing of the input audio. Processing metrics are not available with the synchronous Recognize audio method.
If audio metrics are requested, information about the signal characteristics of the input audio.
An array of warning messages associated with the request:
- Warnings for invalid parameters or fields can include a descriptive message and a list of invalid argument strings, for example, "Unknown arguments:" or "Unknown url query arguments:" followed by a list of the form "{invalid_arg_1}, {invalid_arg_2}."
- The following warning is returned if the request passes a custom model that is based on an older version of a base model for which an updated version is available:
"Using previous version of base model, because your custom model has been built with it. Please note that this version will be supported only for a limited time. Consider updating your custom model to the new base model. If you do not do that you will be automatically switched to base model when you used the non-updated custom model."
In both cases, the request succeeds despite the warnings.
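A response like the one produced by the Python example earlier in this section, which stores the parsed JSON in speech_recognition_results, could be inspected as in the following sketch; this is an illustration only, not part of the API surface.

# Sketch: inspect the top-level fields of a recognition response.
# Assumes speech_recognition_results is the dict returned by
# recognize(...).get_result() in the Python example above.
for warning in speech_recognition_results.get('warnings', []):
    print('Warning:', warning)

print('Result index:', speech_recognition_results.get('result_index', 0))

for result in speech_recognition_results['results']:
    if result.get('final'):
        best = result['alternatives'][0]
        print(best['transcript'], '(confidence: {})'.format(best.get('confidence')))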
The complete results for a speech recognition request.
An array of SpeechRecognitionResult objects that can include interim and final results (interim results are returned only if supported by the method). Final results are guaranteed not to change; interim results might be replaced by further interim results and final results. The service periodically sends updates to the results list; the result_index is set to the lowest index in the array that has changed; it is incremented for new results.

An indication of whether the transcription results are final. If true, the results for this utterance are not updated further; no additional results are sent for a result_index once its results are indicated as final.

An array of alternative transcripts. The alternatives array can include additional requested output such as word confidence or timestamps.

A transcription of the audio.
A score that indicates the service's confidence in the transcript in the range of 0.0 to 1.0. A confidence score is returned only for the best alternative and only with results marked as final.
Constraints: 0 ≤ value ≤ 1
Time alignments for each word from the transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in seconds, for example: [["hello",0.0,1.2],["world",1.2,2.5]]. Timestamps are returned only for the best alternative.

A confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0.0 to 1.0, for example: [["hello",0.95],["world",0.866]]. Confidence scores are returned only for the best alternative and only with results marked as final.
Alternatives
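A minimal sketch of unpacking these per-word fields, assuming the speech_recognition_results dict from the Python example above and assuming the request also set timestamps=True and word_confidence=True (the Python example itself does not enable them):

# Sketch: read per-word timestamps and confidence scores from the best alternative.
# Both fields appear only if the corresponding request parameters were enabled.
for result in speech_recognition_results['results']:
    best = result['alternatives'][0]
    for word, start, end in best.get('timestamps', []):
        print('{} starts at {:.2f}s and ends at {:.2f}s'.format(word, start, end))
    for word, confidence in best.get('word_confidence', []):
        print('{} recognized with confidence {:.2f}'.format(word, confidence))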
A dictionary (or associative array) whose keys are the strings specified for keywords if both that parameter and keywords_threshold are specified. The value for each key is an array of matches spotted in the audio for that keyword. Each match is described by a KeywordResult object. A keyword for which no matches are found is omitted from the dictionary. The dictionary is omitted entirely if no matches are found for any keywords.
Results
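As an illustration of reading the keyword results, the matches for each spotted keyword could be listed as in the following sketch, which again assumes the speech_recognition_results dict from the Python example above (that example requests the keywords colorado, tornado, and tornadoes):

# Sketch: list every spotted occurrence of each requested keyword.
# keywords_result is present only when keywords and keywords_threshold were specified.
for result in speech_recognition_results['results']:
    for keyword, matches in result.get('keywords_result', {}).items():
        for match in matches:
            print('{!r} heard as {!r} from {:.2f}s to {:.2f}s (confidence {:.2f})'.format(
                keyword,
                match['normalized_text'],
                match['start_time'],
                match['end_time'],
                match['confidence'],
            ))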