IBM Cloud Docs
Release notes for Speech to Text for IBM Cloud Pak for Data

IBM Cloud Pak for Data

The following features and changes were included for each release and update of installed or on-premises instances of IBM Watson® Speech to Text for IBM Cloud Pak for Data. Unless otherwise noted, all changes are compatible with earlier releases and are automatically and transparently available to all new and existing applications.

For information about known limitations of the service, see Known limitations.

For information about releases and updates of the service for IBM Cloud, see Release notes for Speech to Text for IBM Cloud.

2 May 2023 (Version 4.6.5)

Version 4.6.5 is now available

Speech to Text for IBM Cloud Pak for Data version 4.6.5 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift versions 4.10 and 4.12. For more information, see Watson Speech services on IBM Cloud Pak for Data.

New Japanese next-generation telephony model

The service now offers a next-generation telephony model for Japanese: ja-JP_Telephony. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about next-generation models and low latency, see

Improved language model customization for next-generation English and Japanese models

The service now provides improved language model customization for next-generation English and Japanese models:

  • en-AU_Multimedia
  • en-AU_Telephony
  • en-IN_Telephony
  • en-GB_Multimedia
  • en-GB_Telephony
  • en-US_Multimedia
  • en-US_Telephony
  • ja-JP_Multimedia
  • ja-JP_Telephony

Visible improvements to the models: The new technology improves the default behavior of the new English and Japanese models. Among other changes, the new technology optimizes the default behavior for the following parameters:

  • The default customization_weight for custom models that are based on the new versions of these models changes from 0.2 to 0.1.
  • The default character_insertion_bias for custom models that are based on the new versions of these models remains 0.0, but the models have changed in a manner that makes use of the parameter for speech recognition less necessary.

Upgrading to the new models: To take advantage of the improved technology, you must upgrade any custom language models that are based on the new models. To upgrade to the new version of one of these base models, do the following:

  1. Change your custom model by adding or modifying a custom word, corpus, or grammar that the model contains. Any change that you make moves the model to the ready state.

  2. Use the POST /v1/customizations/{customization_id}/train method to retrain the model. Retraining upgrades the custom model to the new technology and moves the model to the available state.

    Known issue: At this time, you cannot use the POST /v1/customizations/{customization_id}/upgrade_model method to upgrade a custom model to one of the new base models. This issue will be addressed in a future release.
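Step 2 can be sketched with curl as follows. This is a minimal sketch, not an official client. It assumes that STT_URL holds the URL of your Speech to Text service instance and TOKEN holds a bearer token for it. Both are hypothetical variable names, and the default URL is a placeholder. Setting DRY_RUN=1 prints the request instead of sending it.

```shell
#!/usr/bin/env bash
# Sketch: retrain a custom language model so that it moves to the new base-model technology.
# STT_URL and TOKEN are hypothetical variable names; the default URL is a placeholder.
STT_URL="${STT_URL:-https://cpd.example.com/speech-to-text/api}"

train_custom_model() {
  local customization_id="$1"
  local url="${STT_URL}/v1/customizations/${customization_id}/train"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the request instead of sending it.
    echo "POST ${url}"
    return 0
  fi
  curl -X POST --header "Authorization: Bearer ${TOKEN}" "${url}"
}
```

For example, DRY_RUN=1 train_custom_model {customization_id} shows the request that would be sent. After a real request, you can poll GET /v1/customizations/{customization_id} until the status of the model is available.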

Using the new models: Following the upgrade to the new base model, you are advised to evaluate the performance of the upgraded custom model by paying special attention to the customization_weight and character_insertion_bias parameters for speech recognition. When you retrain your custom model:

  • The custom model uses the new default customization_weight of 0.1 for your custom model. A non-default customization_weight that you had associated with your custom model is removed.
  • The custom model might no longer require use of the character_insertion_bias parameter for optimal speech recognition.

Improvements to language model customization render these parameters less important for high-quality speech recognition:

  • If you use the default values for these parameters, continue to do so after the upgrade. The default values will likely continue to offer the best results for speech recognition.
  • If you specify non-default values for these parameters, experiment with the default values following upgrade. Your custom model might work well for speech recognition with the default values.

If you feel that using different values for these parameters might improve speech recognition with your custom model, experiment with incremental changes to determine whether the parameters are needed to improve speech recognition.
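One way to structure such an experiment is to transcribe the same audio with a few customization_weight values, starting from the new default of 0.1, and to compare the transcripts. The following sketch only builds the /v1/recognize URLs. STT_URL and the function names are hypothetical, and the step values are illustrative, not recommendations.

```shell
#!/usr/bin/env bash
# Sketch: build /v1/recognize URLs that vary customization_weight in small steps.
# STT_URL is a hypothetical variable; the weights below are examples only.
STT_URL="${STT_URL:-https://cpd.example.com/speech-to-text/api}"

recognize_url() {
  # $1 = base model, $2 = custom model (customization) ID, $3 = customization_weight
  printf '%s/v1/recognize?model=%s&language_customization_id=%s&customization_weight=%s\n' \
    "${STT_URL}" "$1" "$2" "$3"
}

weight_sweep() {
  # Start at the new default (0.1) and step up in increments of 0.1.
  local model="$1" id="$2" w
  for w in 0.1 0.2 0.3; do
    recognize_url "${model}" "${id}" "${w}"
  done
}
```

You could send each URL with curl, for example with --header "Content-Type: audio/wav" and --data-binary @audio.wav, and apply the same pattern to character_insertion_bias, whose default is 0.0.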

Note: At this time, the improvements to language model customization apply only to custom models that are based on the next-generation English or Japanese base language models listed earlier. Over time, the improvements will be made available for other next-generation language models.

More information: For more information about upgrading and about speech recognition with these parameters, see

New environment variable for Speech services custom resource

The documentation now includes instructions to create an environment variable named ${CUSTOM_RESOURCE_SPEECH}. You append the new variable to the cpd_vars.sh script, and source the script to use the variable in your environment. For more information, see Information you need to complete this task in Installing Watson Speech services, or refer to any of the upgrade topics for the Speech services.
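A minimal sketch of that step, assuming that cpd_vars.sh is in the current directory and that your custom resource is named speech-cr. That is an example value, so use the name of your own Speech services custom resource. The function appends the export line only once and then sources the script.

```shell
#!/usr/bin/env bash
# Sketch: define CUSTOM_RESOURCE_SPEECH in cpd_vars.sh (once) and load it into the shell.
# The file location and the default value speech-cr are assumptions; adjust both.
CPD_VARS_FILE="${CPD_VARS_FILE:-./cpd_vars.sh}"

add_speech_cr_var() {
  local value="${1:-speech-cr}"
  # Append the export line only if the file does not already define the variable.
  if ! grep -q '^export CUSTOM_RESOURCE_SPEECH=' "${CPD_VARS_FILE}" 2>/dev/null; then
    echo "export CUSTOM_RESOURCE_SPEECH=${value}" >> "${CPD_VARS_FILE}"
  fi
  # Source the script so that the variable is set in the current shell.
  . "${CPD_VARS_FILE}"
}
```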

Defect fix: The Swedish telephony and Italian multimedia models are now available

Defect fix: The Swedish telephony (sv-SE_Telephony) and Italian multimedia (it-IT_Multimedia) models are now available for installation. Previously, they were not available.

Defect fix: Improved training time for next-generation custom language models

Defect fix: Training time for next-generation custom language models is now significantly improved. Previously, training took much longer than necessary, as was reported for Japanese custom language models. The problem was corrected by an internal fix.

Defect fix: Grammar files now handle strings of digits correctly

Defect fix: When grammars are used, the service now handles longer strings of digits correctly. Previously, the service could fail to complete recognition or return incorrect results.

Defect fix: Dynamically generated grammar files now work properly

Defect fix: Dynamically generated grammar files now work properly. Previously, dynamic grammar files could cause internal failures, as reported for integration of Speech to Text with IBM® watsonx™ Assistant. The problem was corrected by an internal fix.

Defect fix: Smart formatting for US English dates is now correct

Defect fix: Smart formatting now correctly includes days of the week and dates when both are present in the spoken audio, for example, Tuesday February 28. Previously, in some cases the day of the week was omitted and the date was presented incorrectly. Note that smart formatting is beta functionality.

Defect fix: Update documentation for speech hesitation words for next-generation models

Defect fix: Documentation for speech hesitation words for next-generation models has been updated. More details are provided about US English and Japanese hesitation words. Next-generation models include the actual hesitation words in transcription results, unlike previous-generation models, which include only hesitation markers. For more information, see Speech hesitations and hesitation markers.

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

29 March 2023 (Version 4.6.4)

Version 4.6.4 is now available

Speech to Text for IBM Cloud Pak for Data version 4.6.4 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift versions 4.10 and 4.12. For more information, see Watson Speech services on IBM Cloud Pak for Data.

Important: Back up your data before upgrading to version 4.6.3 or 4.6.4

Important: Before upgrading to Watson Speech services version 4.6.3 or 4.6.4, you must back up your data. Preserve the backup in a safe location. For more information about backing up your Watson Speech services data, see Backing up and restoring Watson Speech services data in Administering Watson Speech services. That topic also explains how to restore your data if that becomes necessary.

Known issue: The Swedish telephony and Italian multimedia models are not yet available

Known issue: The Swedish telephony (sv-SE_Telephony) and Italian multimedia (it-IT_Multimedia) models are not yet available. They will be made available with version 4.6.5.

Defect fix: You can now change the installed models and voices with the advanced installation options

Defect fix: During installation, you can now specify different models or voices with the advanced installation options of the command-line interface. Previously, the service always installed the default models and voices. The limitation continues to apply to Watson Speech services versions 4.6.0, 4.6.2, and 4.6.3. For information about installing models and voices, see Specifying additional installation options in Installing Watson Speech services.

Setting load balancer timeouts

Watson Speech services require that you set the load balancer timeouts for both the server and the client to 300 seconds. These settings give long-running speech recognition requests, such as those with long or difficult audio, sufficient time to complete. For more information, see Information you need to complete this task in Installing Watson Speech services.
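The exact procedure depends on your load balancer. As an illustration only, if the load balancer is HAProxy, the two settings correspond to the timeout client and timeout server directives in its configuration file, commonly /etc/haproxy/haproxy.cfg. Both the use of HAProxy and that path are assumptions about your environment. The sketch rewrites both directives to 300s in a given file and keeps a .bak copy.

```shell
#!/usr/bin/env bash
# Sketch: set HAProxy client and server timeouts to 300 seconds.
# Assumes an HAProxy configuration file; adjust for your load balancer.
set_lb_timeouts() {
  local cfg="${1:-/etc/haproxy/haproxy.cfg}"
  # Rewrite existing "timeout client ..." and "timeout server ..." lines, keeping indentation.
  # Directives such as "timeout client-fin" are left unchanged.
  sed -i.bak -E 's/^([[:space:]]*timeout[[:space:]]+(client|server))[[:space:]].*$/\1 300s/' "${cfg}"
}
```

The sketch changes only directives that already exist. After editing, restart HAProxy, for example with systemctl restart haproxy, so that the new timeouts take effect.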

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

23 February 2023 (Version 4.6.3)

Version 4.6.3 is now available

Speech to Text for IBM Cloud Pak for Data version 4.6.3 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift version 4.10. Red Hat OpenShift version 4.8 is no longer supported. For more information, see Watson Speech services on IBM Cloud Pak for Data.

Important: All previous-generation models are deprecated and will reach end of service on 31 July 2023

Important: All previous-generation models are deprecated and will reach end of service effective 31 July 2023. On that date, all previous-generation models will be removed from the service and the documentation. The previous deprecation date was 3 March 2023. The new date allows users more time to migrate to the appropriate next-generation models. However, users must migrate to the equivalent next-generation model by 31 July 2023.

Most previous-generation models were deprecated on 15 March 2022. Previously, the Arabic and Japanese models were not deprecated. Deprecation now applies to all previous-generation models.

Note: When the previous-generation en-US_BroadbandModel is removed from service, the next-generation en-US_Multimedia model will become the default model for speech recognition requests.

Known issue: You cannot change the installed models and voices with the advanced installation options

Known issue: You currently cannot specify different models or voices with the advanced installation options. The service always installs the default models and voices. For information about changing the models after installation, see Updating models and voices for your Watson Speech services in the Administration topic of Watson Speech services on IBM Cloud Pak for Data.

Known issue: Upgrade to version 4.6.3 can fail to complete

Known issue: When upgrading to version 4.6.3, the MinIO backup job can fail to be deleted upon completion. If this happens, the solution is to delete the job, after which the upgrade proceeds normally. Perform the following steps to resolve the problem.

  1. To determine whether the MinIO backup job remains undeleted, issue the following command:

    oc get job --namespace ${PROJECT_CPD_INSTANCE} | grep speech-cr-ibm-minio-backup
    

    The MinIO job that is not deleted is identified by an entry of the following form:

    speech-cr-ibm-minio-backup   1/1   3m25s   1d
    
  2. To delete the MinIO backup job, issue the following command:

    oc delete job speech-cr-ibm-minio-backup --namespace ${PROJECT_CPD_INSTANCE}
    

Once the backup job is deleted, the upgrade continues and completes.
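The two steps can be combined into one check-then-delete function. This is a sketch that assumes the job name shown above, speech-cr-ibm-minio-backup, whose prefix matches the custom resource name speech-cr. It relies on oc get job exiting with a non-zero status when the job does not exist.

```shell
#!/usr/bin/env bash
# Sketch: delete the leftover MinIO backup job if it exists, so that the upgrade can proceed.
# The job name assumes a custom resource named speech-cr.
cleanup_minio_backup_job() {
  local ns="${1:-${PROJECT_CPD_INSTANCE}}"
  local job="speech-cr-ibm-minio-backup"
  # "oc get job <name>" exits with a non-zero status when the job does not exist.
  if oc get job "${job}" --namespace "${ns}" >/dev/null 2>&1; then
    oc delete job "${job}" --namespace "${ns}"
  else
    echo "No leftover ${job} job in namespace ${ns}"
  fi
}
```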

Defect fix: Update French Canadian next-generation telephony model (upgrade required)

Defect fix: The French Canadian next-generation telephony model, fr-CA_Telephony, was updated to address an internal inconsistency that could cause an error during speech recognition. You need to upgrade any custom models that are based on the fr-CA_Telephony model. For more information about upgrading custom models, see

Defect fix: The next-generation Brazilian Portuguese multimedia model is now available

Defect fix: The next-generation Brazilian Portuguese multimedia model is now available for Speech to Text for IBM Cloud Pak for Data. Previously, the model was unavailable.

Adding words directly to custom models that are based on next-generation models increases the training time

Adding custom words directly to a custom model that is based on a next-generation model causes training of the model to take a few minutes longer than it otherwise would. If you train a model with custom words that you added by using the POST /v1/customizations/{customization_id}/words or PUT /v1/customizations/{customization_id}/words/{word_name} method, allow a few extra minutes for training. For more information, see
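For reference, a single custom word can be added with the PUT method as in the following sketch. STT_URL, TOKEN, and the function names are hypothetical. The JSON body uses the sounds_like and display_as fields of a custom word. The sketch does not escape JSON or URL-encode the word, so keep the inputs simple.

```shell
#!/usr/bin/env bash
# Sketch: add one custom word with PUT /v1/customizations/{customization_id}/words/{word_name}.
# STT_URL and TOKEN are hypothetical variable names; the default URL is a placeholder.
STT_URL="${STT_URL:-https://cpd.example.com/speech-to-text/api}"

word_body() {
  # $1 = sounds-like pronunciation, $2 = display_as spelling (no JSON escaping is done).
  printf '{"sounds_like": ["%s"], "display_as": "%s"}' "$1" "$2"
}

add_custom_word() {
  local id="$1" word="$2" sounds_like="$3" display_as="$4"
  local url="${STT_URL}/v1/customizations/${id}/words/${word}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "PUT ${url} $(word_body "${sounds_like}" "${display_as}")"
    return 0
  fi
  curl -X PUT --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$(word_body "${sounds_like}" "${display_as}")" "${url}"
}
```

For example, DRY_RUN=1 add_custom_word {customization_id} IEEE 'I. triple E.' IEEE prints the request. After you add words, train the model with POST /v1/customizations/{customization_id}/train and allow for the extra training time.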

Additional information about working with service instances

The documentation now includes information about creating a service instance with the command-line interface (cpd-cli) and about managing service instances. For more information, see the following topics of Watson Speech services on IBM Cloud Pak for Data:

  • Creating a Watson Speech services instance under Post-installation setup
  • Managing your Watson Speech services instances under Administering

Security vulnerability addressed

The following security vulnerability has been fixed:

30 January 2023 (Version 4.6.2)

Version 4.6.2 is now available

Speech to Text for IBM Cloud Pak for Data version 4.6.2 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift versions 4.8 and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.

The custom resource now includes a new fileStorageClass property

The custom resource for the Watson Speech services now includes a fileStorageClass property in addition to the existing blockStorageClass property. You specify both block and file storage classes when you install or upgrade a service. During upgrade from a previous version, the new property is added to the custom resource automatically by the --file_storage_class option of the cpd-cli manage apply-cr command.
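A sketch of an apply-cr invocation that passes both storage classes. The option names follow the pattern of the Cloud Pak for Data installation commands, and the component name watson_speech is an assumption. VERSION, PROJECT_CPD_INSTANCE, STG_CLASS_BLOCK, and STG_CLASS_FILE are assumed to be defined in your cpd_vars.sh script. Check the installation topic for the exact command for your release. DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: run (or print) cpd-cli manage apply-cr with both block and file storage classes.
# VERSION, PROJECT_CPD_INSTANCE, STG_CLASS_BLOCK, and STG_CLASS_FILE are assumed to come from cpd_vars.sh.
apply_speech_cr() {
  local -a cmd
  cmd=(cpd-cli manage apply-cr
    --components=watson_speech
    --release="${VERSION}"
    --cpd_instance_ns="${PROJECT_CPD_INSTANCE}"
    --block_storage_class="${STG_CLASS_BLOCK}"
    --file_storage_class="${STG_CLASS_FILE}"
    --license_acceptance=true)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command as a single line instead of running it.
    echo "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}"
}
```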

For more information about the available block and file storage classes you use with each of the supported storage solutions, see the table of Storage requirements under Information you need to complete this task on the page "Installing Watson Speech services" in Watson Speech services on IBM Cloud Pak for Data.

Additional information about provisioning a service instance

The documentation now includes information about creating a service instance programmatically. It also includes examples of listing service instances and deleting a service instance. For more information, see Creating a Watson Speech services instance in the Post-installation setup documentation in Watson Speech services on IBM Cloud Pak for Data.

Server-side encryption is enabled for the MinIO datastore

The Speech services have now enabled server-side encryption for object storage in the MinIO datastore. No action is required on your part.

Change to audit webhooks

The Speech services have now removed the audit webhook dependency. The services now write audit events directly to the server. After upgrading to version 4.6.2, some webhook resources might remain until all services can remove the dependency. The remaining resources will be removed in a future release. No action is required on your part.

New Netherlands Dutch next-generation multimedia model

The service now offers a next-generation multimedia model for Netherlands Dutch: nl-NL_Multimedia. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about next-generation models and low latency, see

New Swedish next-generation telephony model

The service now offers a next-generation telephony model for Swedish: sv-SE_Telephony. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about next-generation models and low latency, see

Updates to English next-generation telephony models

The English next-generation telephony models have been updated for improved speech recognition:

  • en-AU_Telephony
  • en-GB_Telephony
  • en-IN_Telephony
  • en-US_Telephony

All of these models continue to support low latency. You do not need to upgrade custom models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.

The max_alternatives parameter is now available for use with next-generation models

The max_alternatives parameter is now generally available for use with all next-generation models. For more information, see Maximum alternatives.

Defect fix: Allow use of both max_alternatives and end_of_phrase_silence_time parameters with next-generation models

Defect fix: When you use both the max_alternatives and end_of_phrase_silence_time parameters in the same request with next-generation models, the service now returns multiple alternative transcripts while also respecting the indicated pause interval. Previously, use of the two parameters in a single request generated a failure. (Use of the max_alternatives parameter with next-generation models was previously available as an experimental feature to a limited number of customers.)
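For illustration, a request that combines the two parameters with a next-generation model could use a URL like the one that the following sketch builds. STT_URL is a hypothetical variable, and the model name and values are examples: up to 3 alternatives, with a 1.5-second pause marking the end of a phrase.

```shell
#!/usr/bin/env bash
# Sketch: build a /v1/recognize URL that requests alternative transcripts and sets
# the pause interval that ends a phrase. STT_URL is a hypothetical variable.
STT_URL="${STT_URL:-https://cpd.example.com/speech-to-text/api}"

alternatives_url() {
  local model="$1" alternatives="$2" silence_seconds="$3"
  echo "${STT_URL}/v1/recognize?model=${model}&max_alternatives=${alternatives}&end_of_phrase_silence_time=${silence_seconds}"
}
```

You could then send the audio with curl -X POST --header "Authorization: Bearer ${TOKEN}" --header "Content-Type: audio/wav" --data-binary @audio.wav "$(alternatives_url en-US_Telephony 3 1.5)". Each result in the response can contain up to three entries in its alternatives array.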

Defect fix: Update to Japanese next-generation multimedia model (upgrade required)

Defect fix: The Japanese next-generation multimedia model, ja-JP_Multimedia, was updated to address an internal inconsistency that could cause an error during speech recognition with low latency. You need to upgrade any custom models that are based on the ja-JP_Multimedia model. For more information about upgrading custom models, see

Defect fix: Add documentation guidelines for creating Japanese sounds-likes based on next-generation models

Defect fix: In sounds-likes for Japanese custom language models that are based on next-generation models, the character sequence ウー is ambiguous in some left contexts. Do not use ウー after characters (syllables) that end with the phoneme /o/, such as ロ. In such cases, use ウウ or just ウ instead of ウー. For example, use ロウウマン or ロウマン instead of ロウーマン. For more information, see Guidelines for Japanese.

Defect fix: Correct use of display_as field in transcription results

Defect fix: For language model customization with next-generation models, the value of the display_as field for a custom word now appears in all transcripts. Previously, the value of the word field sometimes appeared in transcription results.

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

30 November 2022 (Version 4.6.0)

Version 4.6.0 is now available

Speech to Text for IBM Cloud Pak for Data version 4.6.0 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift versions 4.8 and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.

Amazon Web Services (AWS) is now supported

Watson Speech services for IBM Cloud Pak for Data are now supported on Amazon Web Services™ (AWS™). The services support Amazon Elastic Block Store, which you specify by setting the blockStorageClass property of the Speech services custom resource to gp2-csi or gp3-csi.
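Before you set blockStorageClass, you can confirm that the cluster actually provides the class. The following minimal sketch uses oc get storageclass, which exits with a non-zero status if the named class does not exist. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a storage class (for example gp2-csi or gp3-csi on AWS) exists
# before you set it in the blockStorageClass property.
check_storage_class() {
  local sc="$1"
  if oc get storageclass "${sc}" >/dev/null 2>&1; then
    echo "found ${sc}"
  else
    echo "missing ${sc}" >&2
    return 1
  fi
}
```

For example, check_storage_class gp3-csi reports whether the gp3-csi class is available.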

New storage classes are now supported

Watson Speech services for IBM Cloud Pak for Data now support two additional storage classes:

  • IBM Cloud Block Storage (ibmc-block-gold)
  • NetApp Trident (ontap-nas)

You specify the storage class with the blockStorageClass property of the Speech services custom resource. For more information about all supported storage classes, see the following topics in Watson Speech services on IBM Cloud Pak for Data:

  • Before you begin in Installing Watson Speech services
  • Specifying a storage class in Using the Watson Speech services custom resource

Known issue: Some Watson Speech services pods do not have annotations that are used for scheduling

Known issue: Some Watson Speech services pods are missing the cloudpakInstanceId annotation. If you use the IBM Cloud Pak for Data scheduling service, any Watson Speech services pods without the cloudpakInstanceId annotation are

  • Scheduled by the default Kubernetes scheduler rather than the scheduling service
  • Not included in the quota enforcement

Monitoring of the PostgreSQL datastore is now available

You can now enable monitoring of the PostgreSQL datastore to receive updates about its status and its usage by the Watson Speech services. The events can be consumed by Prometheus monitoring software or by whatever application you use for monitoring. By enabling monitoring for user-defined projects in addition to the default platform monitoring, you can monitor your own projects with the Red Hat® OpenShift® Container Platform monitoring stack. This capability adds a property, spec.global.datastores.postgressql.enablePodMonitor, to the Speech services custom resource.

For more information, see the topic Monitoring the PostgreSQL datastore for Watson Speech services in the Administering section of Watson Speech services on IBM Cloud Pak for Data.
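A sketch of turning on the property with a merge patch. It assumes that the custom resource type is watsonspeech and that CUSTOM_RESOURCE_SPEECH and PROJECT_CPD_INSTANCE hold the custom resource name and its namespace. Confirm the resource type with oc api-resources before you use it.

```shell
#!/usr/bin/env bash
# Sketch: set spec.global.datastores.postgressql.enablePodMonitor with a merge patch.
# Assumptions: the resource type is watsonspeech, and CUSTOM_RESOURCE_SPEECH and
# PROJECT_CPD_INSTANCE hold the custom resource name and its namespace.
pod_monitor_patch() {
  # Build the JSON merge patch; pass false to turn monitoring off again.
  printf '{"spec":{"global":{"datastores":{"postgressql":{"enablePodMonitor":%s}}}}}' "${1:-true}"
}

enable_postgres_pod_monitor() {
  oc patch watsonspeech "${CUSTOM_RESOURCE_SPEECH}" \
    --namespace "${PROJECT_CPD_INSTANCE}" \
    --type=merge --patch "$(pod_monitor_patch true)"
}
```

Monitoring for user-defined projects must also be enabled in OpenShift, which is done by setting enableUserWorkload: true in the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace.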

Defect fix: PostgreSQL datastore is no longer installed if only runtime microservices are enabled

Defect fix: The PostgreSQL datastore is no longer installed if only the runtime microservices are enabled. The datastore is now installed only if at least one of the sttAsync, sttCustomization, or ttsCustomization microservices is installed. PostgreSQL is not uninstalled if at a later date these microservices are disabled.

Prior to version 4.6.0, PostgreSQL was always installed with the Speech services. If you are an existing customer who used only the runtime microservices of the Speech services prior to version 4.6.0, PostgreSQL remains installed but is not used. In this case, installation of PostgreSQL persists across upgrades.

The MinIO datastore is always installed because the runtime microservices depend on it. The RabbitMQ datastore is installed only if the sttAsync microservice is installed.
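The datastore rules in this defect fix can be summarized as a small function. This is only an illustration of the documented behavior, not the operator's code. It describes new installations of version 4.6.0 and later, because an existing PostgreSQL installation is not removed. Microservice names other than sttAsync, sttCustomization, and ttsCustomization, such as sttRuntime in the usage example, are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration of the documented datastore rules (not the operator's actual logic):
#   MinIO      - always installed, because the runtime microservices depend on it
#   PostgreSQL - installed if sttAsync, sttCustomization, or ttsCustomization is enabled
#   RabbitMQ   - installed only if sttAsync is enabled
required_datastores() {
  # Arguments: names of the enabled microservices.
  local svc need_pg=0 need_rabbit=0 stores="minio"
  for svc in "$@"; do
    case "${svc}" in
      sttAsync) need_pg=1; need_rabbit=1 ;;
      sttCustomization|ttsCustomization) need_pg=1 ;;
    esac
  done
  if [ "${need_pg}" -eq 1 ]; then stores="${stores} postgresql"; fi
  if [ "${need_rabbit}" -eq 1 ]; then stores="${stores} rabbitmq"; fi
  echo "${stores}"
}
```

For example, required_datastores sttRuntime sttAsync prints minio postgresql rabbitmq, and required_datastores sttRuntime prints only minio.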

For more information, see Datastore properties in Using the Watson Speech services custom resource in Watson Speech services on IBM Cloud Pak for Data.

Defect fix: Creation of a Network Policy is no longer necessary for the PostgreSQL operator to monitor its operands

Defect fix: For version 4.6.0, it is not necessary to create a Network Policy to allow the PostgreSQL operator to monitor its operands, as described in the 10 November 2022 (Versions 4.0.x and 4.5.x) service update. As of version 4.6.0, the service handles this situation automatically.

Defect fix: Some next-generation models were updated to improve low-latency response time

Defect fix: The following next-generation models were updated to improve their response time when the low_latency parameter is used:

  • en-IN_Telephony
  • hi-IN_Telephony
  • it-IT_Multimedia
  • nl-NL_Telephony

Previously, these models did not return recognition results as quickly as expected when the low_latency parameter was used. You do not need to upgrade custom models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.

Defect fix: Improve custom model naming documentation

Defect fix: The documentation now provides detailed rules for naming custom language models and custom acoustic models. For more information, see

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

10 November 2022 (Versions 4.0.x and 4.5.x)

Known issue: Updated Network Policy needed for PostgreSQL operator

Known issue: For Speech services version 4.0.x (not including version 4.0.0) and 4.5.x, if the PostgreSQL operator and the Speech services are installed in different namespaces, the PostgreSQL operator is not able to monitor the PostgreSQL operands for the Speech services. The operator is prevented from monitoring the operands by the Network Policy that is in place for the Speech services.

This problem does not prevent the PostgreSQL cluster from functioning properly. The cluster remains active and fully functional. However, the operator is not able to update the operands when you upgrade to new versions of the Speech services.

The solution for the problem is to create an additional Network Policy for the PostgreSQL operator, as shown in the following steps. You can perform the steps regardless of whether the PostgreSQL operator is installed in the same namespace as the Speech services or in a different namespace.

  1. Log in as an administrator of the Red Hat® OpenShift® project where the Speech services are installed.

  2. Enter the following command to create the additional Network Policy for the Speech services:

    cat << EOF | oc apply -f -
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      labels:
        app.kubernetes.io/component: stt
        app.kubernetes.io/instance: <custom-resource-name>
        app.kubernetes.io/name: speech-to-text
        release: <custom-resource-name>
      name: <custom-resource-name>-postgres-network-policy
      namespace: <cpd-instance-namespace>
    spec:
      ingress:
      - from:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              app.kubernetes.io/name: cloud-native-postgresql
    EOF
    

    where

    • <custom-resource-name> is the name of the Speech services custom resource. The recommended name for version 4.0.x is speech-prod-cr; the recommended name for version 4.5.x is speech-cr.
    • <cpd-instance-namespace> is the name of the project (namespace) in which the Speech services are installed. The documentation uses the environment variable ${PROJECT_CPD_INSTANCE} to identify the namespace.
  3. To verify that the new Network Policy allows the operator to monitor the operands and that the PostgreSQL cluster is in a healthy state, enter the following command, where <custom-resource-name> and <cpd-instance-namespace> are the values that you used in the previous step:

    oc get cluster <custom-resource-name>-postgres -n <cpd-instance-namespace>
    

    If the PostgreSQL cluster is functioning properly, the command produces output similar to the following:

    NAME                 AGE   INSTANCES   READY   STATUS                     PRIMARY
    speech-cr-postgres   14d   3           3       Cluster in healthy state   speech-cr-postgres-1
    

These steps do not cause the operator to update the operands to the latest versions. However, the operands are upgraded as expected when you next upgrade the Speech services software.
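To script the verification in step 3, you can compare the cluster status with the expected text. This sketch assumes that the text shown in the STATUS column is available in the .status.phase field of the cluster resource. Verify that with oc get cluster <custom-resource-name>-postgres -n <cpd-instance-namespace> -o yaml on your system. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether the Speech services PostgreSQL cluster is healthy.
# Assumes the STATUS text shown by "oc get cluster" is stored in .status.phase.
postgres_cluster_healthy() {
  local cr="$1" ns="$2" phase
  phase="$(oc get cluster "${cr}-postgres" -n "${ns}" -o jsonpath='{.status.phase}')"
  if [ "${phase}" = "Cluster in healthy state" ]; then
    echo "healthy"
  else
    echo "not healthy: ${phase}"
    return 1
  fi
}
```

For example, postgres_cluster_healthy speech-cr ${PROJECT_CPD_INSTANCE} prints healthy when the cluster reports Cluster in healthy state.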

13 October 2022 (Version 4.5.3)

Version 4.5.3 is now available

Speech to Text for IBM Cloud Pak for Data version 4.5.3 is now available. This version supports IBM Cloud Pak for Data version 4.5.x and Red Hat OpenShift versions 4.6, 4.8, and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.

Audit events are available for the Speech services

The IBM Cloud Pak for Data Audit Logging Service generates and forwards audit events for both the Speech to Text and Text to Speech services. The audit events match those that are available for Activity Tracker with the public service. For more information, see Audit events.

You cannot uninstall individual Speech service components

The documentation now notes that you cannot uninstall individual service components (microservices) once they are installed. To remove any of the following components, you must uninstall the Watson Speech services in their entirety and reinstall only the components that you need: Speech to Text runtime, Speech to Text asynchronous HTTP, Speech to Text customization, Text to Speech runtime, and Text to Speech customization. For more information about installing the Speech services, see Watson Speech services on IBM Cloud Pak for Data.

New French Canadian next-generation multimedia model

The service now offers a next-generation multimedia model for French Canadian: fr-CA_Multimedia. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about next-generation models and low latency, see

Updates to English next-generation telephony models

The English next-generation telephony models have been updated for improved speech recognition:

  • en-AU_Telephony
  • en-GB_Telephony
  • en-IN_Telephony
  • en-US_Telephony

All of these models continue to support low latency. You do not need to upgrade custom models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.

Italian next-generation multimedia model now supports low latency

The Italian next-generation multimedia model, it-IT_Multimedia, now supports low latency. For more information about next-generation models and low latency, see

Troubleshooting upgrade from version 4.0.x to version 4.5.x

When you upgrade the Speech services from version 4.0.x to version 4.5.x, you might encounter an issue where the PostgreSQL pods become stuck in the Terminating state. If this problem occurs during your upgrade, perform the following steps to resolve the problem. The information and steps are also documented in Upgrading Watson Speech services from Version 4.0 to Version 4.5 in the Upgrading topic of Watson Speech services on IBM Cloud Pak for Data.

  1. Use the following command to identify pods that remain in the Terminating state:

    oc get pods -n ${PROJECT_CPD_INSTANCE} -o wide | grep Terminating

  2. Use the following command to set the shell variable pods to the list of pods that remain in the Terminating state:

    pods=$(oc get pods -n ${PROJECT_CPD_INSTANCE} -o wide | grep Terminating | awk '{print $1}')

  3. Use the following command to delete the stuck pods so that the upgrade process can continue:

    oc delete pods ${pods} -n ${PROJECT_CPD_INSTANCE} --grace-period=0 --force
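The three steps can also be combined into one function that does nothing when no pods are stuck. This is a sketch that assumes the default oc get pods column layout (NAME, READY, STATUS, ...), so that the third column is the pod status. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: force-delete pods that are stuck in Terminating; do nothing if none are found.
delete_terminating_pods() {
  local ns="${1:-${PROJECT_CPD_INSTANCE}}" pods
  # Column 3 of "oc get pods" output is STATUS.
  pods="$(oc get pods -n "${ns}" --no-headers | awk '$3 == "Terminating" {print $1}')"
  if [ -z "${pods}" ]; then
    echo "No pods in Terminating state in ${ns}"
    return 0
  fi
  # Word splitting of ${pods} is intended: one argument per pod name.
  oc delete pod ${pods} -n "${ns}" --grace-period=0 --force
}
```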

Defect fix: Fix custom resource entries documentation

Defect fix: The documentation for the Speech services custom resource now includes colons after the names of the models koKrTelephony and nlNlTelephony. Previously, the documentation for these two entries omitted the colons.

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

19 August 2022 (Version 4.5.1)

Important: Deprecation date for most previous-generation models is now 3 March 2023

Superseded: This deprecation notice is superseded by the 23 February 2023 service update. The end of service date for all previous-generation models is now 31 July 2023.

On 15 March 2022, the previous-generation models for all languages other than Arabic and Japanese were deprecated. At that time, the deprecated models were to remain available until 15 September 2022. To allow users more time to migrate to the appropriate next-generation models, the deprecated models will now remain available until 3 March 2023. As with the initial deprecation notice, the Arabic and Japanese previous-generation models are not deprecated. For complete list of all deprecated models, see the 15 March 2022 (Version 4.0.6) service update.

On 3 March 2023, the deprecated models will be removed from the service and the documentation. If you use any of the deprecated models, you must migrate to the equivalent next-generation model by 3 March 2023.

Note: When the previous-generation en-US_BroadbandModel is removed from service, the next-generation en-US_Multimedia model will become the default model for speech recognition requests.

3 August 2022 (Version 4.5.1)

Version 4.5.1 is now available

Speech to Text for IBM Cloud Pak for Data version 4.5.1 is now available. This version supports IBM Cloud Pak for Data version 4.5.x and Red Hat OpenShift versions 4.6, 4.8, and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.

Support for FIPS-enabled clusters

Both Speech to Text for IBM Cloud Pak for Data and Text to Speech for IBM Cloud Pak for Data now support running on Federal Information Processing Standard (FIPS)-enabled clusters. For more information, see Services that support FIPS.

Defect fix: Fix ephemeral storage calculations to prevent occasional pod evictions

Defect fix: Ephemeral storage limits are now calculated more precisely for the Speech to Text for IBM Cloud Pak for Data and Text to Speech for IBM Cloud Pak for Data runtimes. This change prevents occasional pod evictions when the services' runtimes are under heavy load.

Defect fix: Update speech hesitations and hesitation markers documentation

Defect fix: Documentation for speech hesitations and hesitation markers has been updated. Previous-generation models include hesitation markers in place of speech hesitations in transcription results for most languages; smart formatting removes hesitation markers from US English final transcripts. Next-generation models include the actual speech hesitations in transcription results; smart formatting has no effect on their inclusion in final transcription results.
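For previous-generation models, the hesitation marker that appears in transcripts is the token `%HESITATION`. If you post-process plain transcript text for a language where smart formatting does not remove the markers, a minimal sketch such as the following could strip them. The function name `strip_hesitation_markers` is hypothetical, and the sketch works on plain text only, not on the JSON response.

```shell
# Hypothetical helper: remove %HESITATION tokens from plain transcript text
# on stdin, then collapse and trim the extra spaces that the removal leaves.
strip_hesitation_markers() {
  sed -E -e 's/%HESITATION//g' -e 's/ +/ /g' -e 's/^ //; s/ $//'
}
```

For example, `echo 'so %HESITATION I think' | strip_hesitation_markers` prints `so I think`.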

For more information, see:

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

29 June 2022 (Version 4.5.0)

Version 4.5.0 is now available

Speech to Text for IBM Cloud Pak for Data version 4.5.0 is now available. This version supports IBM Cloud Pak for Data version 4.5.x and Red Hat OpenShift versions 4.6, 4.8, and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.

Unified Speech services for IBM Cloud Pak for Data documentation

The installation and administration documentation for both Speech to Text and Text to Speech is now combined in the IBM Cloud Pak for Data documentation. For more information about installing and managing the Speech services, see Watson Speech services on IBM Cloud Pak for Data.

Changes to Speech services custom resource

The custom resource is now created when you initially install the Speech services. The process is described in the IBM Cloud Pak for Data installation documentation. The content of the custom resource has changed:

  • The recommended name of the custom resource has changed from speech-prod-cr to speech-cr.
  • All references to storage class have changed from variants of storageClass to blockStorageClass.
  • The name of the Portworx block storage class has changed from portworx-shared-gp3 to portworx-db-gp3-sc.
  • The createSecret property has been removed for the MinIO and PostgreSQL datastores. The property is only used internally. The Speech services always use a secrets object if you create one, and they always automatically create the object if none is provided.

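If you kept a copy of a custom resource from an earlier version, a quick check can flag the entries whose names changed in version 4.5.0. This is a minimal sketch that assumes the custom resource is saved as a local YAML file; the function name `check_obsolete_cr_entries` is hypothetical. A match means only that the line needs review, not that the file is invalid.

```shell
# Hypothetical helper: print (with line numbers) entries in a saved custom
# resource file that use names changed in version 4.5.0.
# Returns 0 if nothing was found, 1 if any entry needs review.
check_obsolete_cr_entries() {
  local file="$1"
  # The match is case-sensitive, so blockStorageClass does not match storageClass.
  if grep -nE 'speech-prod-cr|storageClass|portworx-shared-gp3|createSecret' "${file}"; then
    return 1
  fi
  return 0
}
```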
User-provided secrets object now supported for RabbitMQ datastore

You can now provide security credentials for the RabbitMQ datastore, just as you can for the MinIO and PostgreSQL datastores. The documented process is similar for all three datastores.

New Italian it-IT_Multimedia next-generation model

The service now offers a next-generation multimedia model for Italian: it-IT_Multimedia. The new model is generally available. It does not support low latency, but it does support language model customization and grammars. For more information about all available next-generation models, see Next-generation languages and models.

Updated Korean telephony and multimedia next-generation models

The existing Korean next-generation models have been updated:

  • The ko-KR_Telephony model has been updated for improved low-latency support for speech recognition.
  • The ko-KR_Multimedia model has been updated for improved speech recognition. The model now also supports low latency.

Both models are generally available, and both support language model customization and grammars. You do not need to upgrade custom language models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.

Updates to multiple next-generation telephony models

The following next-generation English language telephony models have been updated for improved speech recognition:

  • en-AU_Telephony
  • en-GB_Telephony
  • en-IN_Telephony
  • en-US_Telephony

You do not need to upgrade custom models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.

Defect fix: Confidence scores are now reported for all transcription results

Defect fix: Confidence scores are now reported for all transcription results. Previously, when the service returned multiple transcripts for a single speech recognition request, confidence scores might not be returned for all transcripts.

Security vulnerabilities addressed

No security vulnerabilities were fixed for version 4.5.0.

25 May 2022 (Version 4.0.9)

Version 4.0.9 is now available

Speech to Text for IBM Cloud Pak for Data version 4.0.9 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.

New Brazilian Portuguese pt-BR_Multimedia next-generation model

The service now offers a next-generation multimedia model for Brazilian Portuguese: pt-BR_Multimedia. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about the next-generation models and low latency, see

Update to German de-DE_Multimedia next-generation model to support low latency

The next-generation German model, de-DE_Multimedia, now supports low latency. You do not need to upgrade custom models that are based on the updated German base model. For more information about the next-generation models and low latency, see

New beta character_insertion_bias parameter for next-generation models

All next-generation models now support a new beta parameter, character_insertion_bias, which is available with all speech recognition interfaces. By default, the service is optimized for each individual model to balance its recognition of candidate strings of different lengths. The model-specific bias is equivalent to 0.0. Each model's default bias is sufficient for most speech recognition requests.

However, certain use cases might benefit from favoring hypotheses with shorter or longer strings of characters. The parameter accepts values between -1.0 and 1.0 that represent a change from a model's default. Negative values instruct the service to favor shorter strings of characters. Positive values direct the service to favor longer strings of characters. For more information, see Character insertion bias.
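As an illustration, the following sketch builds the query string for a `POST /v1/recognize` request and rejects character_insertion_bias values outside the documented range of -1.0 to 1.0. The function name `build_bias_query` is hypothetical, and the model name and bias value in the test are placeholders. The service URL and authentication for your installation are not shown.

```shell
# Hypothetical helper: build a recognize query string with
# character_insertion_bias after checking that the value is a number
# in the documented range [-1.0, 1.0].
build_bias_query() {
  local model="$1" bias="$2"
  if ! awk -v b="${bias}" 'BEGIN {
      ok = (b ~ /^-?([0-9]+(\.[0-9]*)?|\.[0-9]+)$/) && (b + 0 >= -1) && (b + 0 <= 1)
      exit !ok
    }'; then
    echo "character_insertion_bias must be between -1.0 and 1.0: ${bias}" >&2
    return 1
  fi
  printf 'model=%s&character_insertion_bias=%s\n' "${model}" "${bias}"
}
```

You would append the result to your instance's `/v1/recognize` URL after a `?`. A negative value favors shorter strings of characters; a positive value favors longer ones.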

The Speech services do not support the OADP backup and restore utility

Watson Speech services do not support the IBM Cloud Pak for Data OpenShift APIs for Data Protection (OADP) backup and restore utility. If the Speech services are installed on a cluster, you might not be able to use the IBM Cloud Pak for Data OADP backup and restore utility to back up other services that are installed on that cluster. This limitation applies to version 4.0.0 and later versions of the Speech services.

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

1 May 2022 (Version 1.2.x)

Important: End of service for Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5
Important: Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5 is out of service as of 1 May 2022. Speech to Text version 1.2.x is no longer supported, available, or documented. For more information about End of Service for Speech to Text, which is part of the Watson API Kit, see Software support discontinuance: IBM Watson API Kit for IBM Cloud Pak for Data 1.2.x.

27 April 2022 (Version 4.0.8)

Version 4.0.8 is now available

Speech to Text for IBM Cloud Pak for Data version 4.0.8 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.

New environment variables used in IBM Cloud Pak for Data documentation

Most commands in the Speech to Text for IBM Cloud Pak for Data documentation have been updated to use a common set of environment variables. The documentation provides a script to automatically export the environment variables before you run installation, upgrade, and administration commands. After you source the script, you can copy most commands from the documentation and run them without making any changes.

The environment variables that the script defines include the following:

  • ${PROJECT_CPD_INSTANCE} identifies the project where you plan to install IBM Cloud Pak for Data and the Speech services.
  • ${PROJECT_CPD_OPS} identifies the project for the IBM Cloud Pak for Data platform operator.
  • ${PROJECT_CPFS_OPS} identifies the project for the IBM Cloud Pak for Data foundational services.

For more information about using the environment variables, see Best practice: Setting up install variables.
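The documented script defines more variables than the three listed here. The following is only a minimal sketch of those three; the file name `cpd_vars.sh` and the project names are hypothetical placeholders that you would replace with the projects in your cluster.

```shell
# Sketch of a hypothetical variables file (cpd_vars.sh). All values are placeholders.
export PROJECT_CPD_INSTANCE=cpd-instance    # Cloud Pak for Data and the Speech services
export PROJECT_CPD_OPS=cpd-operators        # Cloud Pak for Data platform operator
export PROJECT_CPFS_OPS=cpfs-operators      # Cloud Pak for Data foundational services
```

After you run `source ./cpd_vars.sh`, commands such as `oc get pods -n ${PROJECT_CPD_INSTANCE}` resolve to your own project.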

The ttsVoiceMarginalCPU property is no longer documented

The ttsVoiceMarginalCPU property has been removed from the documentation for the Speech services custom resource. The property manages the tradeoff between concurrency and speech synthesis speed. The default value of 400 ensures a reasonable balance for most customers and maintains real-time synthesis.

New German next-generation multimedia model

The service now offers a next-generation multimedia model for German: de-DE_Multimedia. The new model is generally available. It does not support low latency. It does support language model customization and grammars as generally available functionality.

For more information about all available next-generation models and their customization support, see

Beta next-generation en-WW_Medical_Telephony model now supports low latency

The beta next-generation en-WW_Medical_Telephony model now supports low latency. For more information about all next-generation models and low latency, see

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

8 April 2022 (Version 4.0.7)

Support for sounds-like is now documented for custom models based on next-generation models

For custom language models that are based on next-generation models, support is now documented for sounds-like specifications for custom words. Support for sounds-likes has been available since late 2021.

Differences exist between the use of the sounds_like field for custom models that are based on next-generation and previous-generation models. For more information about using the sounds_like field with custom models that are based on next-generation models, see Working with custom words for next-generation models.
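For illustration, the following sketch prints a JSON body for the `POST /v1/customizations/{customization_id}/words` method that adds one custom word with one sounds-like. The function name `sounds_like_body` and the example word are hypothetical. The helper does not escape quotation marks or backslashes, so use it only with simple strings, and check the next-generation rules in the linked topic before you rely on it.

```shell
# Hypothetical helper: print a request body that adds a single custom word
# with one sounds-like. No JSON escaping is performed.
sounds_like_body() {
  local word="$1" sounds_like="$2"
  printf '{"words": [{"word": "%s", "sounds_like": ["%s"]}]}\n' "${word}" "${sounds_like}"
}
```

You could pipe the output to `curl --data-binary @-` with a `Content-Type: application/json` header, together with your instance's URL and credentials.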

Important: Deprecated customization_id parameter removed from the documentation

Important: On 9 October 2018, the customization_id parameter of all speech recognition requests was deprecated and replaced by the language_customization_id parameter. The customization_id parameter has now been removed from the documentation for the speech recognition methods:

  • /v1/recognize for WebSocket requests
  • POST /v1/recognize for synchronous HTTP requests (including multipart requests)
  • POST /v1/recognitions for asynchronous HTTP requests

Note: If you use the Watson SDKs, make sure that you have updated any application code to use the language_customization_id parameter instead of the customization_id parameter. The customization_id parameter will no longer be available from the equivalent methods of the SDKs as of their next major release. For more information about the speech recognition methods, see the API & SDK reference.
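If your application builds HTTP request URLs as strings, a substitution such as the following sketch can rename the parameter. The pattern anchors on the `?` or `&` that precedes the parameter, so existing `language_customization_id` and `acoustic_customization_id` parameters are left unchanged. The function name `migrate_customization_param` is hypothetical; the SDKs expose the parameter through their own method arguments instead.

```shell
# Hypothetical helper: rename the deprecated customization_id query parameter
# to language_customization_id in URLs read from stdin.
migrate_customization_param() {
  sed -E 's/([?&])customization_id=/\1language_customization_id=/g'
}
```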

30 March 2022 (Version 4.0.7)

Version 4.0.7 is now available

Speech to Text for IBM Cloud Pak for Data version 4.0.7 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.

Custom resource property for specifying a default model

The default model for speech recognition requests is en-US_BroadbandModel. If you do not install the en-US_BroadbandModel, you must do one of the following:

  • Use the model parameter to pass the model that is to be used with each request.
  • Specify a new default model for your installation of Speech to Text for IBM Cloud Pak for Data by using the defaultSTTModel property in the Speech services custom resource. For more information, see Installing Watson Speech to Text and Using the default model.

Updates to English and French next-generation multimedia models to support low latency

The following multimedia models have been updated to support low latency:

  • Australian English: en-AU_Multimedia
  • UK English: en-GB_Multimedia
  • US English: en-US_Multimedia
  • French: fr-FR_Multimedia

You do not need to upgrade custom language models that are built on these base models. For more information about the next-generation models and low latency, see

New Castilian Spanish next-generation multimedia model

The service now offers a next-generation multimedia model for Castilian Spanish: es-ES_Multimedia. The new model supports low latency and is generally available. It also supports language model customization and grammars.

For more information about all available next-generation models and their customization support, see

Beta next-generation en-WW_Medical_Telephony model now supports smart formatting

The beta next-generation en-WW_Medical_Telephony model now supports the smart_formatting parameter for US English audio. For more information about all next-generation models, see Next-generation languages and models.

Security vulnerabilities addressed

The following security vulnerabilities have been fixed:

17 March 2022 (Version 4.0.6)

Grammar support for next-generation models is now generally available

Grammar support is now generally available (GA) for next-generation models that meet the following conditions:

  • The models are generally available.
  • The models support language model customization.

For more information, see the following topics:

15 March 2022 (Version 4.0.6)

Important: Deprecation of most previous-generation models

Superseded: This deprecation notice is superseded by the 23 February 2023 service update. The end of service date for all previous-generation models is now 31 July 2023.

Effective 15 March 2022, previous-generation models for all languages other than Arabic and Japanese are deprecated. The deprecated models remain available until 15 September 2022, when they will be removed from the service and the documentation. The Arabic and Japanese previous-generation models are not deprecated.

The following previous-generation models are now deprecated:

  • Chinese (Mandarin): zh-CN_NarrowbandModel and zh-CN_BroadbandModel
  • Dutch (Netherlands): nl-NL_NarrowbandModel and nl-NL_BroadbandModel
  • English (Australian): en-AU_NarrowbandModel and en-AU_BroadbandModel
  • English (United Kingdom): en-GB_NarrowbandModel and en-GB_BroadbandModel
  • English (United States): en-US_NarrowbandModel, en-US_BroadbandModel, and en-US_ShortForm_NarrowbandModel
  • French (Canadian): fr-CA_NarrowbandModel and fr-CA_BroadbandModel
  • French (France): fr-FR_NarrowbandModel and fr-FR_BroadbandModel
  • German: de-DE_NarrowbandModel and de-DE_BroadbandModel
  • Italian: it-IT_NarrowbandModel and it-IT_BroadbandModel
  • Korean: ko-KR_NarrowbandModel and ko-KR_BroadbandModel
  • Portuguese (Brazilian): pt-BR_NarrowbandModel and pt-BR_BroadbandModel
  • Spanish (Argentinian): es-AR_NarrowbandModel and es-AR_BroadbandModel
  • Spanish (Castilian): es-ES_NarrowbandModel and es-ES_BroadbandModel
  • Spanish (Chilean): es-CL_NarrowbandModel and es-CL_BroadbandModel
  • Spanish (Colombian): es-CO_NarrowbandModel and es-CO_BroadbandModel
  • Spanish (Mexican): es-MX_NarrowbandModel and es-MX_BroadbandModel
  • Spanish (Peruvian): es-PE_NarrowbandModel and es-PE_BroadbandModel

If you use any of these deprecated models, you must migrate to the equivalent next-generation model by the end of service date.

Note: When the previous-generation en-US_BroadbandModel is removed from service on 15 September, the next-generation en-US_Multimedia model will become the default model for speech recognition requests.

Next-generation models now support audio-parsing parameters

All next-generation models now support the following audio-parsing parameters as generally available features:

  • end_of_phrase_silence_time specifies the duration of the pause interval at which the service splits a transcript into multiple final results. For more information, see End of phrase silence time.
  • split_transcript_at_phrase_end directs the service to split the transcript into multiple final results based on semantic features of the input. For more information, see Split transcript at phrase end.

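Both audio-parsing parameters are passed as query parameters of a recognition request. The following sketch builds such a query string; the function name `build_phrase_query`, the model name, and the 1.5-second value in the test are hypothetical placeholders, not recommendations. It sets both parameters only to show the syntax; see the linked topics for how the two parameters interact.

```shell
# Hypothetical helper: build a query string that sets both audio-parsing
# parameters. end_of_phrase_silence_time is in seconds;
# split_transcript_at_phrase_end must be true or false.
build_phrase_query() {
  local model="$1" silence_seconds="$2" split="$3"
  case "${split}" in
    true|false) ;;
    *) echo "split_transcript_at_phrase_end must be true or false" >&2; return 1 ;;
  esac
  printf 'model=%s&end_of_phrase_silence_time=%s&split_transcript_at_phrase_end=%s\n' \
    "${model}" "${silence_seconds}" "${split}"
}
```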
Defect fix: Correct speaker labels documentation

Defect fix: Documentation of speaker labels included the following erroneous statement in multiple places: "For next-generation models, speaker labels are not supported for use with interim results or low latency." In fact, speaker labels are supported for use with interim results and low latency for next-generation models. For more information, see Speaker labels.

23 February 2022 (Version 4.0.6)

Version 4.0.6 is now available

Speech to Text for IBM Cloud Pak for Data version 4.0.6 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.

Updates to import/export scripts

The import_export.sh and transfer_ownership.sh scripts have been updated. These scripts are used to import and export data between clusters, back up and restore data, and migrate data from version 3.5 to version 4.0.x. The scripts have been modified and improved as follows:

  • The transfer_ownership.sh script now requires a -c option to be included on the command line before the <custom_resource_name> argument.
  • The transfer_ownership.sh script now requires a -v <version> option and argument to indicate the version to which ownership of resources is being transferred. Specify 35 for version 3.5 or 40 for version 4.0.x.
  • The transfer_ownership.sh script now requires a -p option to be included on the command line before the <postgres_auth_secret_name> argument.
  • The <postgres_auth_secret_name> argument provides the Kubernetes secret that is used to authenticate to the PostgreSQL datastore to which you are transferring ownership. You can omit the authentication secret if it is the same as the default value (<custom-resource-name>-postgres-auth-secret for version 4.0.x, user-provided-postgressql for version 3.5). You must provide the secret if it is different from the default value.
  • Both scripts now include a -h (--help) option to display information about the script and its usage.
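The default secret name depends on the version that you pass with the -v option. The following sketch derives that default so that you can compare it with the secret in your cluster and decide whether you need the -p option. The function name `default_postgres_secret` is hypothetical; it only prints a name and does not run either script.

```shell
# Hypothetical helper: print the default PostgreSQL authentication secret
# that transfer_ownership.sh assumes for a -v value (35 or 40) and a
# custom resource name.
default_postgres_secret() {
  local version="$1" cr_name="$2"
  case "${version}" in
    40) echo "${cr_name}-postgres-auth-secret" ;;
    35) echo "user-provided-postgressql" ;;
    *)  echo "unsupported -v value: ${version} (use 35 or 40)" >&2; return 1 ;;
  esac
}
```

If the secret that you use differs from the printed name, pass it after the -p option.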

For more information, see

Updated recommendation for OpenShift Container Storage

Starting with Speech services version 4.0.6, the recommended storage class for OpenShift Container Storage is ocs-storagecluster-ceph-rbd.

  • If you are installing Speech services 4.0.6 or upgrading to Speech services 4.0.6 from IBM Cloud Pak for Data version 3.5, specify the ocs-storagecluster-ceph-rbd storage class during installation or upgrade.
  • If you are upgrading to Speech services 4.0.6 from a previous refresh of Cloud Pak for Data version 4.0, continue to use ocs-storagecluster-cephfs. You cannot change the storage that is used in an existing deployment.

The value is specified with the storageClass property in the Speech services custom resource:

################
# Storage class
################
  storageClass: "ocs-storagecluster-ceph-rbd"

The Speech services work with either storage class. The newly recommended storage class has more restrictive access permissions. For more information, see
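The two documented rules for OpenShift Container Storage can be captured in a small decision helper. This is a sketch with hypothetical names (`ocs_storage_class_for` and its scenario keywords); it encodes only the two cases described for version 4.0.6 and prints the value to set for the storageClass property.

```shell
# Hypothetical helper: print the OpenShift Container Storage class to use
# for Speech services 4.0.6, based on the starting point:
#   new    : new installation of 4.0.6
#   from35 : upgrade from Cloud Pak for Data 3.5
#   from40 : upgrade from an earlier 4.0 refresh (existing storage cannot change)
ocs_storage_class_for() {
  case "$1" in
    new|from35) echo "ocs-storagecluster-ceph-rbd" ;;
    from40)     echo "ocs-storagecluster-cephfs" ;;
    *) echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}
```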

New beta en-WW_Medical_Telephony model is now available

A new beta next-generation model, en-WW_Medical_Telephony, is now available. The new model understands terms from the medical and pharmacological domains. Use the model in situations where you need to transcribe common medical terminology such as medicine names, product brands, medical procedures, illnesses, types of doctor, or COVID-19-related terminology. Common use cases include conversations between a patient and a medical provider (for example, a doctor, nurse, or pharmacist).

The new model is installed from the Speech services custom resource by setting enWwMedicalTelephony to enabled: true. The model is available for all supported English dialects: Australian, Indian, UK, and US.

  • The model supports language model customization and grammars as beta functionality.
  • It supports most of the same parameters as the en-US_Telephony model.
  • It does not support the following parameters: low_latency, profanity_filter, redaction, and speaker_labels.
  • At this time, it does not support smart_formatting for IBM Cloud Pak for Data.

For more information, see The English medical telephony model.

Update to Chinese zh-CN_Telephony model

The next-generation Chinese model zh-CN_Telephony has been updated for improved speech recognition. The model continues to support low latency. By default, the service automatically uses the updated model for all speech recognition requests. For more information about all available next-generation models, see Next-generation languages and models.

If you have custom language models that are based on the updated model, you must upgrade your existing custom models to take advantage of the updates by using the POST /v1/customizations/{customization_id}/upgrade_model method. For more information, see Upgrading custom models.
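A minimal sketch of the upgrade request follows. `SPEECH_URL` (your instance's service URL), `TOKEN` (a bearer token), `CUSTOMIZATION_ID`, and the function name `upgrade_model_url` are hypothetical. The function only builds the URL; the curl call is shown as a comment so that you can review it before you run it.

```shell
# Hypothetical helper: build the URL of the upgrade_model method for a custom
# language model. SPEECH_URL is a hypothetical variable that holds your
# instance's service URL; a trailing slash is removed.
upgrade_model_url() {
  local base_url="${SPEECH_URL%/}" customization_id="$1"
  printf '%s/v1/customizations/%s/upgrade_model\n' "${base_url}" "${customization_id}"
}

# Example call (not run here); TOKEN and CUSTOMIZATION_ID are hypothetical:
#   curl -X POST --header "Authorization: Bearer ${TOKEN}" \
#     "$(upgrade_model_url "${CUSTOMIZATION_ID}")"
```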

Update to Japanese ja-JP_Multimedia model to support low latency

The next-generation Japanese model ja-JP_Multimedia now supports low latency. You can use the low_latency parameter with speech recognition requests that use the model. You do not need to upgrade custom models that are based on the updated Japanese base model. For more information about the next-generation models and low latency, see Next-generation languages and models and Low latency.

11 February 2022 (Version 4.0.5)

Defect fix: Improve custom model upgrade and base model version documentation

Defect fix: The documentation that describes the upgrade of custom models and the version strings that are used for different versions of base models has been updated. The documentation now states that upgrade for language model customization also applies to next-generation models. The version strings that represent different versions of base models have also been updated. In addition, the base_model_version parameter can be used with upgraded next-generation models.

For more information about custom model upgrade, when upgrade is necessary, and how to use older versions of custom models, see

Defect fix: Update capitalization documentation

Defect fix: The documentation that describes the service's automatic capitalization of transcripts has been updated. The service capitalizes appropriate nouns only for the following languages and models:

  • All previous-generation US English models
  • The next-generation German model

For more information, see Capitalization.

31 January 2022 (Version 4.0.5)

Version 4.0.5 has been updated

Speech to Text for IBM Cloud Pak for Data version 4.0.5 has been updated to address installation issues. The case package version is now 4.0.6. Use this package instead of the version 4.0.5 package. For more information about installing and managing the service, see Installing Watson Speech to Text.

Important: Extra steps for mirrored installation are no longer necessary

Important: The 26 January 2022 release notes included important notes for the following steps:

  • Additional step for performing a mirrored installation of MinIO datastore
  • Additional steps for performing a mirrored installation of new next-generation models

These additional steps are no longer needed. The case package has been updated to correct the installation issues.

26 January 2022 (Version 4.0.5)

Version 4.0.5 is now available

Speech to Text for IBM Cloud Pak for Data version 4.0.5 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.

Important: Additional step for performing a mirrored installation of MinIO datastore

Important: These steps are no longer needed if you install case package 4.0.6. For more information, see 31 January 2022 (Version 4.0.5).

If you are performing a mirrored installation (for example, in an air-gapped environment), you need to perform an additional step before completing either of the following steps:

This step is mandatory to copy the necessary images for the MinIO datastore:

echo 'cp.icr.io,cp/opencontent-minio-client,1.1.4,sha256:7b4cf5e47a0455cfa7ca9ab246b80916e4dccbc1483b3e0f276fb7b0ab3e5c60,IMAGE,linux,x86_64,"",0,CASE,"",""' \
>> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv

Failure to perform this step will cause installation errors for both Speech to Text and Text to Speech.

Important: Additional steps for performing a mirrored installation of new next-generation models

Important: These steps are no longer needed if you install case package 4.0.6. For more information, see 31 January 2022 (Version 4.0.5).

If you are performing a mirrored installation (for example, for an air-gapped environment) and plan to install any of the new next-generation models for Speech to Text (for more information, see the later release note), you must perform an additional step before completing either of the following steps:

Each additional step is unique to the model that is being installed. If you install more than one of the new models, issue the indicated command for each model that you are installing.

  • For the Chinese telephony model (zh-CN_Telephony):

    echo 'cp.icr.io,cp/watson-speech/zh-cn-telephony,2022-01-05-405models,sha256:52af6dfccd64ccd81b409936442a51a71f4ee96d980e1fc6a343a05bd4ed7fbc,IMAGE,linux,x86_64,"",0,CASE,"",""' \
    >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
    
  • For the Latin American Spanish telephony model (es-LA_Telephony):

    echo 'cp.icr.io,cp/watson-speech/es-la-telephony,2022-01-05-405models,sha256:58e8c04abe9659472e89bf0778b7dc66e0ddceb4ea18d9d3e048a08c72125ea2,IMAGE,linux,x86_64,"",0,CASE,"",""' \
    >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
    
  • For the Australian English multimedia model (en-AU_Multimedia):

    echo 'cp.icr.io,cp/watson-speech/en-au-multimedia,2022-01-05-405models,sha256:167f9a76258530a56a6abdd1c311f2ea05d6820ee0e802fbf2f96f08fb8a7646,IMAGE,linux,x86_64,"",0,CASE,"",""' \
    >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
    
  • For the UK English multimedia model (en-GB_Multimedia):

    echo 'cp.icr.io,cp/watson-speech/en-gb-multimedia,2022-01-05-405models,sha256:167f9a76258530a56a6abdd1c311f2ea05d6820ee0e802fbf2f96f08fb8a7646,IMAGE,linux,x86_64,"",0,CASE,"",""' \
    >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
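These extra steps apply only to the original 4.0.5 case package. Running one of the echo commands twice appends a duplicate row to the images file. If you script the steps, a guarded append avoids that; the following is a minimal sketch in which the function name `append_once` and the variable `ROW` (holding one of the quoted rows shown earlier) are hypothetical.

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already present. A missing file is created on the first append.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "${line}" "${file}" 2>/dev/null || printf '%s\n' "${line}" >> "${file}"
}
```

For example: `append_once "${ROW}" "$CASE_PATH/ibm-watson-speech-4.0.5-images.csv"`.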
    
License Server is now automatically installed

The Speech services operator now automatically installs the required License Server when it installs the Speech services. You no longer need to install the License Server from the IBM Cloud Pak for Data foundational services, and you no longer need to use additional YAML content to create an OperandRequest with the necessary bindings.

Removal of steps specific to PostgreSQL EnterpriseDB server

The previous version of the documentation included steps for the PostgreSQL EnterpriseDB server that were specific to the Speech services. These steps were documented in the topics Upgrading Watson Speech to Text (Version 4.0) and Uninstalling Watson Speech to Text. These additional steps are no longer necessary and have been removed from the documentation.

RabbitMQ datastore is now used only by the sttAsync component

The RabbitMQ datastore was previously used by components of both Speech services, Speech to Text and Text to Speech. It now handles non-persistent message queuing for the Speech to Text asynchronous HTTP component (sttAsync) only. It is used only if the sttAsync component is installed and enabled.

New next-generation models

The service now supports the following next-generation models with Speech to Text for IBM Cloud Pak for Data:

  • Chinese (Mandarin) telephony model (zh-CN_Telephony). The new model supports low latency.
  • English (Australian) multimedia model (en-AU_Multimedia). The new model does not support low latency.
  • English (UK) multimedia model (en-GB_Multimedia). The new model does not support low latency.
  • Spanish (Latin American) telephony model (es-LA_Telephony). The new model supports low latency.

Note: The Latin American Spanish model, es-LA_Telephony, applies to all Latin American dialects. It is the equivalent of the previous-generation models that are available for the Argentinian, Chilean, Colombian, Mexican, and Peruvian dialects. If you used a previous-generation model for any of these specific dialects, use the es-LA_Telephony model to migrate to the equivalent next-generation model.

The new models are generally available for speech recognition. They are generally available for language model customization and beta for grammars. They are not supported for acoustic model customization.

  • Important: If you are performing a mirrored installation (for example, in an air-gapped environment) and plan to install any of the new next-generation models for Speech to Text, you must perform additional steps before mirroring the images. For more information, see the earlier release note.
  • For more information about using the custom resource to install models, see Installing Watson Speech to Text.
  • For more information about all available next-generation models, see Next-generation languages and models.
  • For more information about customization support for next-generation models, see Customization support for next-generation models.

Next-generation US English models are now installed by default

The next-generation US English models, en-US_Multimedia and en-US_Telephony, are now installed by default with Speech to Text for IBM Cloud Pak for Data. These models join en-US_BroadbandModel, en-US_NarrowbandModel, and en-US_ShortForm_NarrowbandModel as the models that are installed by default. The models now have the following entries in the Speech services custom resource:

########################################
# Speech to Text next-generation models
########################################
      enUsMultimedia:    # US English (en-US) Multimedia model
        enabled: true
      enUsTelephony:     # US English (en-US) Telephony model
        enabled: true

For more information about using the custom resource to install models, see Installing Watson Speech to Text.

Security vulnerabilities addressed

The following security vulnerabilities associated with Apache Log4j have been fixed:

20 December 2021 (Version 4.0.4)

Version 4.0.4 is now available

Speech to Text for IBM Cloud Pak for Data version 4.0.4 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.

Important: Changes to properties for disabling the storage and logging of user data

Important: The names of the properties of the Speech services custom resource that specify whether user data is stored and logged have changed. The custom resource formerly contained the following properties:

#################
# Anonymize logs
#################
  sttRuntime:
    anonymizeLogs: "false"  # If true, disables storage and logging of user data
  sttAMPatcher:
    anonymizeLogs: "false"  # If true, disables storage and logging of user data
  ttsRuntime:
    anonymizeLogs: "false"  # If true, disables storage and logging of user data

These properties are now named as follows:

###################################
# Storage and logging of user data
###################################
  sttRuntime:
    skipAudioAndResultLogging: "false"  # If true, disables storage and logging of user data
  sttAMPatcher:
    skipAudioAndResultLogging: "false"  # If true, disables storage and logging of user data
  ttsRuntime:
    skipAudioAndResultLogging: "false"  # If true, disables storage and logging of user data

If you already set these properties in your custom resource to change the default value of false to true, you need to edit your custom resource. You must manually change the names of the properties to the new names and save the updated custom resource. For more information, see Installing Watson Speech to Text.
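If you need to make the rename in several places, you can script it. The following Python sketch assumes that you have already parsed the relevant part of your custom resource into a dictionary (for example, with a YAML library, which is not shown). The helper name `rename_logging_properties` is hypothetical, and you still apply the updated resource with your usual OpenShift tooling.

```python
# Old and new property names, as listed in this release note.
OLD_NAME = "anonymizeLogs"
NEW_NAME = "skipAudioAndResultLogging"
COMPONENTS = ("sttRuntime", "sttAMPatcher", "ttsRuntime")


def rename_logging_properties(spec: dict) -> dict:
    """Return a copy of a parsed custom-resource section in which anonymizeLogs
    is renamed to skipAudioAndResultLogging for each Speech component.

    Values are preserved as-is, so an explicit "true" remains "true".
    The input dictionary is not modified.
    """
    # Shallow-copy each component section so that the original is untouched.
    updated = {key: (dict(value) if isinstance(value, dict) else value)
               for key, value in spec.items()}
    for component in COMPONENTS:
        section = updated.get(component)
        if isinstance(section, dict) and OLD_NAME in section:
            section[NEW_NAME] = section.pop(OLD_NAME)
    return updated


# Hypothetical example: user data logging was disabled for Speech to Text only.
spec = {
    "sttRuntime": {"anonymizeLogs": "true"},
    "sttAMPatcher": {"anonymizeLogs": "true"},
    "ttsRuntime": {"anonymizeLogs": "false"},
}
updated = rename_logging_properties(spec)
print(updated)
```

Keeping the values as strings matches the quoted "true" and "false" values in the custom resource.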

Important: Changes to properties of PostgreSQL secrets object

Important: When you install the Speech services, an object that contains a randomly generated password for the PostgreSQL datastore is created by default. You can choose instead to specify the password manually. If you do, the properties of the YAML file for the secrets object have changed. For more information, see the topic about managing your datastores in Administering Watson Speech to Text.

Important: PostgreSQL pods do not start with EnterpriseDB version 1.10 operator

Important: With Speech to Text for IBM Cloud Pak for Data version 4.0.3, PostgreSQL pods based on the EnterpriseDB version 1.10 operator can fail to start. This prevents the Speech services from starting. A workaround exists for this problem. If your Speech services fail to start, see PostgreSQL pods do not start with EnterpriseDB version 1.10 operator for information about diagnosing and resolving the problem.

This problem is fixed in Speech to Text for IBM Cloud Pak for Data version 4.0.4.

New support for IBM Spectrum Scale Container Native storage class

Since version 4.0.3, the Speech services support the IBM Spectrum® Scale Container Native storage class. To use IBM Spectrum Scale, specify "ibm-spectrum-scale-sc" for the storageClass property of the Speech services custom resource. For more information, see Installing Watson Speech to Text.

Interaction of Speech services with MinIO datastore during installation

The Speech services runtime components, sttRuntime and ttsRuntime, cannot start until the models and voices for the services are fully uploaded into the MinIO datastore. During installation, the services might fail and automatically restart themselves one or more times until upload of the models and voices is complete. They then start properly. No user action is required.

Defect fix: Correct upgrade documentation

Defect fix: Documentation for upgrading the Speech services to new versions of IBM Cloud Pak for Data version 4.0.x included incorrect references in some commands. These references are now correct:

  • The strings watsonSpeechToTextStatus and watsonTextToSpeechStatus have been changed to speechStatus in both cases.
  • The strings status.watsonSpeechToTextVersion and status.watsonTextToSpeechVersion have been changed to .spec.version in both cases.

For more information, see Upgrading Watson Speech to Text.

Important: Custom language models based on certain next-generation models must be re-created

Important: If you created custom language models based on certain next-generation models, you must re-create the custom models. Until you re-create the custom language models, speech recognition requests that attempt to use the custom models fail with HTTP error code 400.

You need to re-create custom language models that you created based on the following versions of next-generation models:

  • For the en-AU_Telephony model, custom models that you created from en-AU_Telephony.v2021-03-03 to en-AU_Telephony.v2021-10-04.
  • For the en-GB_Telephony model, custom models that you created from en-GB_Telephony.v2021-03-03 to en-GB_Telephony.v2021-10-04.
  • For the en-US_Telephony model, custom models that you created from en-US_Telephony.v2021-06-17 to en-US_Telephony.v2021-10-04.
  • For the en-US_Multimedia model, custom models that you created from en-US_Multimedia.v2021-03-03 to en-US_Multimedia.v2021-10-04.

To identify the version of a model on which a custom language model is based, use the GET /v1/customizations method to list all of your custom language models or the GET /v1/customizations/{customization_id} method to list a specific custom language model. The versions field of the output shows the base model for a custom language model. For more information, see Listing custom language models.
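As a sketch of that check, the following Python function scans the JSON body of a GET /v1/customizations response for base-model versions in the affected ranges listed earlier. It assumes that the response contains a `customizations` array whose entries carry `customization_id` and `versions` fields, that version strings take the form `en-US_Telephony.v2021-06-17`, and that the listed ranges are inclusive. The function names and sample IDs are illustrative only.

```python
import json

# Affected base-model version ranges (assumed inclusive), from the list above.
AFFECTED_RANGES = {
    "en-AU_Telephony": ("2021-03-03", "2021-10-04"),
    "en-GB_Telephony": ("2021-03-03", "2021-10-04"),
    "en-US_Telephony": ("2021-06-17", "2021-10-04"),
    "en-US_Multimedia": ("2021-03-03", "2021-10-04"),
}


def needs_recreation(version: str) -> bool:
    """Return True if a version string such as 'en-US_Telephony.v2021-06-17'
    falls inside one of the affected ranges."""
    model, sep, date = version.partition(".v")
    if not sep or model not in AFFECTED_RANGES:
        return False
    start, end = AFFECTED_RANGES[model]
    # Dates in YYYY-MM-DD form compare correctly as plain strings.
    return start <= date <= end


def affected_customizations(list_response: str) -> list:
    """Return the IDs of custom models whose versions field names an
    affected base model, given the JSON body of GET /v1/customizations."""
    body = json.loads(list_response)
    return [
        model["customization_id"]
        for model in body.get("customizations", [])
        if any(needs_recreation(v) for v in model.get("versions", []))
    ]
```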

To re-create a custom language model, first create a new custom model. Then add all of the corpora and custom words from the previous custom model to the new model, and train the new model. You can then delete the previous custom model. For more information, see Creating a custom language model.

Updates to multiple next-generation models for improved speech recognition

The following next-generation models have been updated for improved speech recognition:

  • Australian English telephony model (en-AU_Telephony)
  • UK English telephony model (en-GB_Telephony)
  • US English multimedia model (en-US_Multimedia)
  • US English telephony model (en-US_Telephony)
  • Castilian Spanish telephony model (es-ES_Telephony)

For more information about all available next-generation models, see Next-generation languages and models.

New beta grammar support for next-generation models

Grammar support is now available as beta functionality for all available next-generation models. All next-generation models are generally available (GA) and support language model customization. For more information, see the following topics:

New custom_acoustic_model field for supported features

The GET /v1/models and GET /v1/models/{model_id} methods now report whether a model supports acoustic model customization. The SupportedFeatures object now includes an additional field, custom_acoustic_model, a boolean that is true for a model that supports acoustic model customization and false otherwise. Currently, the field is true for all previous-generation models and false for all next-generation models.
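For example, the following Python sketch lists the models that report support for acoustic model customization. It assumes that the GET /v1/models response body has a `models` array whose entries include a `name` and a `supported_features` object. The function name and the sample data in the usage are hypothetical.

```python
import json


def acoustic_customizable(models_response: str) -> list:
    """Return the sorted names of models whose supported_features report
    custom_acoustic_model as true, given the JSON body of GET /v1/models."""
    body = json.loads(models_response)
    return sorted(
        model["name"]
        for model in body.get("models", [])
        # A missing field is treated as "not supported".
        if model.get("supported_features", {}).get("custom_acoustic_model", False)
    )
```

Given the current behavior described above, such a list would contain only previous-generation models.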

Security vulnerability addressed

The following security vulnerability associated with Apache Log4j has been fixed:

20 December 2021 (Version 1.2.x)

Important: You can no longer install Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5

Important: You can no longer perform new installations of Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5. You can install only Speech to Text version 4.0.x on IBM Cloud Pak for Data version 4.x. For more information, see Installing Watson Speech to Text.

The Speech services for IBM Cloud Pak for Data version 3.5 reach their End of Support date on 30 April 2022. You are encouraged to upgrade to the latest version 4.0.x release of the services at your earliest convenience. For more information, see Upgrading Watson Speech to Text.

30 November 2021 (Version 4.0.3)

Version 4.0.3 is now available

Speech to Text for IBM Cloud Pak for Data version 4.0.3 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.

License Service now a mandatory prerequisite

You must now install the License Service from the IBM Cloud Pak for Data foundational services. You must install the License Service by using the YAML content that is provided to create an OperandRequest with the necessary bindings. You must also install the License Service in the same namespace as the service (operand), which is also where IBM Cloud Pak for Data is installed. For more information, see Installing Watson Speech to Text.

New support for in-place upgrade

The service now supports in-place, operator-based upgrade from version 4.0.0 to version 4.0.3. Moving from IBM Cloud Pak for Data version 3.5 to version 4.0.3 continues to require use of migration utilities. For more information, see Upgrading Watson Speech to Text.

EDB PostgreSQL operator and license installation changes

Installation, upgrade, and uninstallation for the EnterpriseDB PostgreSQL operator and license have changed:

  • Instructions for installing the EDB PostgreSQL operator and license are now included with the IBM Cloud Pak for Data foundational services. The instructions for installing the Speech services have been updated accordingly. For more information, see Installing Watson Speech to Text.
  • Instructions for upgrading from Speech to Text version 4.0.0 to 4.0.3 include instructions for uninstalling the previous EDB PostgreSQL operator and license and reinstalling them with the IBM Cloud Pak for Data foundational services. For more information, see Upgrading Watson Speech to Text.
  • Instructions for uninstalling the Speech services now include steps for removing the EDB PostgreSQL operator and license that were previously installed with Speech to Text. For more information, see Uninstalling Watson Speech to Text.
New guidance for scaling up your installation

The service now provides updated guidance about scaling up your installation. The information includes specifying the number of pods, the number of CPUs allocated per pod, and the maximum number of concurrent sessions with previous- and next-generation models. For more information, see Administering Watson Speech to Text.

Command-line updates to import and export utilities

The commands that are used with the import and export utilities for the Speech services include new options and arguments. The import and export utilities are also the foundation for backing up and restoring the services and for migrating from IBM Cloud Pak for Data version 3.5 to version 4.0.3. For more information about using the utilities, see

New property for specifying the CPUs for acoustic model training

The sttAMPatcher microservice manages acoustic model customization for the service. The AM Patcher uses a dedicated number of CPUs to handle requests. You can use the new sttAMPatcher.resources.requestsCPU property to increase the number of CPUs that are dedicated to handling acoustic model training requests by the AM Patcher. This may be necessary if you experience training failures during acoustic model training. For more information, see Installing Watson Speech to Text.

New next-generation models

The service now supports the following new next-generation language models. All of the new models are generally available.

  • Czech: cs-CZ_Telephony. The model supports low latency.
  • Belgian Dutch (Flemish): nl-BE_Telephony. The model supports low latency.
  • French: fr-FR_Multimedia. The new model does not support low latency.
  • Indian English: en-IN_Telephony. The model supports low latency.
  • Indian Hindi: hi-IN_Telephony. The model supports low latency.
  • Japanese: ja-JP_Multimedia. The model does not support low latency.
  • Korean: ko-KR_Multimedia. The model does not support low latency.
  • Korean: ko-KR_Telephony. The model supports low latency.
  • Netherlands Dutch: nl-NL_Telephony. The model supports low latency.

For more information about all next-generation models and about low latency, see Next-generation languages and models and Low latency.

Updates to next-generation models

The following next-generation models have been updated for improved speech recognition. All of the models are generally available.

  • Arabic: ar-MS_Telephony. The model now supports low latency.
  • Brazilian Portuguese: pt-BR_Telephony. The model continues to support low latency.
  • US English: en-US_Telephony. The model continues to support low latency.
  • Canadian French: fr-CA_Telephony. The model now supports low latency.
  • Italian: it-IT_Telephony. The model now supports low latency.

For more information about all next-generation models and about low latency, see Next-generation languages and models and Low latency.

Defect fix: Address asynchronous HTTP failures

Defect fix: The asynchronous HTTP interface failed to transcribe some audio. In addition, the callback for the request returned status recognitions.completed_with_results instead of recognitions.failed. This error has been resolved.

Defect fix: Improve speaker labels results

Defect fix: When you use speaker labels with next-generation models, the service now identifies the speaker for all words of the input audio, including very short words that have the same start and end timestamps.
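To attribute each word to a speaker in your own code, you can join the speaker labels with the word timestamps. The following Python sketch assumes that the `from` and `to` values of each speaker label match the start and end times of a word in the `timestamps` array of the first alternative. The function name is hypothetical, and the sample response is invented for illustration; it includes a zero-duration word.

```python
import json


def words_with_speakers(response_json: str) -> list:
    """Pair each word in the transcript with the speaker ID from speaker_labels.

    Assumes that each speaker label's from/to values equal the start/end times
    of a word in the timestamps of the first alternative, including words
    whose start and end times are identical.
    """
    body = json.loads(response_json)
    # Index the speaker labels by their (from, to) time span.
    speaker_at = {
        (label["from"], label["to"]): label["speaker"]
        for label in body.get("speaker_labels", [])
    }
    pairs = []
    for result in body.get("results", []):
        for word, start, end in result["alternatives"][0].get("timestamps", []):
            # None means that no speaker label matched this word's time span.
            pairs.append((word, speaker_at.get((start, end))))
    return pairs


# Invented sample; "a" is a very short word with equal start and end times.
sample = json.dumps({
    "results": [{
        "final": True,
        "alternatives": [{
            "transcript": "yes a test ",
            "timestamps": [["yes", 0.5, 0.9], ["a", 1.2, 1.2], ["test", 1.3, 1.8]],
        }],
    }],
    "speaker_labels": [
        {"from": 0.5, "to": 0.9, "speaker": 0, "confidence": 0.6, "final": True},
        {"from": 1.2, "to": 1.2, "speaker": 1, "confidence": 0.5, "final": True},
        {"from": 1.3, "to": 1.8, "speaker": 1, "confidence": 0.7, "final": True},
    ],
})
print(words_with_speakers(sample))
```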

Defect fix: Update interim results and low-latency documentation

Defect fix: Documentation that describes the interim results and low-latency features with next-generation models has been rewritten for clarity and correctness. For more information, see the following topics:

Defect fix: Correct multitenancy documentation

Defect fix: The IBM Cloud Pak for Data topic Multitenancy support incorrectly stated that the Speech services do not support multitenancy. The topic has been updated to state that the Speech services support the following operations:

  • Install the service in separate projects
  • Install the service multiple times in the same project
  • Install the service once and deploy multiple instances in the same project

The documentation that is specific to the Speech services correctly stated the multitenancy support.

1 October 2021 (Version 1.1.x)

Version 1.1.x is out of service
Speech to Text and Text to Speech for IBM Cloud Pak for Data version 1.1.x went out of service on 30 September 2021. As of 1 October 2021, the documentation for version 1.1.x is no longer available. For more information, see Software withdrawal and support discontinuance.

31 August 2021 (Version 4.0.0)

All next-generation models are now generally available

All next-generation language models are now generally available (GA). They are supported for use in production environments and applications.

Language model customization for next-generation models is now generally available

Language model customization is now generally available (GA) for all available next-generation languages and models. Language model customization for next-generation models is supported for use in production environments and applications.

You use the same commands to create, manage, and use custom language models, corpora, and custom words for next-generation models as you do for previous-generation models. But customization for next-generation models works differently from customization for previous-generation models. For custom models that are based on next-generation models:

  • The custom models have no concept of out-of-vocabulary (OOV) words.
  • Words from corpora are not added to the words resource.
  • You cannot currently use the sounds-like feature for custom words.
  • You do not need to upgrade custom models when base language models are updated.
  • Grammars are not currently supported.

For more information about using language model customization for next-generation models, see

Additional topics describe managing custom language models, corpora, and custom words.

29 July 2021 (Version 4.0.0)

Version 4.0.0 is available

IBM Watson® Speech to Text for IBM Cloud Pak® for Data version 4.0.0 is now available. Installation and administration of the service include many changes. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift version 4.6. For more information about installing and managing the service, see Installing IBM Watson Speech to Text for IBM Cloud Pak for Data.

New next-generation language models

The service now supports a growing number of next-generation language models. The next-generation multimedia and telephony models improve upon the speech recognition capabilities of the service's previous generation of broadband and narrowband models. The new models leverage deep neural networks and bidirectional analysis to achieve both higher throughput and greater transcription accuracy.

At this time, the next-generation language models and the low_latency parameter are beta functionality. The next-generation models support a limited number of languages and speech recognition features. The supported languages, models, and features will increase with future releases.

Many of the next-generation models also support a new low_latency parameter that lets you request faster results at the possible expense of reduced transcription quality. When low latency is enabled, the service curtails its analysis of the audio, which can reduce the accuracy of the transcription. This trade-off might be acceptable if your application requires lower response time more than it does the highest possible accuracy.

The low_latency parameter impacts your use of the interim_results parameter with the WebSocket interface. Interim results are available only for those next-generation models that support low latency, and only if both the interim_results and low_latency parameters are set to true.
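You can express this rule as a small check. The following Python sketch builds a WebSocket `start` message and reports whether interim results can be expected from a next-generation model. The function names are hypothetical, and `model_supports_low_latency` stands in for whatever you know about the model from its documentation.

```python
import json


def interim_results_expected(interim_results: bool, low_latency: bool,
                             model_supports_low_latency: bool) -> bool:
    """For a next-generation model, interim results require a model that
    supports low latency and both parameters set to true."""
    return interim_results and low_latency and model_supports_low_latency


def build_start_message(content_type: str, interim_results: bool,
                        low_latency: bool) -> str:
    """Build the JSON text message that opens a WebSocket recognition request."""
    return json.dumps({
        "action": "start",
        "content-type": content_type,
        "interim_results": interim_results,
        "low_latency": low_latency,
    })


print(build_start_message("audio/l16;rate=16000", True, True))
```

The model itself is specified in the WebSocket connection URL, not in the `start` message, so the sketch takes its low-latency support as a separate argument.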

Arabic language broadband model renamed

The Arabic language broadband model is now named ar-MS_BroadbandModel. The former name, ar-AR_BroadbandModel, is deprecated. It will continue to function for at least one year but might be removed at a future date. You are encouraged to migrate to the new name at your earliest convenience.

Unified Speech to Text documentation

The documentation for IBM Watson Speech to Text for IBM Cloud Pak for Data is now combined with the documentation for managed instances of the Speech to Text service that are hosted on IBM Cloud. This is true of both the guide and reference documentation for the two forms of the service. Links to the formerly separate version of the IBM Cloud Pak for Data documentation for the service redirect to the unified documentation.

For more information about identifying information that pertains to only one version of the product, see About Speech to Text.

Defect fix: Improve documentation

Defect fix: The documentation has been updated to correct the following information:

  • The documentation failed to state that next-generation models do not produce hesitation markers. The documentation has been updated to note that only previous-generation models produce hesitation markers. Next-generation models include the actual hesitations in transcription results. For more information, see Speech hesitations and hesitation markers.
  • The documentation incorrectly stated that using the smart_formatting parameter causes the service to remove hesitation markers from final transcription results for Japanese. Smart formatting does not remove hesitation markers from final results for Japanese, only for US English. For more information, see What results does smart formatting affect?
Version 1.1.x is going out of service

Speech to Text and Text to Speech for IBM Cloud Pak for Data version 1.1.x go out of service on 30 September 2021. You must upgrade to a later version of the services on IBM Cloud Pak for Data before that date. As of 1 October 2021, the documentation for version 1.1.4 will no longer be available.

12 April 2021 (Version 1.2.1)

Addition to speech-override.yaml file

The minimal speech-override.yaml file includes an extra definition, dockerRegistryPrefix:

global:
  dockerRegistryPrefix: "{Registry}"
  image:
    pullSecret: "{Registry_pull_secret}"

{Registry} is the path for the internal Docker registry. It must be image-registry.openshift-image-registry.svc:5000/{namespace}, where {namespace} is the namespace in which IBM Cloud Pak® for Data is installed, normally zen.
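If you generate the override file from a script, the registry path follows directly from the namespace. A minimal Python sketch, in which the function names are hypothetical and the pull secret name is a placeholder that you replace with your own:

```python
def registry_prefix(namespace: str = "zen") -> str:
    """Internal OpenShift image registry path for the given namespace."""
    return f"image-registry.openshift-image-registry.svc:5000/{namespace}"


def minimal_override(namespace: str, pull_secret: str) -> str:
    """Render the minimal speech-override.yaml content shown above."""
    return (
        "global:\n"
        f'  dockerRegistryPrefix: "{registry_prefix(namespace)}"\n'
        "  image:\n"
        f'    pullSecret: "{pull_secret}"\n'
    )


# "my-pull-secret" is a hypothetical placeholder.
print(minimal_override("zen", "my-pull-secret"))
```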

9 April 2021 (Version 1.2.1)

Support for modifying installed models and voices
The Speech services let you add or remove installed models and voices for version 1.2 or 1.2.1 of the services.

26 March 2021 (Version 1.2.1)

Version 1.2.1 is available

Speech to Text for IBM Cloud Pak for Data version 1.2.1 is now available. Versions 1.2 and 1.2.1 use the same version 1.2 documentation and installation instructions. Version 1.2.1 supports installation on Red Hat OpenShift version 4.6 in addition to versions 4.5 and 3.11.

New installation instructions

For both clusters connected to the internet and air-gapped clusters, the installation instructions include the following steps:

  • Use the oc label command to set up required labels for the namespace where IBM Cloud Pak for Data is installed.
  • Use the oc project command to ensure that you are pointing at the correct OpenShift project.
  • Use the cpd-cli install command to install an EnterpriseDB PostgreSQL server that is used by the Speech services.

You perform these steps before you install the Speech services.

New uninstallation instructions

A step was added to the procedure for uninstalling the Speech services to clean up all of the resources from the installation.

Entitled registry for PostgreSQL datastore

The entitled registry path from which the service pulls images for the PostgreSQL datastore has changed. The registry location changed from cp.icr.io/cp/watson-speech to cp.icr.io/cp/cpd. This change is transparent to users.

Secrets for MinIO and PostgreSQL datastores

The MinIO and PostgreSQL datastores require the following hard-coded values for their secrets:

  • For MinIO, use minio.
  • For PostgreSQL, use user-provided-postgressql.

You cannot use your own values for these secrets. The secrets must be created before you install the Speech services.

Deletions from speech-override.yaml file

The following entries have been removed from the speech-override.yaml file. They were added to work around a problem that has now been fixed.

sttRuntime:
  images:
    miniomc:
      tag: 1.0.5
sttAMPatcher:
  images:
    miniomc:
      tag: 1.0.5
ttsRuntime:
  images:
    miniomc:
      tag: 1.0.5

The minimal speech-override.yaml file has also been trimmed further so that it contains only the essential elements.

9 December 2020 (Version 1.2)

Version 1.2 is available

Speech to Text for IBM Cloud Pak for Data version 1.2 is now available. Installation and administration of the service include many changes. This version supports IBM Cloud Pak for Data versions 3.5 and 3.0.1, and Red Hat OpenShift versions 4.5 and 3.11.

New Australian and French Canadian models

The service now offers broadband and narrowband models for Australian English and Canadian French:

  • Australian English: en-AU_BroadbandModel and en-AU_NarrowbandModel
  • Canadian French: fr-CA_BroadbandModel and fr-CA_NarrowbandModel

The new models are generally available, and they support both language model and acoustic model customization.

Updated models for improved speech recognition

The following language models have been updated for improved speech recognition:

  • Brazilian Portuguese: pt-BR_BroadbandModel and pt-BR_NarrowbandModel
  • French: fr-FR_BroadbandModel
  • German: de-DE_BroadbandModel and de-DE_NarrowbandModel
  • Japanese: ja-JP_BroadbandModel
  • UK English: en-GB_BroadbandModel and en-GB_NarrowbandModel
  • US English: en-US_ShortForm_NarrowbandModel

By default, the service automatically uses the updated models for all speech recognition requests. If you have custom language or custom acoustic models that are based on these models, you must upgrade your existing custom models to take advantage of the updates by using the following methods:

  • POST /v1/customizations/{customization_id}/upgrade_model
  • POST /v1/acoustic_customizations/{customization_id}/upgrade_model
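The two methods differ only in the collection name in the path. The following minimal Python sketch builds the request path for either kind of custom model. The helper name is hypothetical, and sending the POST request with your credentials is left to your HTTP client.

```python
from urllib.parse import quote


def upgrade_model_path(customization_id: str, acoustic: bool = False) -> str:
    """Return the upgrade_model path for a custom language model or, when
    acoustic is True, for a custom acoustic model."""
    collection = "acoustic_customizations" if acoustic else "customizations"
    # Percent-encode the ID so that it is always a single path segment.
    return f"/v1/{collection}/{quote(customization_id, safe='')}/upgrade_model"


print(upgrade_model_path("abc-123"))
print(upgrade_model_path("abc-123", acoustic=True))
```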

For more information, see Upgrading custom models.

The split_transcript_at_phrase_end parameter is now generally available for all languages

The speech recognition parameter split_transcript_at_phrase_end is now generally available for all languages. Previously, it was generally available only for US and UK English. For more information, see Split transcript at phrase end.

Hesitation marker for German has changed

The hesitation marker that is used for the updated German broadband and narrowband models has changed from [hesitation] to %HESITATION. For more information about hesitation markers, see Speech hesitations and hesitation markers.

Defect fix: Address latency issue for models with large numbers of grammars

Defect fix: The service no longer has a latency issue for custom language models that contain a large number of grammars. When initially used for speech recognition, such custom models could take multiple seconds to load. The custom models now load much faster, greatly reducing latency when they are used for recognition.

15 July 2020 (Version 1.1.4)

Red Hat OpenShift version 4.3 is going out of service
IBM Cloud Pak for Data 3.0.1 is deprecating support for Red Hat OpenShift 4.3 on 1 September 2020. Red Hat OpenShift 4.3 is going out of service on 22 October 2020. IBM Cloud Pak for Data is introducing support for Red Hat OpenShift 4.5. IBM Cloud Pak for Data is recommending that clients upgrade to Red Hat OpenShift 4.5 before 22 October 2020. IBM Support will work with any customers who already installed IBM Cloud Pak for Data 3.0.1 on Red Hat OpenShift 4.3. New customers who want to install on Red Hat OpenShift 4.x are instructed to install Red Hat OpenShift 4.5.

19 June 2020 (Version 1.1.4)

Version 1.1.4 is available

Speech to Text for IBM Cloud Pak for Data version 1.1.4 is now available. Installation and administration of the service include many changes. This version supports IBM Cloud Pak for Data versions 2.5 and 3.0.1, and Red Hat OpenShift versions 3.11 and 4.3. For more information about installing and managing the service, see Installing Watson Speech to Text version 1.1.4.

New parameters to control the level of speech activity detection

The service now offers two new optional parameters for controlling the level of speech activity detection. The parameters can help ensure that only relevant audio is processed for speech recognition.

  • The speech_detector_sensitivity parameter adjusts the sensitivity of speech activity detection. You can use the parameter to suppress word insertions from music, coughing, and other non-speech events.
  • The background_audio_suppression parameter suppresses background audio based on its volume to prevent it from being transcribed or otherwise interfering with speech recognition. You can use the parameter to suppress side conversations or background noise.

You can use the parameters individually or together. They are available for all interfaces and for most language models. For more information about the parameters, their allowable values, and their effect on the quality and latency of speech recognition, see Speech activity detection.
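As an illustration, the following Python sketch builds the query string for a recognition request that sets both parameters. It assumes that both parameters accept values from 0.0 to 1.0 and checks that range locally before building the query. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def recognize_query(model: str,
                    speech_detector_sensitivity: Optional[float] = None,
                    background_audio_suppression: Optional[float] = None) -> str:
    """Build a query string for a recognition request.

    The 0.0 to 1.0 range check is a local assumption, not a call to the service.
    """
    params = {"model": model}
    for name, value in (
        ("speech_detector_sensitivity", speech_detector_sensitivity),
        ("background_audio_suppression", background_audio_suppression),
    ):
        if value is None:
            continue  # Omit the parameter so that the service default applies.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        params[name] = value
    return urlencode(params)


print(recognize_query("en-US_BroadbandModel", 0.6, 0.5))
```

Omitting a parameter, rather than sending an explicit default, keeps the request unaffected by any future change to the service defaults.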

New broadband and narrowband models for Dutch and Italian

The service now supports broadband and narrowband models for the Dutch and Italian languages:

  • Dutch broadband model (nl-NL_BroadbandModel)
  • Dutch narrowband model (nl-NL_NarrowbandModel)
  • Italian broadband model (it-IT_BroadbandModel)
  • Italian narrowband model (it-IT_NarrowbandModel)

Dutch and Italian language models are generally available (GA) for speech recognition and for language model and acoustic model customization. For more information about all available language models, see

Support for speaker_labels parameter for German and Korean

The service now supports speaker labels (the speaker_labels parameter) for German and Korean language models. Speaker labels identify which individuals spoke which words in a multi-participant exchange. For more information, see Speaker labels.

Improved speech recognition for Japanese narrowband model

The Japanese narrowband model (ja-JP_NarrowbandModel) now includes some multigram word units for digits and decimal fractions. The service returns these multigram units regardless of whether you enable smart formatting. The smart formatting feature understands and returns the multigram units that the model generates. If you apply your own post-processing to transcription results, you need to handle these units appropriately. For more information, see Japanese in the smart formatting documentation.

Simplified backup and restore

The service now offers greatly improved backup and restore procedures. Utilities are now available to back up data from your datastores, so you no longer need to re-create all of your data in the event of a disaster. For more information, see Backing up and restoring your data.

1 April 2020 (Version 1.1.3)

Acoustic model customization is now generally available
Acoustic model customization is now generally available (GA) for all supported languages. For more information about support for individual language models, see Language support for customization.

28 February 2020 (Version 1.1.3)

Version 1.1.3 is available

Speech to Text for IBM Cloud Pak for Data version 1.1.3 is now available.

New end_of_phrase_silence_time parameter

For speech recognition, the service now supports the end_of_phrase_silence_time parameter. The parameter specifies the duration of the pause interval at which the service splits a transcript into multiple final results. Each final result indicates a pause or extended silence that exceeds the pause interval. For most languages, the default pause interval is 0.8 seconds; for Chinese the default interval is 0.6 seconds.

You can use the parameter to effect a trade-off between how often a final result is produced and the accuracy of the transcription. Increase the interval when accuracy is more important than latency. Decrease the interval when the speaker is expected to say short phrases or single words.
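To see the trade-off, the following Python sketch splits a list of word timestamps into phrases wherever the gap between consecutive words exceeds a pause interval. It is a simplified model of the idea only, not the service's actual segmentation logic. The function name and the sample timestamps are invented.

```python
def split_at_pauses(timestamps, pause_interval=0.8):
    """Group (word, start, end) tuples into phrases wherever the silence
    between consecutive words exceeds pause_interval seconds."""
    phrases = []
    current = []
    previous_end = None
    for word, start, end in timestamps:
        # Start a new phrase when the gap since the previous word is too long.
        if current and start - previous_end > pause_interval:
            phrases.append(current)
            current = []
        current.append(word)
        previous_end = end
    if current:
        phrases.append(current)
    return phrases


words = [("hello", 0.0, 0.4), ("there", 0.5, 0.9),
         ("how", 2.0, 2.2), ("are", 2.3, 2.4), ("you", 2.5, 2.7)]
print(split_at_pauses(words))        # default 0.8-second interval
print(split_at_pauses(words, 1.5))   # longer interval: fewer, longer phrases
```

A longer interval yields fewer, longer final results; a very short interval breaks the same audio into many small results.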

For more information, see End of phrase silence time.

New split_transcript_at_phrase_end parameter

For speech recognition, the service now supports the split_transcript_at_phrase_end parameter. The parameter directs the service to split the transcript into multiple final results based on semantic features of the input, such as at the conclusion of sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

The parameter causes the service to add an end_of_utterance field to each final result to indicate the motivation for the split: full_stop, silence, end_of_data, or reset.
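When you post-process results, you can pair each final transcript with the reason for its split. The following Python sketch assumes a response body with a `results` array whose final entries include `alternatives` and, when the parameter is enabled, an `end_of_utterance` field. The function name and the sample data are invented.

```python
import json


def final_segments(response_json: str) -> list:
    """Return (transcript, end_of_utterance) pairs for the final results."""
    body = json.loads(response_json)
    segments = []
    for result in body.get("results", []):
        if not result.get("final"):
            continue  # Skip interim results.
        transcript = result["alternatives"][0]["transcript"].strip()
        # The field is absent unless split_transcript_at_phrase_end is set.
        segments.append((transcript, result.get("end_of_utterance")))
    return segments


sample = json.dumps({
    "result_index": 0,
    "results": [
        {"final": True, "alternatives": [{"transcript": "thank you for calling "}],
         "end_of_utterance": "full_stop"},
        {"final": True, "alternatives": [{"transcript": "how can I help "}],
         "end_of_utterance": "end_of_data"},
    ],
})
print(final_segments(sample))
```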

For more information, see Split transcript at phrase end.

Improved speaker_labels parameter

For speech recognition, the speaker_labels parameter has been updated to improve the identification of individual speakers for further analysis of your audio sample. For more information about the speaker labels feature, see Speaker labels. For more information about the improvements to the feature, see IBM Research AI Advances Speaker Diarization in Real Use Cases.

27 November 2019 (Version 1.1.2)

Version 1.1.2 is available

Speech to Text for IBM Cloud Pak for Data version 1.1.2 is now available.

Maximum number of custom models

You can create no more than 1024 custom language models and no more than 1024 custom acoustic models per owning credentials. For more information, see Maximum number of custom models.
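Because the limit applies separately to language and acoustic models, a client might check its remaining capacity before it creates a model. In this minimal sketch, the counts are assumed to come from listing the existing custom models of each type. The function `remaining_capacity` is hypothetical and is not part of any SDK.

```python
from typing import Dict

# Limit stated in the release note; applies separately to each model type.
MAX_CUSTOM_MODELS = 1024


def remaining_capacity(language_models: int, acoustic_models: int) -> Dict[str, int]:
    """Return how many more custom models of each type the credentials can own."""
    for count in (language_models, acoustic_models):
        if count < 0:
            raise ValueError("model counts cannot be negative")
    return {
        "language": max(MAX_CUSTOM_MODELS - language_models, 0),
        "acoustic": max(MAX_CUSTOM_MODELS - acoustic_models, 0),
    }


if __name__ == "__main__":
    # 1000 language models leave room for 24 more; acoustic models are at the limit.
    print(remaining_capacity(1000, 1024))
```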

30 August 2019 (Version 1.0.1)

Version 1.0.1 is available

Speech to Text for IBM Cloud Pak for Data version 1.0.1 is now available. The service now works with IBM Cloud Pak for Data 2.1.0.1 and supports installing IBM Cloud Pak for Data with Red Hat OpenShift.

New broadband and narrowband models for Spanish dialects

The service now offers broadband and narrowband language models in six Spanish dialects:

  • Argentinian Spanish (es-AR_BroadbandModel and es-AR_NarrowbandModel)
  • Castilian Spanish (es-ES_BroadbandModel and es-ES_NarrowbandModel)
  • Chilean Spanish (es-CL_BroadbandModel and es-CL_NarrowbandModel)
  • Colombian Spanish (es-CO_BroadbandModel and es-CO_NarrowbandModel)
  • Mexican Spanish (es-MX_BroadbandModel and es-MX_NarrowbandModel)
  • Peruvian Spanish (es-PE_BroadbandModel and es-PE_NarrowbandModel)

The Castilian Spanish models are not new. They are generally available for speech recognition and language model customization, and beta for acoustic model customization.

The models for the other five dialects are new and are beta for all uses. Because they are beta, these additional dialects might not be ready for production use and are subject to change. They are initial offerings that are expected to improve in quality with time and usage.
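To clarify how the dialects map to model names, the following sketch selects a Spanish model from a locale and the audio's sampling rate. It also reports whether a dialect was beta for speech recognition at version 1.0.1. The 16 kHz threshold reflects the general guidance for broadband models (16 kHz or higher) and narrowband models (8 kHz audio). Treat that threshold and the helper names as assumptions.

```python
# Dialects and status from the 1.0.1 release note.
SPANISH_DIALECTS = {
    "es-AR": "Argentinian Spanish",
    "es-ES": "Castilian Spanish",
    "es-CL": "Chilean Spanish",
    "es-CO": "Colombian Spanish",
    "es-MX": "Mexican Spanish",
    "es-PE": "Peruvian Spanish",
}
# Castilian: generally available for recognition and language model customization.
GENERALLY_AVAILABLE = {"es-ES"}


def spanish_model(locale: str, sample_rate_hz: int) -> str:
    """Pick a broadband or narrowband Spanish model for the audio's sampling rate."""
    if locale not in SPANISH_DIALECTS:
        raise ValueError(f"unsupported Spanish dialect: {locale}")
    # Assumption: broadband for 16 kHz or higher, narrowband otherwise (for example, 8 kHz telephony).
    family = "BroadbandModel" if sample_rate_hz >= 16000 else "NarrowbandModel"
    return f"{locale}_{family}"


def is_beta_for_recognition(locale: str) -> bool:
    """Return True for the five dialects that were new and beta at version 1.0.1."""
    return locale in SPANISH_DIALECTS and locale not in GENERALLY_AVAILABLE


if __name__ == "__main__":
    print(spanish_model("es-MX", 8000), is_beta_for_recognition("es-MX"))
```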

For more information, see the following sections:

FISMA support

Federal Information Security Management Act (FISMA) support is now available for Speech to Text for IBM Cloud Pak for Data. The service is FISMA High Ready.

28 June 2019 (Version 1.0.0)

Version 1.0.0 is available

Version 1.0.0, the initial release of the service, is now available. Speech to Text for IBM Cloud Pak for Data is based on the IBM Watson® Speech to Text service on the public IBM Cloud. Speech to Text for IBM Cloud Pak for Data differs from the public Speech to Text service in the following ways. You might find this information helpful if you are already familiar with the Speech to Text service on the public IBM Cloud.

  • Speech to Text for IBM Cloud Pak for Data uses access tokens for authentication. For more information, see the API & SDK reference.
  • The endpoints for Speech to Text for IBM Cloud Pak for Data are specific to your IBM Cloud Pak for Data cluster. For more information, see the API & SDK reference.
  • Speech to Text for IBM Cloud Pak for Data does not perform any request logging. You do not need to use the X-Watson-Learning-Opt-Out request header.
  • Speech to Text for IBM Cloud Pak for Data does not support Watson tokens. You cannot use the X-Watson-Authorization-Token request header to authenticate with the service.
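The following sketch applies these differences to a single recognition request. It uses a bearer access token, a cluster-specific service URL, and neither of the `X-Watson-*` headers. The URL `https://cpd.example.com/speech/api` is a hypothetical placeholder; the real endpoint for your cluster is described in the API & SDK reference. The code builds the request with the standard library but does not send it.

```python
from urllib.request import Request


def build_recognize_request(service_url: str, access_token: str, audio: bytes,
                            content_type: str = "audio/wav") -> Request:
    """Build (but do not send) a POST /v1/recognize request for a Cloud Pak for Data instance."""
    headers = {
        # Cloud Pak for Data authenticates with access tokens.
        "Authorization": f"Bearer {access_token}",
        "Content-Type": content_type,
    }
    # No X-Watson-Learning-Opt-Out header: the service performs no request logging.
    # No X-Watson-Authorization-Token header: Watson tokens are not supported.
    url = service_url.rstrip("/") + "/v1/recognize"
    return Request(url, data=audio, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical cluster URL and token, for illustration only.
    req = build_recognize_request("https://cpd.example.com/speech/api", "<access-token>", b"")
    print(req.get_method(), req.full_url)
```

Sending the request, for example with `urllib.request.urlopen(req)`, would require a valid token and a reachable cluster.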