Release notes for Speech to Text for IBM Cloud Pak for Data
The following features and changes were included for each release and update of installed or on-premises instances of IBM Watson® Speech to Text for IBM Cloud Pak for Data. Unless otherwise noted, all changes are compatible with earlier releases and are automatically and transparently available to all new and existing applications.
For information about known limitations of the service, see Known limitations.
For information about releases and updates of the service for IBM Cloud, see Release notes for Speech to Text for IBM Cloud.
2 May 2023 (Version 4.6.5)
- Version 4.6.5 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.6.5 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift versions 4.10 and 4.12. For more information, see Watson Speech services on IBM Cloud Pak for Data.
- New Japanese next-generation telephony model
-
The service now offers a next-generation telephony model for Japanese: ja-JP_Telephony. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about next-generation models and low latency, see
- Improved language model customization for next-generation English and Japanese models
-
The service now provides improved language model customization for next-generation English and Japanese models:
en-AU_Multimedia
en-AU_Telephony
en-IN_Telephony
en-GB_Multimedia
en-GB_Telephony
en-US_Multimedia
en-US_Telephony
ja-JP_Multimedia
ja-JP_Telephony
Visible improvements to the models: The new technology improves the default behavior of the new English and Japanese models. Among other changes, the new technology optimizes the default behavior for the following parameters:
- The default customization_weight for custom models that are based on the new versions of these models changes from 0.2 to 0.1.
- The default character_insertion_bias for custom models that are based on the new versions of these models remains 0.0, but the models have changed in a manner that makes use of the parameter for speech recognition less necessary.
Upgrading to the new models: To take advantage of the improved technology, you must upgrade any custom language models that are based on the new models. To upgrade to the new version of one of these base models, do the following:
- Change your custom model by adding or modifying a custom word, corpus, or grammar that the model contains. Any change that you make moves the model to the ready state.
- Use the POST /v1/customizations/{customization_id}/train method to retrain the model. Retraining upgrades the custom model to the new technology and moves the model to the available state.
Known issue: At this time, you cannot use the POST /v1/customizations/{customization_id}/upgrade_model method to upgrade a custom model to one of the new base models. This issue will be addressed in a future release.
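The two upgrade steps above can be sketched as follows. This is a minimal illustration in Python, not an official client: the base URL, customization ID, bearer token, placeholder word, and the helper name build_upgrade_requests are all assumptions made for the example. The sketch only builds the two requests; sending them (for example with urllib.request.urlopen) and waiting for the model to reach the ready state between the steps are left to the caller.

```python
import json
import urllib.request


def build_upgrade_requests(base_url, customization_id, token, word="upgrade_placeholder"):
    """Build the two requests that move a custom model to the new technology.

    base_url, customization_id, token, and word are hypothetical placeholders.
    """
    headers = {"Authorization": f"Bearer {token}"}
    root = f"{base_url.rstrip('/')}/v1/customizations/{customization_id}"

    # Step 1: change the model (here, by adding one custom word) so that it
    # moves to the ready state.
    add_word = urllib.request.Request(
        f"{root}/words",
        data=json.dumps({"words": [{"word": word}]}).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )

    # Step 2: retrain the model; per the note above, retraining performs the upgrade.
    train = urllib.request.Request(f"{root}/train", headers=headers, method="POST")
    return add_word, train
```

Any change to the model's words, corpora, or grammars would serve for the first step; adding a word is simply the smallest request to show.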
Using the new models: Following the upgrade to the new base model, you are advised to evaluate the performance of the upgraded custom model by paying special attention to the customization_weight and character_insertion_bias parameters for speech recognition. When you retrain your custom model:
- The custom model uses the new default customization_weight of 0.1. A non-default customization_weight that you had associated with your custom model is removed.
- The custom model might no longer require use of the character_insertion_bias parameter for optimal speech recognition.
Improvements to language model customization render these parameters less important for high-quality speech recognition:
- If you use the default values for these parameters, continue to do so after the upgrade. The default values will likely continue to offer the best results for speech recognition.
- If you specify non-default values for these parameters, experiment with the default values following upgrade. Your custom model might work well for speech recognition with the default values.
If you feel that using different values for these parameters might improve speech recognition with your custom model, experiment with incremental changes to determine whether the parameters are needed to improve speech recognition.
Note: At this time, the improvements to language model customization apply only to custom models that are based on the next-generation English or Japanese base language models listed earlier. Over time, the improvements will be made available for other next-generation language models.
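One way to run the comparison described above is to issue the same recognition request twice, once with the defaults and once with a single small change. The following Python sketch only builds the request URLs for the /v1/recognize method. The base URL, customization ID, and the helper name recognize_url are assumptions for illustration, and the value 0.2 is just an example of an incremental change, not a recommendation.

```python
from urllib.parse import urlencode


def recognize_url(base_url, model, customization_id,
                  customization_weight=None, character_insertion_bias=None):
    """Build a /v1/recognize URL; parameters left as None use the service defaults."""
    params = {"model": model, "language_customization_id": customization_id}
    if customization_weight is not None:
        params["customization_weight"] = customization_weight
    if character_insertion_bias is not None:
        params["character_insertion_bias"] = character_insertion_bias
    return f"{base_url.rstrip('/')}/v1/recognize?{urlencode(params)}"


# Hypothetical endpoint and customization ID.
BASE = "https://cpd.example.com/stt/api"
default_url = recognize_url(BASE, "en-US_Telephony", "abc123")
tuned_url = recognize_url(BASE, "en-US_Telephony", "abc123", customization_weight=0.2)
```

Omitting a parameter entirely, rather than passing its default value explicitly, keeps the request aligned with whatever default the upgraded model uses.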
More information: For more information about upgrading and about speech recognition with these parameters, see
- New environment variable for Speech services custom resource
-
The documentation now includes instructions to create an environment variable named ${CUSTOM_RESOURCE_SPEECH}. You append the new variable to the cpd_vars.sh script, and source the script to use the variable in your environment. For more information, see Information you need to complete this task in Installing Watson Speech services, or refer to any of the upgrade topics for the Speech services.
- Defect fix: The Swedish telephony and Italian multimedia models are now available
-
Defect fix: The Swedish telephony (sv-SE_Telephony) and Italian multimedia (it-IT_Multimedia) models are now available for installation. Previously, they were not available.
- Defect fix: Improved training time for next-generation custom language models
-
Defect fix: Training time for next-generation custom language models is now significantly improved. Previously, training took much longer than necessary, as reported for training of Japanese custom language models. The problem was corrected by an internal fix.
- Defect fix: Grammar files now handle strings of digits correctly
-
Defect fix: When grammars are used, the service now handles longer strings of digits correctly. Previously, it was failing to complete recognition or returning incorrect results.
- Defect fix: Dynamically generated grammar files now work properly
-
Defect fix: Dynamically generated grammar files now work properly. Previously, dynamic grammar files could cause internal failures, as reported for integration of Speech to Text with IBM® watsonx™ Assistant. The problem was corrected by an internal fix.
- Defect fix: Smart formatting for US English dates is now correct
-
Defect fix: Smart formatting now correctly includes days of the week and dates when both are present in the spoken audio, for example, Tuesday February 28. Previously, in some cases the day of the week was omitted and the date was presented incorrectly. Note that smart formatting is beta functionality.
- Defect fix: Update documentation for speech hesitation words for next-generation models
-
Defect fix: Documentation for speech hesitation words for next-generation models has been updated. More details are provided about US English and Japanese hesitation words. Next-generation models include the actual hesitation words in transcription results, unlike previous-generation models, which include only hesitation markers. For more information, see Speech hesitations and hesitation markers.
- Security vulnerabilities addressed
-
The following security vulnerabilities have been fixed:
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Python (CVE-2020-10735)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to phishing attacks in Python (CVE-2021-28861)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Pypa Setuptools (CVE-2022-40897)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a sensitive information exposure in systemd (CVE-2022-4415)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Python (CVE-2022-45061)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in Libksba (CVE-2022-47629)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in GNU Tar (CVE-2022-48303)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in FasterXML jackson-databind (CVE-2022-42003)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in Perl (CVE-2020-10878)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in Apache Tomcat (CVE-2022-45143)
- CVE-2020-10543: Publication of the security bulletin is pending.
29 March 2023 (Version 4.6.4)
- Version 4.6.4 is now available
- Speech to Text for IBM Cloud Pak for Data version 4.6.4 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift versions 4.10 and 4.12. For more information, see Watson Speech services on IBM Cloud Pak for Data.
- Important: Back up your data before upgrading to version 4.6.3 or 4.6.4
- Important: Before upgrading to Watson Speech services version 4.6.3 or 4.6.4, you must make a backup of your data. Preserve the backup in a safe location. For more information about backing up your Watson Speech services data, see Backing up and restoring Watson Speech services data in Administering Watson Speech services. That topic also includes information about restoring your data if that becomes necessary.
- Known issue: The Swedish telephony and Italian multimedia models are not yet available
- Known issue: The Swedish telephony (sv-SE_Telephony) and Italian multimedia (it-IT_Multimedia) models are not yet available. They will be made available with version 4.6.5.
- Defect fix: You can now change the installed models and voices with the advanced installation options
- Defect fix: During installation, you can now specify different models or voices with the advanced installation options of the command-line interface. Previously, the service always installed the default models and voices. The limitation continues to apply for Watson Speech services versions 4.6.0, 4.6.2, and 4.6.3. For information about installing models and voices, see Specifying additional installation options in Installing Watson Speech services.
- Setting load balancer timeouts
- Watson Speech services require that you change the load balancer timeout settings for both the server and client to 300 seconds. These settings ensure that long-running speech recognition requests, those with long or difficult audio, have sufficient time to complete. For more information, see Information you need to complete this task in Installing Watson Speech services.
- Security vulnerabilities addressed
- The following security vulnerabilities have been fixed:
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to cross-site scripting in GNOME libxml2 (CVE-2016-3709)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in SQLite (CVE-2020-35525)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in Amazon AWS S3 Crypto SDK for GoLang (CVE-2020-8912)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to elevated system privileges in the Red Hat Build of OpenJDK (CVE-2021-20264)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to an arbitrary code execution in e2fsprogs (CVE-2022-1304)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to errors in TrustCor (CVE-2022-23491)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in GnuTLS (CVE-2022-2509)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to an arbitrary code execution in systemd (CVE-2022-2526)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to sensitive information exposure in AWS SDK for Go (CVE-2022-2582)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to denial of service in cURL libcurl (CVE-2022-32206)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a man-in-the-middle attack in cURL libcurl (CVE-2022-32208)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to spoofing attacks in GnuPG (CVE-2022-34903)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in SQLite (CVE-2022-35737)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in zlib (CVE-2022-37434)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in systemd (CVE-2022-3821)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to an arbitrary code execution in Gnome libxml2 (CVE-2022-40303)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to an arbitrary code execution in Gnome libxml2 (CVE-2022-40304)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Python Charmers Future (CVE-2022-40899)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in Golang Go (CVE-2022-41716)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-41717)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Freedesktop D-Bus (CVE-2022-42010)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Freedesktop D-Bus (CVE-2022-42011)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Freedesktop D-Bus (CVE-2022-42012)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in MIT krb5 (CVE-2022-42898)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in libexpat (CVE-2022-43680)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to an arbitrary commands execution in Python (CVE-2015-20107)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in SQLite (CVE-2020-35527)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in GNU Libtasn1 (CVE-2021-46848)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in Git (CVE-2022-23521)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in GnuPG Libksba (CVE-2022-3515)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to an arbitrary code execution in libexpat (CVE-2022-40674)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in Git (CVE-2022-41903)
23 February 2023 (Version 4.6.3)
- Version 4.6.3 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.6.3 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift version 4.10. Red Hat OpenShift version 4.8 is no longer supported. For more information, see Watson Speech services on IBM Cloud Pak for Data.
- Important: All previous-generation models are deprecated and will reach end of service on 31 July 2023
-
Important: All previous-generation models are deprecated and will reach end of service effective 31 July 2023. On that date, all previous-generation models will be removed from the service and the documentation. The previous deprecation date was 3 March 2023. The new date allows users more time to migrate, but users must migrate to the equivalent next-generation model by 31 July 2023.
Most previous-generation models were deprecated on 15 March 2022. Previously, the Arabic and Japanese models were not deprecated. Deprecation now applies to all previous-generation models.
- For more information about the next-generation models to which you can migrate from each of the deprecated models, see Previous-generation languages and models.
- For more information about migrating from previous-generation to next-generation models, see Migrating to next-generation models.
- For more information about all next-generation models, see Next-generation languages and models.
Note: When the previous-generation en-US_BroadbandModel is removed from service, the next-generation en-US_Multimedia model will become the default model for speech recognition requests.
- Known issue: You cannot change the installed models and voices with the advanced installation options
-
Known issue: You currently cannot specify different models or voices with the advanced installation options. The service always installs the default models and voices. For information about changing the models after installation, see Updating models and voices for your Watson Speech services in the Administration topic of Watson Speech services on IBM Cloud Pak for Data.
- Known issue: Upgrade to version 4.6.3 can fail to complete
-
Known issue: When upgrading to version 4.6.3, the MinIO backup job can fail to be deleted upon completion. If this happens, the solution is to delete the job, after which the upgrade proceeds normally. Perform the following steps to resolve the problem.
- To determine whether the MinIO backup job remains undeleted, issue the following command:
oc get job --namespace ${PROJECT_CPD_INSTANCE} | grep speech-cr-ibm-minio-backup
The MinIO job that is not deleted is identified by an entry of the following form:
speech-cr-ibm-minio-backup 1/1 3m25s 1d
- To delete the MinIO backup job, issue the following command:
oc delete job speech-cr-ibm-minio-backup --namespace ${PROJECT_CPD_INSTANCE}
Once the backup job is deleted, upgrade continues and completes.
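If the check needs to be scripted, the detection step can be sketched in Python as shown here. This is an illustrative sketch only: the helper names backup_job_present and list_jobs are hypothetical, and the sketch assumes that the oc client is installed and logged in and that the job name appears in the first column of the output, as in the sample entry above.

```python
import subprocess

JOB_NAME = "speech-cr-ibm-minio-backup"


def backup_job_present(oc_output, job_name=JOB_NAME):
    """Return True if any line of `oc get job` output has the job name in its first column."""
    return any(line.split()[:1] == [job_name] for line in oc_output.splitlines())


def list_jobs(namespace):
    """Run `oc get job` for the given namespace and return its text output."""
    result = subprocess.run(
        ["oc", "get", "job", "--namespace", namespace],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A caller would pass the value of ${PROJECT_CPD_INSTANCE} to list_jobs and, if backup_job_present returns True, issue the delete command from the second step.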
- Defect fix: Update French Canadian next-generation telephony model (upgrade required)
-
Defect fix: The French Canadian next-generation telephony model, fr-CA_Telephony, was updated to address an internal inconsistency that could cause an error during speech recognition. You need to upgrade any custom models that are based on the fr-CA_Telephony model. For more information about upgrading custom models, see
- Defect fix: The next-generation Brazilian Portuguese multimedia model is now available
-
Defect fix: The next-generation Brazilian Portuguese multimedia model is now available for Speech to Text for IBM Cloud Pak for Data. Previously, the model was unavailable.
- Adding words directly to custom models that are based on next-generation models increases the training time
-
Adding custom words directly to a custom model that is based on a next-generation model causes training of the model to take a few minutes longer than it otherwise would. If you are training a model with custom words that you added by using the POST /v1/customizations/{customization_id}/words or PUT /v1/customizations/{customization_id}/words/{word_name} method, allow for some minutes of extra training time for the model. For more information, see
- Additional information about working with service instances
-
The documentation now includes information about creating a service instance with the command-line interface (cpd-cli) and about managing service instances. For more information, see the following topics of Watson Speech services on IBM Cloud Pak for Data:
- Creating a Watson Speech services instance under Post-installation setup
- Managing your Watson Speech services instances under Administering
- Security vulnerability addressed
-
The following security vulnerability has been fixed:
30 January 2023 (Version 4.6.2)
- Version 4.6.2 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.6.2 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift versions 4.8 and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.
- The custom resource now includes a new fileStorageClass property
-
The custom resource for the Watson Speech services now includes a fileStorageClass property in addition to the existing blockStorageClass property. You specify both block and file storage classes when you install or upgrade a service. During upgrade from a previous version, the new property is added automatically to the custom resource by the --file_storage_class option on the cpd-cli manage apply-cr command.
For more information about the available block and file storage classes you use with each of the supported storage solutions, see the table of Storage requirements under Information you need to complete this task on the page "Installing Watson Speech services" in Watson Speech services on IBM Cloud Pak for Data.
- Additional information about provisioning a service instance
-
The documentation now includes information about creating a service instance programmatically. It also includes examples of listing service instances and deleting a service instance. For more information, see Creating a Watson Speech services instance in the Post-installation setup documentation in Watson Speech services on IBM Cloud Pak for Data.
- Server-side encryption is enabled for the MinIO datastore
-
The Speech services have now enabled server-side encryption for object storage in the MinIO datastore. No action is required on your part.
- Change to audit webhooks
-
The Speech services have now removed the audit webhook dependency. The services now write audit events directly to the server. After upgrading to version 4.6.2, some webhook resources might remain until all services can remove the dependency. The remaining resources will be removed in a future release. No action is required on your part.
- New Netherlands Dutch next-generation multimedia model
-
The service now offers a next-generation multimedia model for Netherlands Dutch: nl-NL_Multimedia. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about next-generation models and low latency, see
- New Swedish next-generation telephony model
-
The service now offers a next-generation telephony model for Swedish: sv-SE_Telephony. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about next-generation models and low latency, see
- Updates to English next-generation telephony models
-
The English next-generation telephony models have been updated for improved speech recognition:
en-AU_Telephony
en-GB_Telephony
en-IN_Telephony
en-US_Telephony
All of these models continue to support low latency. You do not need to upgrade custom models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.
- The max_alternatives parameter is now available for use with next-generation models
-
The max_alternatives parameter is now generally available for use with all next-generation models. For more information, see Maximum alternatives.
- Defect fix: Allow use of both max_alternatives and end_of_phrase_silence_time parameters with next-generation models
-
Defect fix: When you use both the max_alternatives and end_of_phrase_silence_time parameters in the same request with next-generation models, the service now returns multiple alternative transcripts while also respecting the indicated pause interval. Previously, use of the two parameters in a single request generated a failure. (Use of the max_alternatives parameter with next-generation models was previously available as an experimental feature to a limited number of customers.)
- Defect fix: Update to Japanese next-generation multimedia model (upgrade required)
-
Defect fix: The Japanese next-generation multimedia model, ja-JP_Multimedia, was updated to address an internal inconsistency that could cause an error during speech recognition with low latency. You need to upgrade any custom models that are based on the ja-JP_Multimedia model. For more information about upgrading custom models, see
- Defect fix: Add documentation guidelines for creating Japanese sounds-likes based on next-generation models
-
Defect fix: In sounds-likes for Japanese custom language models that are based on next-generation models, the character sequence ウー is ambiguous in some left contexts. Do not use it after characters (syllables) that end with the phoneme /o/, such as ロ and ト. In such cases, use ウウ or just ウ instead of ウー. For example, use ロウウマン or ロウマン instead of ロウーマン. For more information, see Guidelines for Japanese.
- Defect fix: Correct use of display_as field in transcription results
-
Defect fix: For language model customization with next-generation models, the value of the display_as field for a custom word now appears in all transcripts. Previously, the value of the word field sometimes appeared in transcription results.
- Security vulnerabilities addressed
-
The following security vulnerabilities have been fixed:
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to issues in OpenSSL (CVE-2022-1434, CVE-2022-1343, CVE-2022-1292, CVE-2022-1473)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary command execution in OpenSSL (CVE-2022-2068)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in protobuf (CVE-2022-1941)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a buffer overflow in GNU glibc (CVE-2021-3999)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security bypass in GNU gzip (CVE-2022-1271)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-27664)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-2879)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to query parameter smuggling in Golang Go (CVE-2022-2880)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-32189)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-41715)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to information exposure in OpenSSL (CVE-2022-2097)
30 November 2022 (Version 4.6.0)
- Version 4.6.0 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.6.0 is now available. This version supports IBM Cloud Pak for Data version 4.6.x and Red Hat OpenShift versions 4.8 and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.
- Amazon Web Services (AWS) is now supported
-
Watson Speech services for IBM Cloud Pak for Data are now supported on Amazon Web Services™ (AWS™). The services support Amazon Elastic Block Store, which you specify by setting the blockStorageClass property of the Speech services custom resource to gp2-csi or gp3-csi.
- New storage classes are now supported
-
Watson Speech services for IBM Cloud Pak for Data now support two additional storage classes:
- IBM Cloud Block Storage (ibmc-block-gold)
- NetApp Trident (ontap-nas)
You specify the storage class with the blockStorageClass property of the Speech services custom resource. For more information about all supported storage classes, see the following topics in Watson Speech services on IBM Cloud Pak for Data:
- Before you begin in Installing Watson Speech services
- Specifying a storage class in Using the Watson Speech services custom resource
- Known issue: Some Watson Speech services pods do not have annotations that are used for scheduling
-
Known issue: Some Watson Speech services pods are missing the cloudpakInstanceId annotation. If you use the IBM Cloud Pak for Data scheduling service, any Watson Speech services pods without the cloudpakInstanceId annotation are:
- Scheduled by the default Kubernetes scheduler rather than the scheduling service
- Not included in the quota enforcement
- Monitoring of the PostgreSQL datastore is now available
-
You can now enable monitoring of the PostgreSQL datastore to receive updates on its usage and status by the Watson Speech services. The events can be consumed by Prometheus monitoring software or whatever application you use for monitoring. By enabling monitoring for user-defined projects in addition to the default platform monitoring, you can monitor your own projects with the Red Hat® OpenShift® Container Platform monitoring stack. This capability includes an additional property, spec.global.datastores.postgressql.enablePodMonitor, in the Speech services custom resource.
For more information, see the topic Monitoring the PostgreSQL datastore for Watson Speech services in the Administering section of Watson Speech services on IBM Cloud Pak for Data.
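To make the nesting of the new property concrete, the following Python sketch expands the dotted property path into the nested structure that a JSON merge patch would carry. The property path is copied verbatim from the note above. The helper name patch_from_path is hypothetical, and the boolean value true is an assumption, so consult the monitoring topic for the supported values.

```python
import json


def patch_from_path(dotted_path, value):
    """Expand a dotted property path into a nested dict suitable for a JSON merge patch."""
    patch = value
    for key in reversed(dotted_path.split(".")):
        patch = {key: patch}
    return patch


# Path taken verbatim from the release note; the value True is an assumption.
patch = patch_from_path("spec.global.datastores.postgressql.enablePodMonitor", True)
print(json.dumps(patch))
```

A document of this shape could be applied with a merge-type patch against the Speech services custom resource. The resource kind and name are not stated in this note, so they are left out of the sketch.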
- Defect fix: PostgreSQL datastore is no longer installed if only runtime microservices are enabled
-
Defect fix: The PostgreSQL datastore is no longer installed if only the runtime microservices are enabled. The datastore is now installed only if at least one of the sttAsync, sttCustomization, or ttsCustomization microservices is installed. PostgreSQL is not uninstalled if at a later date these microservices are disabled.
Prior to version 4.6.0, PostgreSQL was always installed with the Speech services. If you are an existing customer who used only the runtime microservices of the Speech services prior to version 4.6.0, PostgreSQL remains installed but is not used. In this case, installation of PostgreSQL persists across upgrades.
The MinIO datastore is always installed because the runtime microservices depend on it. The RabbitMQ datastore is installed only if the sttAsync microservice is installed.
For more information, see Datastore properties in Using the Watson Speech services custom resource in Watson Speech services on IBM Cloud Pak for Data.
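The installation rules above can be summarized as a small decision function. The Python sketch below is only an illustration of those rules for a fresh installation of version 4.6.0 or later. It does not model the persistence cases (PostgreSQL remaining after microservices are disabled or after an upgrade), and the function name and the lowercase datastore labels are hypothetical.

```python
def datastores_for(enabled_microservices):
    """Return the datastores that a fresh installation would include, per the rules above."""
    enabled = set(enabled_microservices)
    datastores = {"minio"}  # always installed; the runtime microservices depend on it
    if enabled & {"sttAsync", "sttCustomization", "ttsCustomization"}:
        datastores.add("postgresql")
    if "sttAsync" in enabled:
        datastores.add("rabbitmq")
    return datastores
```

For example, enabling only ttsCustomization yields MinIO and PostgreSQL, while enabling sttAsync yields all three datastores.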
- Defect fix: Creation of a Network Policy is no longer necessary for the PostgreSQL operator to monitor its operands
-
Defect fix: For version 4.6.0, it is not necessary to create a Network Policy to allow the PostgreSQL operator to monitor its operands, as described in the 10 November 2022 (Versions 4.0.x and 4.5.x) service update. As of version 4.6.0, the service handles this situation automatically.
- Defect fix: Some next-generation models were updated to improve low-latency response time
-
Defect fix: The following next-generation models were updated to improve their response time when the low_latency parameter is used:
en-IN_Telephony
hi-IN_Telephony
it-IT_Multimedia
nl-NL_Telephony
Previously, these models did not return recognition results as quickly as expected when the low_latency parameter was used. You do not need to upgrade custom models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.
- Defect fix: Improve custom model naming documentation
-
Defect fix: The documentation now provides detailed rules for naming custom language models and custom acoustic models. For more information, see
- Security vulnerabilities addressed
-
The following security vulnerabilities have been fixed:
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a cross-configuration attack against OpenPGP (CVE-2021-40528)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in PCRE2 (CVE-2022-1586)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in Vim (CVE-2022-1621)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a buffer overflow in Vim (CVE-2022-1629)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in Vim (CVE-2022-1785, CVE-2022-1897, CVE-2022-1927)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in cURL libcurl (CVE-2022-22576)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to credential exposure in cURL libcurl (CVE-2022-27774)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to data information exposure in cURL libcurl (CVE-2022-27776)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in cURL libcurl (CVE-2022-27782)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in GNOME libxml2 (CVE-2022-29824)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a SQL injection in PostgreSQL (CVE-2022-31197)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in libexpat (CVE-2022-25313)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in libexpat (CVE-2022-25314)
10 November 2022 (Versions 4.0.x and 4.5.x)
- Known issue: Updated Network Policy needed for PostgreSQL operator
-
Known issue: For Speech services version 4.0.x (not including version 4.0.0) and 4.5.x, if the PostgreSQL operator and the Speech services are installed in different namespaces, the PostgreSQL operator is not able to monitor the PostgreSQL operands for the Speech services. The operator is prevented from monitoring the operands by the Network Policy that is in place for the Speech services.
This problem does not prevent the PostgreSQL cluster from functioning properly. The cluster remains active and fully functional. However, the operator is not able to update the operands when you upgrade to new versions of the Speech services.
The solution for the problem is to create an additional Network Policy for the PostgreSQL operator, as shown in the following steps. You can perform the steps regardless of whether the PostgreSQL operator is installed in the same namespace as the Speech services or in a different namespace.
-
Log in as an administrator of the Red Hat® OpenShift® project where the Speech services are installed.
-
Enter the following command to update the Network Policy for the Speech services:
cat << EOF | oc apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/component: stt
    app.kubernetes.io/instance: {{ <custom-resource-name> }}
    app.kubernetes.io/name: speech-to-text
    release: {{ <custom-resource-name> }}
  name: <custom-resource-name>-postgres-network-policy
  namespace: {{ <cpd-instance-namespace> }}
spec:
  ingress:
  - from:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          app.kubernetes.io/name: cloud-native-postgresql
EOF
where
- <custom-resource-name> is the name of the Speech services custom resource. The recommended name for version 4.0.x is speech-prod-cr; the recommended name for version 4.5.x is speech-cr.
- <cpd-instance-namespace> is the name of the project (namespace) in which the Speech services are installed. The documentation uses the environment variable ${PROJECT_CPD_INSTANCE} to identify the namespace.
-
To verify that the updated Network Policy allows the operator to monitor the operands and that the PostgreSQL cluster is in a healthy state, enter the following command, where <custom-resource-name> and <cpd-instance-namespace> are the values you used in the previous step:
oc get cluster {{ <custom-resource-name> }}-postgres -n {{ <cpd-instance-namespace> }}
If the PostgreSQL cluster is functioning properly, the command produces output similar to the following:
NAME                 AGE   INSTANCES   READY   STATUS                     PRIMARY
speech-cr-postgres   14d   3           3       Cluster in healthy state   speech-cr-postgres-1
These steps do not cause the operator to update the operands to the latest versions. However, the operands are upgraded as expected when you next upgrade the Speech services software.
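If you want to script the verification, the check can be reduced to looking for the healthy status string in the command output. The following is a minimal sketch under that assumption; the helper name cluster_is_healthy is hypothetical, and the sample text is the example output shown in the steps, not output captured from a live cluster.

```shell
#!/bin/sh
# Sketch only: reads the output of the "oc get cluster" command on stdin and
# succeeds if any line reports the healthy status shown in the example output.
# cluster_is_healthy is a hypothetical helper name.
cluster_is_healthy() {
  grep -q "Cluster in healthy state"
}

# Against a live cluster, pipe the verification command into the helper,
# substituting the placeholders as described in the steps:
#   oc get cluster <custom-resource-name>-postgres -n <cpd-instance-namespace> | cluster_is_healthy

# Offline demonstration with the example output from the steps.
sample_output='NAME                 AGE   INSTANCES   READY   STATUS                     PRIMARY
speech-cr-postgres   14d   3           3       Cluster in healthy state   speech-cr-postgres-1'

if printf '%s\n' "$sample_output" | cluster_is_healthy; then
  echo "PostgreSQL cluster is healthy"
else
  echo "PostgreSQL cluster is not healthy" >&2
fi
```

Matching on the status text is deliberately simple; a stricter check could also compare the INSTANCES and READY columns.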
-
13 October 2022 (Version 4.5.3)
- Version 4.5.3 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.5.3 is now available. This version supports IBM Cloud Pak for Data version 4.5.x and Red Hat OpenShift versions 4.6, 4.8, and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.
- Audit events are available for the Speech services
-
The IBM Cloud Pak for Data Audit Logging Service generates and forwards audit events for both the Speech to Text and Text to Speech services. The audit events match those that are available for Activity Tracker with the public service. For more information, see Audit events.
- You cannot uninstall individual Speech service components
-
The documentation now notes that you cannot uninstall individual service components (microservices) once they are installed. To remove any of the following components, you must uninstall the Watson Speech services in their entirety and reinstall only the components that you need: Speech to Text runtime, Speech to Text asynchronous HTTP, Speech to Text customization, Text to Speech runtime, and Text to Speech customization. For more information about installing the Speech services, see Watson Speech services on IBM Cloud Pak for Data.
- New French Canadian next-generation multimedia model
-
The service now offers a next-generation multimedia model for French Canadian: fr-CA_Multimedia. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about next-generation models and low latency, see
- Updates to English next-generation telephony models
-
The English next-generation telephony models have been updated for improved speech recognition:
en-AU_Telephony
en-GB_Telephony
en-IN_Telephony
en-US_Telephony
All of these models continue to support low latency. You do not need to upgrade custom models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.
- Italian next-generation multimedia model now supports low latency
-
The Italian next-generation multimedia model, it-IT_Multimedia, now supports low latency. For more information about next-generation models and low latency, see
- Troubleshooting upgrade from version 4.0.x to version 4.5.x
-
When you upgrade the Speech services from version 4.0.x to version 4.5.x, you might encounter an issue where the PostgreSQL pods become stuck in the Terminating state. If this problem occurs during your upgrade, perform the following steps to resolve the problem. The information and steps are also documented in Upgrading Watson Speech services from Version 4.0 to Version 4.5 in the Upgrading topic of Watson Speech services on IBM Cloud Pak for Data.
- Use the following command to identify pods that remain in the Terminating state:
oc get pods -n ${PROJECT_CPD_INSTANCE} -o wide | grep Terminating | awk '{print $1}'
- Use the following command to set the environment variable pods to include the list of pods that remain in the Terminating state:
pods=$(oc get pods -n ${PROJECT_CPD_INSTANCE} -o wide | grep Terminating | awk '{print $1}')
- Use the following command to delete the stuck pods so that the upgrade process can continue:
oc delete pods ${pods} -n ${PROJECT_CPD_INSTANCE}
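As a minimal sketch of the identification step, the stuck pods can also be selected by matching the STATUS column exactly rather than searching the whole line. The helper name terminating_pods and the sample pod listing are made up for illustration; the sketch assumes the default column order of the pod listing (NAME, READY, STATUS).

```shell
#!/bin/sh
# Sketch only: reads a pod listing on stdin and prints the names of pods whose
# STATUS column (third field) is exactly "Terminating".
# terminating_pods is a hypothetical helper name.
terminating_pods() {
  awk '$3 == "Terminating" {print $1}'
}

# Against a live cluster:
#   oc get pods -n ${PROJECT_CPD_INSTANCE} | terminating_pods

# Offline demonstration with a made-up listing.
sample='NAME                   READY   STATUS        RESTARTS   AGE
speech-cr-postgres-1   1/1     Terminating   0          14d
speech-cr-postgres-2   1/1     Running       0          14d
speech-cr-postgres-3   0/1     Terminating   0          14d'

printf '%s\n' "$sample" | terminating_pods
```

Comparing the third field avoids accidental matches on a pod name or node name that happens to contain the word Terminating.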
- Defect fix: Fix custom resource entries documentation
-
Defect fix: The documentation for the Speech services custom resource now includes colons after the names of the models koKrTelephony and nlNlTelephony. Previously, the documentation for these two entries omitted the colons.
- Security vulnerabilities addressed
-
The following security vulnerabilities have been fixed:
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a buffer over-read flaw in Linux Kernel (CVE-2020-28915)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security bypass in GNU Gzip (CVE-2022-1271)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to elevated privileges in Apple macOS Monterey and macOS Big Sur (CVE-2022-26691)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to elevated privileges in Linux Kernel (CVE-2022-27666)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to cross-site scripting in Apache Tomcat (CVE-2022-34305)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in GNU C Library (CVE-2019-19126)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in GNU C Library (CVE-2020-10029)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in GNU glibc (CVE-2020-1751)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in GNU glibc (CVE-2020-1752)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to information disclosure or denial of service in GNU glibc (CVE-2021-35942)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to buffer overflow in OpenSSL (CVE-2021-3711)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to information disclosure or denial of service in OpenSSL (CVE-2021-3712)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to weakened security in OpenSSL (CVE-2021-4160)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in OpenSSL (CVE-2022-0778)
19 August 2022 (Version 4.5.1)
- Important: Deprecation date for most previous-generation models is now 3 March 2023
-
Superseded: This deprecation notice is superseded by the 23 February 2023 service update. The end of service date for all previous-generation models is now 31 July 2023.
On 15 March 2022, the previous-generation models for all languages other than Arabic and Japanese were deprecated. At that time, the deprecated models were to remain available until 15 September 2022. To allow users more time to migrate to the appropriate next-generation models, the deprecated models will now remain available until 3 March 2023. As with the initial deprecation notice, the Arabic and Japanese previous-generation models are not deprecated. For a complete list of all deprecated models, see the 15 March 2022 (Version 4.0.6) service update.
On 3 March 2023, the deprecated models will be removed from the service and the documentation. If you use any of the deprecated models, you must migrate to the equivalent next-generation model by 3 March 2023.
- For more information about the next-generation models to which you can migrate from each of the deprecated models, see Previous-generation languages and models
- For more information about the next-generation models, see Next-generation languages and models
- For more information about migrating from previous-generation to next-generation models, see Migrating to next-generation models.
Note: When the previous-generation en-US_BroadbandModel is removed from service, the next-generation en-US_Multimedia model will become the default model for speech recognition requests.
3 August 2022 (Version 4.5.1)
- Version 4.5.1 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.5.1 is now available. This version supports IBM Cloud Pak for Data version 4.5.x and Red Hat OpenShift versions 4.6, 4.8, and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.
- Support for FIPS-enabled clusters
-
Both Speech to Text for IBM Cloud Pak for Data and Text to Speech for IBM Cloud Pak for Data now support running on Federal Information Processing Standard (FIPS)-enabled clusters. For more information, see Services that support FIPS.
- Defect fix: Fix ephemeral storage calculations to prevent occasional pod evictions
-
Defect fix: A defect was fixed and calculation of ephemeral storage limits is now more precise for the Speech to Text for IBM Cloud Pak for Data and Text to Speech for IBM Cloud Pak for Data runtimes. These changes prevent occasional pod evictions when the services' runtimes are under heavy load.
- Defect fix: Update speech hesitations and hesitation markers documentation
-
Defect fix: Documentation for speech hesitations and hesitation markers has been updated. Previous-generation models include hesitation markers in place of speech hesitations in transcription results for most languages; smart formatting removes hesitation markers from US English final transcripts. Next-generation models include the actual speech hesitations in transcription results; smart formatting has no effect on their inclusion in final transcription results.
For more information, see:
- Security vulnerabilities addressed
-
The following security vulnerabilities have been fixed:
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in rsyslog (CVE-2022-24903)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to an HTTP request smuggling issue in Twisted (CVE-2022-24801)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service, caused by a buffer overflow in Twisted (CVE-2022-21716)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service, caused by incomplete string comparison in NumPy (CVE-2021-34141)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service, caused by a buffer overflow in NumPy (CVE-2021-41496)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to cookie and authorization header exposure in Twisted (CVE-2022-21712)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in Perl (CVE-2018-18311)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in Perl (CVE-2018-18312)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in Perl (CVE-2018-18313)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in Perl (CVE-2018-18314)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in Perl (CVE-2018-6913)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to CRLF injection in Python (CVE-2019-11236)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in GNU Tar (CVE-2019-9923)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in Perl (CVE-2020-10543)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to an integer overflow in Perl (CVE-2020-10878)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a buffer overflow in Perl (CVE-2020-12723)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in urllib3 (CVE-2021-33503)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to injection attacks in Ansible (CVE-2021-3583)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-23772)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to incorrect access control in Golang Go (CVE-2022-23773)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-23806)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-24675)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-24921)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Golang Go (CVE-2022-28327)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a heap-based buffer overflow in libssh, caused by improper bounds checking (CVE-2021-3634)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in Python (CVE-2021-3737)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a possible sensitive information exposure in Python (CVE-2021-4189)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in lxml (CVE-2021-43818)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in MS Visual Studio (CVE-2021-21300)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a security restrictions bypass in Git (CVE-2021-40330)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution in MS Visual Studio (CVE-2022-24765)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary command execution in Git (CVE-2018-1000021)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to cross-site scripting in jQuery (CVE-2015-9251)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to cross-site scripting in jQuery (CVE-2019-11358)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to cross-site scripting in jQuery (CVE-2020-11022)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to cross-site scripting in jQuery (CVE-2020-11023)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a data binding rules security weakness in Spring Framework (CVE-2022-22968)
29 June 2022 (Version 4.5.0)
- Version 4.5.0 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.5.0 is now available. This version supports IBM Cloud Pak for Data version 4.5.x and Red Hat OpenShift versions 4.6, 4.8, and 4.10. For more information, see Watson Speech services on IBM Cloud Pak for Data.
- Unified Speech services for IBM Cloud Pak for Data documentation
-
The installation and administration documentation for both Speech to Text and Text to Speech is now combined in the IBM Cloud Pak for Data documentation. For more information about installing and managing the Speech services, see Watson Speech services on IBM Cloud Pak for Data.
- Changes to Speech services custom resource
-
The custom resource is now created when you initially install the Speech services. The process is described in the IBM Cloud Pak for Data installation documentation. The content of the custom resource has changed:
- The recommended name of the custom resource has changed from speech-prod-cr to speech-cr.
- All references to storage class have changed from variants of storageClass to blockStorageClass.
- The name of the Portworx block storage class has changed from portworx-shared-gp3 to portworx-db-gp3-sc.
- The createSecret property has been removed for the MinIO and PostgreSQL datastores. The property is only used internally. The Speech services always use a secrets object if you create one, and they always automatically create the object if none is provided.
- User-provided secrets object now supported for RabbitMQ datastore
-
You can now provide security credentials for the RabbitMQ datastore, just as you can for the MinIO and PostgreSQL datastores. The documented process is similar for all three datastores.
- New Italian it-IT_Multimedia next-generation model
-
The service now offers a next-generation multimedia model for Italian: it-IT_Multimedia. The new model is generally available. It does not support low latency, but it does support language model customization and grammars. For more information about all available next-generation models, see Next-generation languages and models.
- Updated Korean telephony and multimedia next-generation models
-
The existing Korean next-generation models have been updated:
- The ko-KR_Telephony model has been updated for improved low-latency support for speech recognition.
- The ko-KR_Multimedia model has been updated for improved speech recognition. The model now also supports low latency.
Both models are generally available, and both support language model customization and grammars. You do not need to upgrade custom language models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.
- Updates to multiple next-generation telephony models
-
The following next-generation English language telephony models have been updated for improved speech recognition:
en-AU_Telephony
en-GB_Telephony
en-IN_Telephony
en-US_Telephony
You do not need to upgrade custom models that are based on these models. For more information about all available next-generation models, see Next-generation languages and models.
- Defect fix: Confidence scores are now reported for all transcription results
-
Defect fix: Confidence scores are now reported for all transcription results. Previously, when the service returned multiple transcripts for a single speech recognition request, confidence scores might not be returned for all transcripts.
- Security vulnerabilities addressed
-
No security vulnerabilities were fixed for version 4.5.0.
25 May 2022 (Version 4.0.9)
- Version 4.0.9 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.0.9 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.
- New Brazilian Portuguese pt-BR_Multimedia next-generation model
-
The service now offers a next-generation multimedia model for Brazilian Portuguese: pt-BR_Multimedia. The new model supports low latency and is generally available. It also supports language model customization and grammars. For more information about the next-generation models and low latency, see
- Update to German de-DE_Multimedia next-generation model to support low latency
-
The next-generation German model, de-DE_Multimedia, now supports low latency. You do not need to upgrade custom models that are based on the updated German base model. For more information about the next-generation models and low latency, see
- New beta character_insertion_bias parameter for next-generation models
-
All next-generation models now support a new beta parameter, character_insertion_bias, which is available with all speech recognition interfaces. By default, the service is optimized for each individual model to balance its recognition of candidate strings of different lengths. The model-specific bias is equivalent to 0.0. Each model's default bias is sufficient for most speech recognition requests.
However, certain use cases might benefit from favoring hypotheses with shorter or longer strings of characters. The parameter accepts values between -1.0 and 1.0 that represent a change from a model's default. Negative values instruct the service to favor shorter strings of characters. Positive values direct the service to favor longer strings of characters. For more information, see Character insertion bias.
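The following shell sketch shows one way a client script might check the documented range before it sends a request. It assumes the parameter is passed as a query parameter of the /v1/recognize method; the helper names valid_bias and recognize_url and the base URL are made up for illustration, and the sketch only builds the request URL without contacting a service.

```shell
#!/bin/sh
# Sketch only: valid_bias and recognize_url are hypothetical helper names.

# Succeeds if $1 is a plain decimal number in the documented range -1.0 to 1.0.
valid_bias() {
  printf '%s\n' "$1" | awk '
    /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)$/ && $1 >= -1.0 && $1 <= 1.0 { ok = 1 }
    END { exit !ok }'
}

# Prints a recognition URL for base URL $1, model $2, and bias $3.
# The base URL is a placeholder; use the URL of your own service instance.
recognize_url() {
  if ! valid_bias "$3"; then
    echo "character_insertion_bias must be between -1.0 and 1.0" >&2
    return 1
  fi
  printf '%s/v1/recognize?model=%s&character_insertion_bias=%s\n' "$1" "$2" "$3"
}

# A negative value favors shorter strings of characters.
recognize_url "https://example.com/stt/api" ja-JP_Telephony -0.3
```

Rejecting out-of-range values on the client side keeps an obviously invalid request from being sent; the service remains the authority on what it accepts.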
- The Speech services do not support the OADP backup and restore utility
-
Watson Speech services do not support the IBM Cloud Pak for Data OpenShift APIs for Data Protection (OADP) backup and restore utility. If the Speech services are installed on a cluster, you might not be able to use the IBM Cloud Pak for Data OADP backup and restore utility to back up other services that are installed on that cluster. This limitation applies to version 4.0.0 and later versions of the Speech services.
- Security vulnerabilities addressed
-
The following security vulnerabilities have been fixed:
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service, caused by a buffer overflow with Twisted (CVE-2022-21716)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service in NumPy (CVE-2021-33430)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a denial of service, caused by improper input validation with Spring Framework (CVE-2022-22950)
1 May 2022 (Version 1.2.x)
- Important: End of service for Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5
- Important: Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5 is out of service as of 1 May 2022. Speech to Text version 1.2.x is no longer supported, available, or documented. For more information about End of Service for Speech to Text, which is part of the Watson API Kit, see Software support discontinuance: IBM Watson API Kit for IBM Cloud Pak for Data 1.2.x.
27 April 2022 (Version 4.0.8)
- Version 4.0.8 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.0.8 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.
- New environment variables used in IBM Cloud Pak for Data documentation
-
Most commands in the Speech to Text for IBM Cloud Pak for Data documentation have been updated to use a common set of environment variables. The documentation provides a script to automatically export the environment variables before you run installation, upgrade, and administration commands. After you source the script, you can copy most commands from the documentation and run them without making any changes.
The environment variables that the script defines include the following:
- ${PROJECT_CPD_INSTANCE} identifies the project where you plan to install IBM Cloud Pak for Data and the Speech services.
- ${PROJECT_CPD_OPS} identifies the project for the IBM Cloud Pak for Data platform operator.
- ${PROJECT_CPFS_OPS} identifies the project for the IBM Cloud Pak for Data foundational services.
For more information about using the environment variables, see Best practice: Setting up install variables.
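Before you paste commands that rely on these variables, a small guard can confirm that they are set in the current shell. This is a minimal sketch and not the script that the documentation provides; the helper name require_vars is hypothetical.

```shell
#!/bin/sh
# Sketch only: require_vars is a hypothetical helper, not the documented script.
# Succeeds only if every named environment variable is set and non-empty;
# otherwise it names each missing variable on stderr and fails.
require_vars() {
  local missing=0 name value
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "environment variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

if require_vars PROJECT_CPD_INSTANCE PROJECT_CPD_OPS PROJECT_CPFS_OPS; then
  echo "All install variables are set"
else
  echo "Source the environment variable script first" >&2
fi
```

The guard reports every missing variable in one pass instead of stopping at the first one.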
- The ttsVoiceMarginalCPU property is no longer documented
-
The ttsVoiceMarginalCPU property has been removed from the documentation for the Speech services custom resource. The property manages the tradeoff between concurrency and speech synthesis speed. The default value of 400 ensures a reasonable balance for most customers and maintains real-time synthesis.
- New German next-generation multimedia model
-
The service now offers a next-generation multimedia model for German: de-DE_Multimedia. The new model is generally available. It does not support low latency. It does support language model customization and grammars as generally available functionality.
For more information about all available next-generation models and their customization support, see
- Beta next-generation en-WW_Medical_Telephony model now supports low latency
-
The beta next-generation en-WW_Medical_Telephony model now supports low latency. For more information about all next-generation models and low latency, see
- Security vulnerabilities addressed
-
The following security vulnerabilities have been fixed:
- Security Bulletin: A vulnerability with Guava affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2020-8908)
- Security Bulletin: A Google Guava vulnerability affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2018-10237)
- Security Bulletin: Vulnerabilities in Apache Tomcat affect IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2022-23181)
- Security Bulletin: A Cyrus SASL vulnerability affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2022-24407)
- Security Bulletin: A vulnerability with GNU wget affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2016-4971)
- Security Bulletin: A vulnerability with GNU Wget affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2018-0494)
- Security Bulletin: A vulnerability in 'GNU Wget' affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2018-20483)
- Security Bulletin: A vulnerability in ISC BIND affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2018-5741)
- Security Bulletin: A vulnerability in Python affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2019-20916)
- Security Bulletin: A vulnerability with ISC BIND affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2021-25214)
- Security Bulletin: A vulnerability in ISC BIND affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2021-25215)
- Security Bulletin: A vulnerability in ISC BIND affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2021-25216)
- Security Bulletin: A vulnerability in ISC BIND affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2021-25219)
- Security Bulletin: A vulnerability in PostgreSQL JDBC Driver (PgJDBC) affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2022-21724)
- Security Bulletin: A vulnerability in GNU Tar affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2019-9923)
- Security Bulletin: A vulnerability in logback-classic affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2021-42550)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a stack-based buffer overflow in GNU C Library (CVE-2022-23218)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to stack-based buffer overflow in GNU C Library (CVE-2022-23219)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to a buffer overflow and underflow in GNU C Library (CVE-2021-3999)
8 April 2022 (Version 4.0.7)
- Support for sounds-like is now documented for custom models based on next-generation models
-
For custom language models that are based on next-generation models, support is now documented for sounds-like specifications for custom words. Support for sounds-likes has been available since late 2021.
Differences exist between the use of the
sounds_like
field for custom models that are based on next-generation and previous-generation models. For more information about using the sounds_like
field with custom models that are based on next-generation models, see Working with custom words for next-generation models.
- Important: Deprecated
customization_id
parameter removed from the documentation
-
Important: On 9 October 2018, the
customization_id
parameter of all speech recognition requests was deprecated and replaced by the language_customization_id
parameter. The customization_id
parameter has now been removed from the documentation for the speech recognition methods:
- /v1/recognize
for WebSocket requests
- POST /v1/recognize
for synchronous HTTP requests (including multipart requests)
- POST /v1/recognitions
for asynchronous HTTP requests
Note: If you use the Watson SDKs, make sure that you have updated any application code to use the
language_customization_id
parameter instead of the customization_id
parameter. The customization_id
parameter will no longer be available from the equivalent methods of the SDKs as of their next major release. For more information about the speech recognition methods, see the API & SDK reference.
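As an illustration of the migration described above, the following Python sketch renames the deprecated parameter in a dictionary of request parameters. The helper name `migrate_recognize_params` and the use of a plain dictionary are assumptions made for this example; they are not part of the Watson SDKs.

```python
def migrate_recognize_params(params: dict) -> dict:
    """Return a copy of params that uses language_customization_id.

    Hypothetical helper: it only rewrites a dictionary of request
    parameters and does not call the service.
    """
    migrated = dict(params)
    if "customization_id" in migrated:
        old_value = migrated.pop("customization_id")
        # If the new parameter is already set, keep it and drop the old one.
        migrated.setdefault("language_customization_id", old_value)
    return migrated
```

Keeping an explicitly set `language_customization_id` when both keys are present is a design choice of this sketch, not documented service behavior.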
30 March 2022 (Version 4.0.7)
- Version 4.0.7 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.0.7 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.
- Custom resource property for specifying a default model
-
The default model for speech recognition requests is
en-US_BroadbandModel
. If you do not install the en-US_BroadbandModel
model, you must do one of the following:
- Use the
model
parameter to pass the model that is to be used with each request.
- Specify a new default model for your installation of Speech to Text for IBM Cloud Pak for Data by using the
defaultSTTModel
property in the Speech services custom resource. For more information, see Installing Watson Speech to Text and Using the default model.
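The rule above can be sketched on the client side as follows. This is a minimal illustration only: the function `choose_model` and the idea of keeping a local set of installed model names are assumptions for the example, and the service itself resolves the default model on the server.

```python
def choose_model(installed_models, requested=None,
                 configured_default="en-US_BroadbandModel"):
    """Pick the model name to send with a recognition request.

    installed_models: names of the models installed on the cluster
    (assumed to be known to the caller).
    configured_default: the installation default, for example the
    value set with the defaultSTTModel property.
    """
    if requested is not None:
        # An explicit model parameter always wins.
        return requested
    if configured_default in installed_models:
        return configured_default
    raise ValueError(
        "Default model is not installed: pass the model parameter "
        "or set defaultSTTModel in the custom resource"
    )
```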
- Updates to English and French next-generation multimedia models to support low latency
-
The following multimedia models have been updated to support low latency:
- Australian English:
en-AU_Multimedia
- UK English:
en-GB_Multimedia
- US English:
en-US_Multimedia
- French:
fr-FR_Multimedia
You do not need to upgrade custom language models that are built on these base models. For more information about the next-generation models and low latency, see
- New Castilian Spanish next-generation multimedia model
-
The service now offers a next-generation multimedia model for Castilian Spanish:
es-ES_Multimedia
. The new model supports low latency and is generally available. It also supports language model customization and grammars.
For more information about all available next-generation models and their customization support, see
- Beta next-generation
en-WW_Medical_Telephony
model now supports smart formatting
-
The beta next-generation
en-WW_Medical_Telephony
model now supports the smart_formatting
parameter for US English audio. For more information about all next-generation models, see Next-generation languages and models.
- Security vulnerabilities addressed
-
The following security vulnerabilities have been fixed:
- Red Hat CVE-2022-24407: A flaw was found in the SQL plugin shipped with Cyrus SASL. The plugin fails to properly escape SQL input, which leads to an improper input validation vulnerability. This flaw allows an attacker to execute arbitrary SQL commands and to change the passwords for other accounts, which allows escalation of privileges.
- Security Bulletin: A jwt-go vulnerability affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2020-26160)
- Security Bulletin: A vulnerability in Golang Go affects IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2021-29923)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is affected but not classified as vulnerable by a remote code execution in Spring Framework (CVE-2022-22965)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to arbitrary code execution with IBM WebSphere Application Server (CVE-2021-23450)
17 March 2022 (Version 4.0.6)
- Grammar support for next-generation models is now generally available
-
Grammar support is now generally available (GA) for next-generation models that meet the following conditions:
- The models are generally available.
- The models support language model customization.
For more information, see the following topics:
- For more information about the status of grammar support for next-generation models, see Customization support for next-generation models.
- For more information about grammars, see Grammars.
15 March 2022 (Version 4.0.6)
- Important: Deprecation of most previous-generation models
-
Superseded: This deprecation notice is superseded by the 23 February 2023 service update. The end of service date for all previous-generation models is now 31 July 2023.
Effective 15 March 2022, previous-generation models for all languages other than Arabic and Japanese are deprecated. The deprecated models remain available until 15 September 2022, when they will be removed from the service and the documentation. The Arabic and Japanese previous-generation models are not deprecated.
The following previous-generation models are now deprecated:
- Chinese (Mandarin):
zh-CN_NarrowbandModel
and zh-CN_BroadbandModel
- Dutch (Netherlands):
nl-NL_NarrowbandModel
and nl-NL_BroadbandModel
- English (Australian):
en-AU_NarrowbandModel
and en-AU_BroadbandModel
- English (United Kingdom):
en-GB_NarrowbandModel
and en-GB_BroadbandModel
- English (United States):
en-US_NarrowbandModel
, en-US_BroadbandModel
, and en-US_ShortForm_NarrowbandModel
- French (Canadian):
fr-CA_NarrowbandModel
and fr-CA_BroadbandModel
- French (France):
fr-FR_NarrowbandModel
and fr-FR_BroadbandModel
- German:
de-DE_NarrowbandModel
and de-DE_BroadbandModel
- Italian:
it-IT_NarrowbandModel
and it-IT_BroadbandModel
- Korean:
ko-KR_NarrowbandModel
and ko-KR_BroadbandModel
- Portuguese (Brazilian):
pt-BR_NarrowbandModel
and pt-BR_BroadbandModel
- Spanish (Argentinian):
es-AR_NarrowbandModel
and es-AR_BroadbandModel
- Spanish (Castilian):
es-ES_NarrowbandModel
and es-ES_BroadbandModel
- Spanish (Chilean):
es-CL_NarrowbandModel
and es-CL_BroadbandModel
- Spanish (Colombian):
es-CO_NarrowbandModel
and es-CO_BroadbandModel
- Spanish (Mexican):
es-MX_NarrowbandModel
and es-MX_BroadbandModel
- Spanish (Peruvian):
es-PE_NarrowbandModel
and es-PE_BroadbandModel
If you use any of these deprecated models, you must migrate to the equivalent next-generation model by the end of service date.
- For more information about the next-generation models to which you can migrate from each of the deprecated models, see Previous-generation languages and models.
- For more information about the next-generation models, see Next-generation languages and models.
- For more information about migrating from previous-generation to next-generation models, see Migrating to next-generation models.
Note: When the previous-generation
en-US_BroadbandModel
is removed from service on 15 September 2022, the next-generation en-US_Multimedia
model will become the default model for speech recognition requests.
- Next-generation models now support audio-parsing parameters
-
All next-generation models now support the following audio-parsing parameters as generally available features:
- end_of_phrase_silence_time
specifies the duration of the pause interval at which the service splits a transcript into multiple final results. For more information, see End of phrase silence time.
- split_transcript_at_phrase_end
directs the service to split the transcript into multiple final results based on semantic features of the input. For more information, see Split transcript at phrase end.
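The two parameters above are passed with a recognition request. The following Python sketch shows one way to assemble them into a query string. The function name `build_recognize_query`, the example value `1.5`, and the lowercase `true`/`false` encoding of the boolean are assumptions for illustration. See the linked topics for the values that the service accepts.

```python
from urllib.parse import urlencode


def build_recognize_query(model, end_of_phrase_silence_time=None,
                          split_transcript_at_phrase_end=None):
    """Build the path and query string for a /v1/recognize request.

    Only parameters that are explicitly set are included, so the
    service defaults apply to everything else.
    """
    params = {"model": model}
    if end_of_phrase_silence_time is not None:
        params["end_of_phrase_silence_time"] = end_of_phrase_silence_time
    if split_transcript_at_phrase_end is not None:
        # Assumed encoding: lowercase true/false.
        params["split_transcript_at_phrase_end"] = str(
            bool(split_transcript_at_phrase_end)).lower()
    return "/v1/recognize?" + urlencode(params)
```

For example, `build_recognize_query("en-US_Telephony", 1.5, True)` produces a path that sets both audio-parsing parameters for a next-generation model.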
- Defect fix: Correct speaker labels documentation
-
Defect fix: Documentation of speaker labels included the following erroneous statement in multiple places: "For next-generation models, speaker labels are not supported for use with interim results or low latency." In fact, speaker labels are supported for use with interim results and low latency for next-generation models. For more information, see Speaker labels.
23 February 2022 (Version 4.0.6)
- Version 4.0.6 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.0.6 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.
- Updates to import/export scripts
-
The
import_export.sh
and transfer_ownership.sh
scripts have been updated. These scripts are used to import and export data between clusters, back up and restore data, and migrate data from version 3.5 to version 4.0.x. The scripts have been modified and improved as follows:
- The
transfer_ownership.sh
script now requires a -c
option to be included on the command line before the <custom_resource_name>
argument.
- The
transfer_ownership.sh
script now requires a -v <version>
option and argument to indicate the version to which ownership of resources is being transferred. Specify 35
for version 3.5 or 40
for version 4.0.x.
- The
transfer_ownership.sh
script now requires a -p
option to be included on the command line before the <postgres_auth_secret_name>
argument.
- The
<postgres_auth_secret_name>
argument provides the Kubernetes secret that is used to authenticate to the PostgreSQL datastore to which you are transferring ownership. You can omit the authentication secret if it is the same as the default value (<custom-resource-name>-postgres-auth-secret
for version 4.0.x, user-provided-postgressql
for version 3.5). You must provide the secret if it is different from the default value.
- Both scripts now include a -h
(--help
) option to display information about the script and its usage.
For more information, see
- Administering Watson Speech to Text, specifically Importing and exporting data and Backing up and restoring data.
- Upgrading Watson Speech to Text, specifically Migrating data from IBM Cloud Pak for Data Version 3.5.
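The option rules for the ownership script can be sketched as a small Python helper that assembles only the three options named above. The function name, the order of the options, and the example resource name are assumptions for illustration. Any other arguments that the script expects are out of scope here, so check the script's `-h` output before relying on this.

```python
# Default secret names, as stated in the release note above.
DEFAULT_SECRETS = {
    "40": "{custom_resource_name}-postgres-auth-secret",
    "35": "user-provided-postgressql",
}


def transfer_ownership_options(custom_resource_name, version,
                               postgres_auth_secret_name=None):
    """Return the -c, -v, and (if needed) -p options as an argument list."""
    if version not in DEFAULT_SECRETS:
        raise ValueError("version must be '35' or '40'")
    options = ["-c", custom_resource_name, "-v", version]
    default_secret = DEFAULT_SECRETS[version].format(
        custom_resource_name=custom_resource_name)
    # The secret can be omitted when it matches the default value.
    if postgres_auth_secret_name and postgres_auth_secret_name != default_secret:
        options += ["-p", postgres_auth_secret_name]
    return options
```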
- Updated recommendation for OpenShift Container Storage
-
Starting with Speech services version 4.0.6, the recommended storage class for OpenShift Container Storage is
ocs-storagecluster-ceph-rbd
.
- If you are installing Speech services 4.0.6 or upgrading to Speech services 4.0.6 from IBM Cloud Pak for Data version 3.5, specify the
ocs-storagecluster-ceph-rbd
storage class during installation or upgrade.
- If you are upgrading to Speech services 4.0.6 from a previous refresh of Cloud Pak for Data version 4.0, continue to use
ocs-storagecluster-cephfs
. You cannot change the storage that is used in an existing deployment.
The value is specified with the
storageClass
property in the Speech services custom resource:
################
# Storage class
################
storageClass: "ocs-storagecluster-ceph-rbd"
The Speech services work with either storage class of OpenShift Container Storage. The newly recommended storage class has more restrictive access permissions. For more information, see
- New beta
en-WW_Medical_Telephony
model is now available
-
A new beta next-generation
en-WW_Medical_Telephony
model is now available. The new model understands terms from the medical and pharmacological domains. Use the model in situations where you need to transcribe common medical terminology such as medicine names, product brands, medical procedures, illnesses, types of doctor, or COVID-19-related terminology. Common use cases include conversations between a patient and a medical provider (for example, a doctor, nurse, or pharmacist).
The new model is installed from the Speech services custom resource by setting
enWwMedicalTelephony
to enabled: true
. The model is available for all supported English dialects: Australian, Indian, UK, and US.
- The model supports language model customization and grammars as beta functionality.
- It supports most of the same parameters as the
en-US_Telephony
model.
- It does not support the following parameters:
low_latency
, profanity_filter
, redaction
, and speaker_labels
.
- At this time, it does not support
smart_formatting
for IBM Cloud Pak for Data.
For more information, see The English medical telephony model.
- Update to Chinese
zh-CN_Telephony
model
-
The next-generation Chinese model
zh-CN_Telephony
has been updated for improved speech recognition. The model continues to support low latency. By default, the service automatically uses the updated model for all speech recognition requests. For more information about all available next-generation models, see Next-generation languages and models.
If you have custom language models that are based on the updated model, you must upgrade your existing custom models to take advantage of the updates by using the
POST /v1/customizations/{customization_id}/upgrade_model
method. For more information, see Upgrading custom models.
- Update to Japanese
ja-JP_Multimedia
model to support low latency
-
The next-generation Japanese model
ja-JP_Multimedia
now supports low latency. You can use the low_latency
parameter with speech recognition requests that use the model. You do not need to upgrade custom models that are based on the updated Japanese base model. For more information about the next-generation models and low latency, see Next-generation languages and models and Low latency.
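As a hedged sketch of using the `low_latency` parameter with this model, the following Python code builds (but does not send) a synchronous recognition request. The base URL passed by the caller, the `audio/flac` content type, and the function name are assumptions for the example, and authentication is omitted because it depends on your installation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def low_latency_request(base_url, audio, model="ja-JP_Multimedia"):
    """Build a POST /v1/recognize request with low_latency enabled.

    base_url is the service URL of your instance (hypothetical here).
    The request is only constructed; the caller must add credentials
    and send it, for example with urllib.request.urlopen.
    """
    query = urlencode({"model": model, "low_latency": "true"})
    return Request(
        url=f"{base_url}/v1/recognize?{query}",
        data=audio,
        headers={"Content-Type": "audio/flac"},
        method="POST",
    )
```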
11 February 2022 (Version 4.0.5)
- Defect fix: Improve custom model upgrade and base model version documentation
-
Defect fix: The documentation that describes the upgrade of custom models and the version strings that are used for different versions of base models has been updated. The documentation now states that upgrade for language model customization also applies to next-generation models. The version strings that represent different versions of base models have also been updated, and the documentation now notes that the
base_model_version
parameter can also be used with upgraded next-generation models.
For more information about custom model upgrade, when upgrade is necessary, and how to use older versions of custom models, see
- Defect fix: Update capitalization documentation
-
Defect fix: The documentation that describes the service's automatic capitalization of transcripts has been updated. The service capitalizes appropriate nouns only for the following languages and models:
- All previous-generation US English models
- The next-generation German model
For more information, see Capitalization.
31 January 2022 (Version 4.0.5)
- Version 4.0.5 has been updated
-
Speech to Text for IBM Cloud Pak for Data version 4.0.5 has been updated to address installation issues. The case package version is now 4.0.6. Use this package instead of the version 4.0.5 package. For more information about installing and managing the service, see Installing Watson Speech to Text.
- Important: Extra steps for mirrored installation are no longer necessary
-
Important: The 26 January 2022 release notes included important notes for the following steps:
- Additional step for performing a mirrored installation of Minio datastore
- Additional steps for performing a mirrored installation of new next-generation models
These additional steps are no longer needed. The case package has been updated to correct the installation issues.
26 January 2022 (Version 4.0.5)
- Version 4.0.5 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.0.5 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.
- Important: Additional step for performing a mirrored installation of Minio datastore
-
Important: These steps are no longer needed if you install case package 4.0.6. For more information, see 31 January 2022 (Version 4.0.5).
If you are performing a mirrored installation (for example, in an air-gapped environment), you need to perform an additional step before completing either of the following steps:
- Step 7 Mirroring the images to the private registry of Mirroring images with a bastion model
- Step 8 Mirroring the images to the intermediary container registry of Mirroring images with an intermediary container registry
This step is mandatory to copy the necessary images for the Minio datastore:
echo 'cp.icr.io,cp/opencontent-minio-client,1.1.4,sha256:7b4cf5e47a0455cfa7ca9ab246b80916e4dccbc1483b3e0f276fb7b0ab3e5c60,IMAGE,linux,x86_64,"",0,CASE,"",""' \
  >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
Failure to perform this step will cause installation errors for both Speech to Text and Text to Speech.
- Important: Additional steps for performing a mirrored installation of new next-generation models
-
Important: These steps are no longer needed if you install case package 4.0.6. For more information, see 31 January 2022 (Version 4.0.5).
If you are performing a mirrored installation (for example, for an air-gapped environment) and plan to install any of the new next-generation models for Speech to Text (for more information, see the later release note), you must perform an additional step before completing either of the following steps:
- Step 7 Mirroring the images to the private container registry of Mirroring images with a bastion model
- Step 8 Mirroring the images to the intermediary container registry of Mirroring images with an intermediary container registry
Each additional step is unique to the model that is being installed. If you install more than one of the new models, issue the indicated command for each model that you are installing.
-
For the Chinese telephony model (
zh-CN_Telephony
):
echo 'cp.icr.io,cp/watson-speech/zh-cn-telephony,2022-01-05-405models,sha256:52af6dfccd64ccd81b409936442a51a71f4ee96d980e1fc6a343a05bd4ed7fbc,IMAGE,linux,x86_64,"",0,CASE,"",""' \
  >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
-
For the Latin American Spanish telephony model (
es-LA_Telephony
):
echo 'cp.icr.io,cp/watson-speech/es-la-telephony,2022-01-05-405models,sha256:58e8c04abe9659472e89bf0778b7dc66e0ddceb4ea18d9d3e048a08c72125ea2,IMAGE,linux,x86_64,"",0,CASE,"",""' \
  >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
-
For the Australian English multimedia model (
en-AU_Multimedia
):
echo 'cp.icr.io,cp/watson-speech/en-au-multimedia,2022-01-05-405models,sha256:167f9a76258530a56a6abdd1c311f2ea05d6820ee0e802fbf2f96f08fb8a7646,IMAGE,linux,x86_64,"",0,CASE,"",""' \
  >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
-
For the UK English multimedia model (
en-GB_Multimedia
):
echo 'cp.icr.io,cp/watson-speech/en-gb-multimedia,2022-01-05-405models,sha256:167f9a76258530a56a6abdd1c311f2ea05d6820ee0e802fbf2f96f08fb8a7646,IMAGE,linux,x86_64,"",0,CASE,"",""' \
  >> $CASE_PATH/ibm-watson-speech-4.0.5-images.csv
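The commands above all append a line with the same column layout to the images CSV file. The following Python sketch reproduces that layout so that the per-model lines can be generated rather than typed by hand. The function name is hypothetical, and the fixed trailing columns are copied verbatim from the commands above without any claim about their meaning.

```python
def image_csv_line(image, tag, digest, registry="cp.icr.io"):
    """Format one image entry in the layout used by the commands above.

    The trailing columns (IMAGE,linux,x86_64,"",0,CASE,"","") are
    copied as-is from the documented commands.
    """
    return f'{registry},{image},{tag},{digest},IMAGE,linux,x86_64,"",0,CASE,"",""'


def append_image_line(csv_path, line):
    """Append a line to the images CSV, like the shell >> redirection."""
    with open(csv_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

Calling `image_csv_line` with the image name, tag, and digest for a model yields the same text that the corresponding `echo` command writes.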
- License Server is now automatically installed
-
The Speech services operator now automatically installs the required License Server when it installs the Speech services. You no longer need to install the License Server from the IBM Cloud Pak for Data foundational services, and you no longer need to use additional YAML content to create an OperandRequest with the necessary bindings.
- Removal of steps specific to PostgreSQL EnterpriseDB server
-
The previous version of the documentation included steps for the PostgreSQL EnterpriseDB server that were specific to the Speech services. These steps were documented in the topics Upgrading Watson Speech to Text (Version 4.0) and Uninstalling Watson Speech to Text. These additional steps are no longer necessary and have been removed from the documentation.
- RabbitMQ datastore is now used only by the
sttAsync
component
-
The RabbitMQ datastore was previously used by components of both Speech services, Speech to Text and Text to Speech. It now handles non-persistent message queuing for the Speech to Text asynchronous HTTP component (
sttAsync
) only. It is used only if the sttAsync
component is installed and enabled.
- New next-generation models
-
The service now supports the following next-generation models with Speech to Text for IBM Cloud Pak for Data:
- Chinese (Mandarin) telephony model (
zh-CN_Telephony
). The new model supports low latency.
- English (Australian) multimedia model (
en-AU_Multimedia
). The new model does not support low latency.
- English (UK) multimedia model (
en-GB_Multimedia
). The new model does not support low latency.
- Spanish (Latin American) telephony model (
es-LA_Telephony
). The new model supports low latency.
Note: The Latin American Spanish model,
es-LA_Telephony
, applies to all Latin American dialects. It is the equivalent of the previous-generation models that are available for the Argentinian, Chilean, Colombian, Mexican, and Peruvian dialects. If you used a previous-generation model for any of these specific dialects, use the es-LA_Telephony
model to migrate to the equivalent next-generation model.
The new models are generally available for speech recognition. They are generally available for language model customization and beta for grammars. They are not supported for acoustic model customization.
- Important: If you are performing a mirrored installation (for example, in an air-gapped environment) and plan to install any of the new next-generation models for Speech to Text, you must perform additional steps before mirroring the images. For more information, see the earlier release note.
- For more information about using the custom resource to install models, see Installing Watson Speech to Text.
- For more information about all available next-generation models, see Next-generation languages and models.
- For more information about customization support for next-generation models, see Customization support for next-generation models.
- Next-generation US English models are now installed by default
-
The next-generation US English models,
en-US_Multimedia
and en-US_Telephony
, are now installed by default with Speech to Text for IBM Cloud Pak for Data. These models join en-US_BroadbandModel
, en-US_NarrowbandModel
, and en-US_ShortForm_NarrowbandModel
as the models that are installed by default. The models now have the following entries in the Speech services custom resource:
########################################
# Speech to Text next-generation models
########################################
enUsMultimedia:   # US English (en-US) Multimedia model
  enabled: true
enUsTelephony:    # US English (en-US) Telephony model
  enabled: true
For more information about using the custom resource to install models, see Installing Watson Speech to Text.
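As a small illustration of the custom resource entries shown above, the following Python sketch renders `enabled` entries for a set of model properties. The function name is hypothetical, and the sketch makes no assumption about where in the custom resource these entries are nested. It only reproduces the two-line shape of each entry.

```python
def render_model_entries(model_properties):
    """Render custom resource entries of the form shown above.

    model_properties maps a property name (for example enUsMultimedia)
    to a boolean that says whether the model is enabled.
    """
    lines = []
    for name, enabled in model_properties.items():
        lines.append(f"{name}:")
        lines.append(f"  enabled: {'true' if enabled else 'false'}")
    return "\n".join(lines)
```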
- Security vulnerabilities addressed
-
The following security vulnerabilities associated with Apache Log4j have been fixed:
- Security Bulletin: Vulnerability in Apache Log4j may affect IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data (CVE-2021-4104)
- Security Bulletin: IBM Watson Speech Services Cartridge for IBM Cloud Pak for Data is vulnerable to denial of service and arbitrary code execution due to Apache Log4j (CVE-2021-45105 and CVE-2021-45046)
20 December 2021 (Version 4.0.4)
- Version 4.0.4 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.0.4 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.
- Important: Changes to properties for disabling the storage and logging of user data
-
Important: The names of the properties of the Speech services custom resource that specify whether user data is stored and logged have changed. The custom resource formerly contained the following properties:
#################
# Anonymize logs
#################
sttRuntime:
  anonymizeLogs: "false"   # If true, disables storage and logging of user data
sttAMPatcher:
  anonymizeLogs: "false"   # If true, disables storage and logging of user data
ttsRuntime:
  anonymizeLogs: "false"   # If true, disables storage and logging of user data
These properties are now named as follows:
###################################
# Storage and logging of user data
###################################
sttRuntime:
  skipAudioAndResultLogging: "false"   # If true, disables storage and logging of user data
sttAMPatcher:
  skipAudioAndResultLogging: "false"   # If true, disables storage and logging of user data
ttsRuntime:
  skipAudioAndResultLogging: "false"   # If true, disables storage and logging of user data
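The rename from the former property names to the new ones can be sketched in Python as follows. The sketch assumes that the custom resource has already been loaded into a dictionary whose keys include the three component names. The function name and that dictionary shape are assumptions for illustration, not a documented tool.

```python
OLD_KEY = "anonymizeLogs"
NEW_KEY = "skipAudioAndResultLogging"
COMPONENTS = ("sttRuntime", "sttAMPatcher", "ttsRuntime")


def rename_logging_properties(settings: dict) -> dict:
    """Return a copy of settings with anonymizeLogs renamed per component.

    The configured value ("true" or "false") is carried over unchanged.
    """
    updated = {key: (dict(value) if isinstance(value, dict) else value)
               for key, value in settings.items()}
    for component in COMPONENTS:
        component_settings = updated.get(component)
        if isinstance(component_settings, dict) and OLD_KEY in component_settings:
            component_settings[NEW_KEY] = component_settings.pop(OLD_KEY)
    return updated
```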
If you already set these properties in your custom resource to change the default value of
false
to true
, you need to edit your custom resource. You must manually change the names of the properties to the new names and save the updated custom resource. For more information, see Installing Watson Speech to Text.
- Important: Changes to properties of PostgreSQL secrets object
-
Important: When you install the Speech services, an object that contains a randomly generated password for the PostgreSQL datastore is created by default. You can instead choose to specify the password manually. If you do, note that the properties of the YAML file for the secrets object have changed. For more information, see the topic about managing your datastores in Administering Watson Speech to Text.
- Important: PostgreSQL pods do not start with EnterpriseDB version 1.10 operator
-
Important: With Speech to Text for IBM Cloud Pak for Data version 4.0.3, PostgreSQL pods based on the EnterpriseDB version 1.10 operator can fail to start. This prevents the Speech services from starting. A workaround exists for this problem. If your Speech services fail to start, see PostgreSQL pods do not start with EnterpriseDB version 1.10 operator for information about diagnosing and resolving the problem.
This problem is fixed in Speech to Text for IBM Cloud Pak for Data version 4.0.4.
- New support for IBM Spectrum Scale Container Native storage class
-
Since version 4.0.3, the Speech services support the IBM Spectrum® Scale Container Native storage class. To use IBM Spectrum Scale, specify
"ibm-spectrum-scale-sc"
for the storageClass
property of the Speech services custom resource. For more information, see Installing Watson Speech to Text.
- Interaction of Speech services with MinIO datastore during installation
-
The Speech services runtime components,
sttRuntime
and ttsRuntime
, cannot start until the models and voices for the services are fully uploaded into the MinIO datastore. During installation, the services might fail and automatically restart themselves one or more times until upload of the models and voices is complete. They then start properly. No user action is required.
- Defect fix: Correct upgrade documentation
-
Defect fix: Documentation for upgrading the Speech services to new versions of IBM Cloud Pak for Data version 4.0.x included incorrect references in some commands. These references are now correct:
- The strings
watsonSpeechToTextStatus
and watsonTextToSpeechStatus
have been changed to speechStatus
in both cases.
- The strings
status.watsonSpeechToTextVersion
and status.watsonTextToSpeechVersion
have been changed to .spec.version
in both cases.
For more information, see Upgrading Watson Speech to Text.
-
Important: If you created custom language models based on certain next-generation models, you must re-create the custom models. Until you re-create the custom language models, speech recognition requests that attempt to use the custom models fail with HTTP error code 400.
You need to re-create custom language models that you created based on the following versions of next-generation models:
- For the
en-AU_Telephony
model, custom models that you created from en-AU_Telephony.v2021-03-03
to en-AU_Telephony.v2021-10-04
.
- For the
en-GB_Telephony
model, custom models that you created from en-GB_Telephony.v2021-03-03
to en-GB_Telephony.v2021-10-04
.
- For the
en-US_Telephony
model, custom models that you created from en-US_Telephony.v2021-06-17
to en-US_Telephony.v2021-10-04
.
- For the
en-US_Multimedia
model, custom models that you created from en-US_Multimedia.v2021-03-03
to en-US_Multimedia.v2021-10-04
.
To identify the version of a model on which a custom language model is based, use the
GET /v1/customizations
method to list all of your custom language models or the GET /v1/customizations/{customization_id}
method to list a specific custom language model. The versions
field of the output shows the base model for a custom language model. For more information, see Listing custom language models.
To re-create a custom language model, first create a new custom model. Then add all of the previous custom model's corpora and custom words to the new model. You can then delete the previous custom model. For more information, see Creating a custom language model.
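The version ranges listed above can be checked programmatically. The following Python sketch tests a single version string against them. It assumes that each entry of the `versions` field is a string of the form `<model>.v<YYYY-MM-DD>`, as in the identifiers above, and that both ends of each range are inclusive. Both points are assumptions, and the function name is hypothetical.

```python
from datetime import date

# Affected base-model version ranges, taken from the list above.
AFFECTED_RANGES = {
    "en-AU_Telephony": (date(2021, 3, 3), date(2021, 10, 4)),
    "en-GB_Telephony": (date(2021, 3, 3), date(2021, 10, 4)),
    "en-US_Telephony": (date(2021, 6, 17), date(2021, 10, 4)),
    "en-US_Multimedia": (date(2021, 3, 3), date(2021, 10, 4)),
}


def needs_recreation(version_string: str) -> bool:
    """Return True if a base-model version falls in an affected range."""
    model, separator, stamp = version_string.partition(".v")
    if not separator or model not in AFFECTED_RANGES:
        return False
    try:
        built = date.fromisoformat(stamp)
    except ValueError:
        # Not a date-stamped version string; treat as unaffected.
        return False
    first, last = AFFECTED_RANGES[model]
    return first <= built <= last  # Inclusive bounds are an assumption.
```

A custom model whose version string returns `True` is a candidate for re-creation as described above.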
- Updates to multiple next-generation models for improved speech recognition
-
The following next-generation models have been updated for improved speech recognition:
- Australian English telephony model (
en-AU_Telephony
)
- UK English telephony model (
en-GB_Telephony
)
- US English multimedia model (
en-US_Multimedia
)
- US English telephony model (
en-US_Telephony
)
- Castilian Spanish telephony model (
es-ES_Telephony
)
For more information about all available next-generation models, see Next-generation languages and models.
- New beta grammar support for next-generation models
-
Grammar support is now available as beta functionality for all available next-generation models. All next-generation models are generally available (GA) and support language model customization. For more information, see the following topics:
- For more information about the status of grammar support for next-generation models, see Customization support for next-generation models.
- For more information about grammars, see Grammars.
- New
custom_acoustic_model
field for supported features
-
The
GET /v1/models
and GET /v1/models/{model_id}
methods now report whether a model supports acoustic model customization. The SupportedFeatures
object now includes an additional field, custom_acoustic_model
, a boolean that is true
for a model that supports acoustic model customization and false
otherwise. Currently, the field is true
for all previous-generation models and false
for all next-generation models.
- For more information about these methods, see Listing information about models.
- For more information about support for acoustic model customization, see Language support for customization.
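The following sketch shows one way to use the new field: it filters a decoded GET /v1/models response for models whose custom_acoustic_model value is true. The models, name, and supported_features keys are assumptions about the JSON layout, and the sample data is invented for illustration.

```python
def models_supporting_acoustic_customization(models_response):
    """Return the names of models that report custom_acoustic_model as true.

    Assumes each model entry carries a "supported_features" object.
    A missing field is treated as false.
    """
    names = []
    for model in models_response.get("models", []):
        features = model.get("supported_features", {})
        if features.get("custom_acoustic_model", False):
            names.append(model["name"])
    return names


# Hypothetical response body, already decoded from JSON.
sample_models = {
    "models": [
        {"name": "en-US_BroadbandModel",
         "supported_features": {"custom_acoustic_model": True}},
        {"name": "en-US_Telephony",
         "supported_features": {"custom_acoustic_model": False}},
    ]
}

print(models_supporting_acoustic_customization(sample_models))
```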
- Security vulnerability addressed
-
The following security vulnerability associated with Apache Log4j has been fixed:
20 December 2021 (Version 1.2.x)
- Important: You can no longer install Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5
-
Important: You can no longer perform new installations of Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5. You can install only Speech to Text version 4.0.x on IBM Cloud Pak for Data version 4.x. For more information, see Installing Watson Speech to Text.
The Speech services for IBM Cloud Pak for Data version 3.5 reach their End of Support date on 30 April 2022. You are encouraged to upgrade to the latest version 4.0.x release of the services at your earliest convenience. For more information, see Upgrading Watson Speech to Text.
30 November 2021 (Version 4.0.3)
- Version 4.0.3 is now available
-
Speech to Text for IBM Cloud Pak for Data version 4.0.3 is now available. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift versions 4.6 and 4.8. For more information about installing and managing the service, see Installing Watson Speech to Text.
- License Server now a mandatory prerequisite
-
You must now install the License Server from the IBM Cloud Pak for Data foundational services. You must install the License Server by using the YAML content that is provided to create an OperandRequest with the necessary bindings. You must also install the License Service in the same namespace as the service (operand), which is also where IBM Cloud Pak for Data is installed. For more information, see Installing Watson Speech to Text.
- New support for in-place upgrade
-
The service now supports in-place, operator-based upgrade from version 4.0.0 to version 4.0.3. Moving from IBM Cloud Pak for Data version 3.5 to version 4.0.3 continues to require use of migration utilities. For more information, see Upgrading Watson Speech to Text.
- EDB PostgreSQL operator and license installation changes
-
Installation, upgrade, and uninstallation for the Enterprise DB PostgreSQL operator and license have changed:
- Instructions for installing the EDB PostgreSQL operator and license are now included with the IBM Cloud Pak for Data foundational services. The instructions for installing the Speech services have been updated accordingly. For more information, see Installing Watson Speech to Text.
- Instructions for upgrading from Speech to Text version 4.0.0 to 4.0.3 include instructions for uninstalling the previous EDB PostgreSQL operator and license and reinstalling them with the IBM Cloud Pak for Data foundational services. For more information, see Upgrading Watson Speech to Text.
- Instructions for uninstalling the Speech services now include steps for removing the EDB PostgreSQL operator and license that were previously installed with Speech to Text. For more information, see Uninstalling Watson Speech to Text.
- New guidance for scaling up your installation
-
The service now provides updated guidance about scaling up your installation. The information includes specifying the number of pods, the number of CPUs allocated per pod, and the maximum number of concurrent sessions with previous- and next-generation models. For more information, see Administering Watson Speech to Text.
- Command-line updates to import and export utilities
-
The commands that are used with the import and export utilities for the Speech services include new options and arguments. The import and export utilities are also the foundation for backing up and restoring the services and for migrating from IBM Cloud Pak for Data version 3.5 to version 4.0.3. For more information about using the utilities, see
- New property for specifying the CPUs for acoustic model training
-
The sttAMPatcher microservice manages acoustic model customization for the service. The AM Patcher uses a dedicated number of CPUs to handle requests. You can use the new sttAMPatcher.resources.requestsCPU property to increase the number of CPUs that are dedicated to handling acoustic model training requests by the AM Patcher. This may be necessary if you experience training failures during acoustic model training. For more information, see Installing Watson Speech to Text.
- New next-generation models
-
The service now supports the following new next-generation language models. All of the new models are generally available.
- Czech: cs-CZ_Telephony. The model supports low latency.
- Belgian Dutch (Flemish): nl-BE_Telephony. The model supports low latency.
- French: fr-FR_Multimedia. The new model does not support low latency.
- Indian English: en-IN_Telephony. The model supports low latency.
- Indian Hindi: hi-IN_Telephony. The model supports low latency.
- Japanese: ja-JP_Multimedia. The model does not support low latency.
- Korean: ko-KR_Multimedia. The model does not support low latency.
- Korean: ko-KR_Telephony. The model supports low latency.
- Netherlands Dutch: nl-NL_Telephony. The model supports low latency.
For more information about all next-generation models and about low latency, see Next-generation languages and models and Low latency.
- Updates to next-generation models
-
The following next-generation models have been updated for improved speech recognition. All of the models are generally available.
- Arabic: ar-MS_Telephony. The model now supports low latency.
- Brazilian Portuguese: pt-BR_Telephony. The model continues to support low latency.
- US English: en-US_Telephony. The model continues to support low latency.
- Canadian French: fr-CA_Telephony. The model now supports low latency.
- Italian: it-IT_Telephony. The model now supports low latency.
For more information about all next-generation models and about low latency, see Next-generation languages and models and Low latency.
- Defect fix: Address asynchronous HTTP failures
-
Defect fix: The asynchronous HTTP interface failed to transcribe some audio. In addition, the callback for the request returned status recognitions.completed_with_results instead of recognitions.failed. This error has been resolved.
- Defect fix: Improve speaker labels results
-
Defect fix: When you use speaker labels with next-generation models, the service now identifies the speaker for all words of the input audio, including very short words that have the same start and end timestamps.
- Defect fix: Update interim results and low-latency documentation
-
Defect fix: Documentation that describes the interim results and low-latency features with next-generation models has been rewritten for clarity and correctness. For more information, see the following topics:
- Defect fix: Correct multitenancy documentation
-
Defect fix: The IBM Cloud Pak for Data topic Multitenancy support incorrectly stated that the Speech services do not support multitenancy. The topic has been updated to state that the Speech services support the following operations:
- Install the service in separate projects
- Install the service multiple times in the same project
- Install the service once and deploy multiple instances in the same project
The documentation that is specific to the Speech services correctly stated the multitenancy support.
1 October 2021 (Version 1.1.x)
- Version 1.1.x is out of service
- Speech to Text and Text to Speech for IBM Cloud Pak for Data version 1.1.x went out of service on 30 September 2021. As of 1 October 2021, the documentation for version 1.1.x is no longer available. For more information, see Software withdrawal and support discontinuance.
31 August 2021 (Version 4.0.0)
- All next-generation models are now generally available
-
All next-generation language models are now generally available (GA). They are supported for use in production environments and applications.
- For more information about all next-generation language models and which models are currently available for IBM Cloud Pak for Data, see Next-generation languages and models.
- For more information about the features that are supported for each next-generation model, see Supported features for next-generation models.
- Language model customization for next-generation models is now generally available
-
Language model customization is now generally available (GA) for all available next-generation languages and models. Language model customization for next-generation models is supported for use in production environments and applications.
You use the same commands to create, manage, and use custom language models, corpora, and custom words for next-generation models as you do for previous-generation models. But customization for next-generation models works differently from customization for previous-generation models. For custom models that are based on next-generation models:
- The custom models have no concept of out-of-vocabulary (OOV) words.
- Words from corpora are not added to the words resource.
- You cannot currently use the sounds-like feature for custom words.
- You do not need to upgrade custom models when base language models are updated.
- Grammars are not currently supported.
For more information about using language model customization for next-generation models, see
- Understanding customization
- Language support for customization
- Creating a custom language model
- Using a custom language model for speech recognition
- Working with corpora and custom words for next-generation models
Additional topics describe managing custom language models, corpora, and custom words.
29 July 2021 (Version 4.0.0)
- Version 4.0.0 is available
-
IBM Watson® Speech to Text for IBM Cloud Pak® for Data version 4.0.0 is now available. Installation and administration of the service include many changes. This version supports IBM Cloud Pak for Data version 4.x and Red Hat OpenShift version 4.6. For more information about installing and managing the service, see Installing IBM Watson Speech to Text for IBM Cloud Pak for Data.
- New next-generation language models
-
The service now supports a growing number of next-generation language models. The next-generation multimedia and telephony models improve upon the speech recognition capabilities of the service's previous generation of broadband and narrowband models. The new models leverage deep neural networks and bidirectional analysis to achieve both higher throughput and greater transcription accuracy.
At this time, the next-generation language models and the low_latency parameter are beta functionality. The next-generation models support a limited number of languages and speech recognition features. The supported languages, models, and features will increase with future releases.
Many of the next-generation models also support a new low_latency parameter that lets you request faster results at the possible expense of reduced transcription quality. When low latency is enabled, the service curtails its analysis of the audio, which can reduce the accuracy of the transcription. This trade-off might be acceptable if your application requires lower response time more than it does the highest possible accuracy.
The low_latency parameter impacts your use of the interim_results parameter with the WebSocket interface. Interim results are available only for those next-generation models that support low latency, and only if both the interim_results and low_latency parameters are set to true.
- For more information about the next-generation models and their capabilities, see Next-generation languages and models.
- For more information about language support for next-generation models and about which next-generation models support low latency, see Supported next-generation language models.
- For more information about feature support for next-generation models, see Supported features for next-generation models.
- For more information about the low_latency parameter, see Low latency.
- For more information about the interaction between the low_latency and interim_results parameters for next-generation models, see Requesting interim results and low latency.
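The interaction between interim_results and low_latency described in this entry can be sketched as a small check plus a WebSocket start message. The message layout (an action of start and a content-type key) is an assumption taken from typical use of the WebSocket interface, and the function names are hypothetical. The rule in the check is the one stated above for next-generation models in this release.

```python
import json


def interim_results_expected(model_supports_low_latency, interim_results, low_latency):
    """Return True only when the model supports low latency and both parameters are true."""
    return bool(model_supports_low_latency and interim_results and low_latency)


def start_message(content_type, interim_results, low_latency):
    """Build the JSON text of a start message (assumed layout) for a WebSocket session."""
    return json.dumps({
        "action": "start",
        "content-type": content_type,
        "interim_results": interim_results,
        "low_latency": low_latency,
    })


# A model that supports low latency, with both parameters enabled.
print(interim_results_expected(True, True, True))
print(start_message("audio/wav", True, True))
```

A client could call the check before it opens the session and log a warning when interim results were requested but cannot be returned.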
- Arabic language broadband model renamed
-
The Arabic language broadband model is now named ar-MS_BroadbandModel. The former name, ar-AR_BroadbandModel, is deprecated. It will continue to function for at least one year but might be removed at a future date. You are encouraged to migrate to the new name at your earliest convenience.
- Unified Speech to Text documentation
-
The documentation for IBM Watson Speech to Text for IBM Cloud Pak for Data is now combined with the documentation for managed instances of the Speech to Text service that are hosted on IBM Cloud. This is true of both the guide and reference documentation for the two forms of the service. Links to the formerly separate version of the IBM Cloud Pak for Data documentation for the service redirect to the unified documentation.
For more information about identifying information that pertains to only one version of the product, see About Speech to Text.
- Defect fix: Improve documentation
-
Defect fix: The documentation has been updated to correct the following information:
- The documentation failed to state that next-generation models do not produce hesitation markers. The documentation has been updated to note that only previous-generation models produce hesitation markers. Next-generation models include the actual hesitations in transcription results. For more information, see Speech hesitations and hesitation markers.
- The documentation incorrectly stated that using the smart_formatting parameter causes the service to remove hesitation markers from final transcription results for Japanese. Smart formatting does not remove hesitation markers from final results for Japanese, only for US English. For more information, see What results does smart formatting affect?
- Version 1.1.x is going out of service
-
Speech to Text and Text to Speech for IBM Cloud Pak for Data version 1.1.x go out of service on 30 September 2021. You must upgrade to a later version of the services on IBM Cloud Pak for Data before that date. As of 1 October 2021, the documentation for version 1.1.4 will no longer be available.
12 April 2021 (Version 1.2.1)
- Addition to speech-override.yaml file
-
The minimal speech-override.yaml file includes an extra definition, dockerRegistryPrefix:
global:
  dockerRegistryPrefix: "{Registry}"
  image:
    pullSecret: "{Registry_pull_secret}"
{Registry} is the path for the internal Docker registry. It must be image-registry.openshift-image-registry.svc:5000/{namespace}, where {namespace} is the namespace in which IBM Cloud Pak® for Data is installed, normally zen.
9 April 2021 (Version 1.2.1)
- Support for modifying installed models and voices
- The Speech services let you add or remove installed models and voices for version 1.2 or 1.2.1 of the services.
26 March 2021 (Version 1.2.1)
- Version 1.2.1 is available
-
Speech to Text for IBM Cloud Pak for Data version 1.2.1 is now available. Versions 1.2 and 1.2.1 use the same version 1.2 documentation and installation instructions. Version 1.2.1 supports installation on Red Hat OpenShift version 4.6 in addition to versions 4.5 and 3.11.
- New installation instructions
-
For both clusters connected to the internet and air-gapped clusters, the installation instructions include the following steps:
- Use the oc label command to set up required labels for the namespace where IBM Cloud Pak for Data is installed.
- Use the oc project command to ensure that you are pointing at the correct OpenShift project.
- Use the cpd-cli install command to install an Enterprise DB PostgreSQL server that is used by the Speech services.
You perform these steps before you install the Speech services.
- New uninstallation instructions
-
A step was added to the procedure for uninstalling the Speech services to clean up all of the resources from the installation.
- Entitled registry for PostgreSQL datastore
-
The entitled registry path from which the service pulls images for the PostgreSQL datastore has changed. The registry location changed from cp.icr.io/cp/watson-speech to cp.icr.io/cp/cpd. This change is transparent to users.
- Secrets for Minio and PostgreSQL datastores
-
The Minio and PostgreSQL datastores require the following hard-coded values for their secrets:
- For Minio, use minio.
- For PostgreSQL, use user-provided-postgressql.
You cannot use your own values for these secrets. The secrets must be created before you install the Speech services.
- Deletions from speech-override.yaml file
-
The following entries have been removed from the speech-override.yaml file. They were added to work around a problem that has now been fixed.
sttRuntime:
  images:
    miniomc:
      tag: 1.0.5
sttAMPatcher:
  images:
    miniomc:
      tag: 1.0.5
ttsRuntime:
  images:
    miniomc:
      tag: 1.0.5
The abbreviated speech-override.yaml file has generally been reduced further by fine-tuning its contents to the essential elements.
9 December 2020 (Version 1.2)
- Version 1.2 is available
-
Speech to Text for IBM Cloud Pak for Data version 1.2 is now available. Installation and administration of the service include many changes. This version supports IBM Cloud Pak for Data versions 3.5 and 3.0.1, and Red Hat OpenShift versions 4.5 and 3.11.
- New Australian and French Canadian models
-
The service now offers broadband and narrowband models for Australian English and Canadian French:
- Australian English: en-AU_BroadbandModel and en-AU_NarrowbandModel
- Canadian French: fr-CA_BroadbandModel and fr-CA_NarrowbandModel
The new models are generally available, and they support both language model and acoustic model customization.
- For more information about supported languages and models, see Previous-generation languages and models.
- For more information about language support for customization, see Language support for customization.
- Updated models for improved speech recognition
-
The following language models have been updated for improved speech recognition:
- Brazilian Portuguese: pt-BR_BroadbandModel and pt-BR_NarrowbandModel
- French: fr-FR_BroadbandModel
- German: de-DE_BroadbandModel and de-DE_NarrowbandModel
- Japanese: ja-JP_BroadbandModel
- UK English: en-GB_BroadbandModel and en-GB_NarrowbandModel
- US English: en-US_ShortForm_NarrowbandModel
By default, the service automatically uses the updated models for all speech recognition requests. If you have custom language or custom acoustic models that are based on these models, you must upgrade your existing custom models to take advantage of the updates by using the following methods:
POST /v1/customizations/{customization_id}/upgrade_model
POST /v1/acoustic_customizations/{customization_id}/upgrade_model
For more information, see Upgrading custom models.
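As a small aid for scripting the upgrade, the sketch below builds the request path for either kind of custom model from the two methods listed above. The function name and the sample customization ID are hypothetical. Sending the POST request and authenticating are left to your HTTP client.

```python
def upgrade_model_path(customization_id, acoustic=False):
    """Return the upgrade_model path for a custom language or custom acoustic model."""
    resource = "acoustic_customizations" if acoustic else "customizations"
    return f"/v1/{resource}/{customization_id}/upgrade_model"


# Hypothetical customization ID used for both calls.
print(upgrade_model_path("example-id-1"))
print(upgrade_model_path("example-id-1", acoustic=True))
```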
- The split_transcript_at_phrase_end parameter is now generally available for all languages
-
The speech recognition parameter split_transcript_at_phrase_end is now generally available for all languages. Previously, it was generally available only for US and UK English. For more information, see Split transcript at phrase end.
- Hesitation marker for German has changed
-
The hesitation marker that is used for the updated German broadband and narrowband models has changed from [hesitation] to %HESITATION. For more information about hesitation markers, see Speech hesitations and hesitation markers.
- Defect fix: Address latency issue for models with large numbers of grammars
-
Defect fix: The service no longer has a latency issue for custom language models that contain a large number of grammars. When initially used for speech recognition, such custom models could take multiple seconds to load. The custom models now load much faster, greatly reducing latency when they are used for recognition.
15 July 2020 (Version 1.1.4)
- Red Hat OpenShift version 4.3 is going out of service
- IBM Cloud Pak for Data 3.0.1 is deprecating support for Red Hat OpenShift 4.3 on 1 September 2020. Red Hat OpenShift 4.3 is going out of service on 22 October 2020. IBM Cloud Pak for Data is introducing support for Red Hat OpenShift 4.5. IBM Cloud Pak for Data is recommending that clients upgrade to Red Hat OpenShift 4.5 before 22 October 2020. IBM Support will work with any customers who already installed IBM Cloud Pak for Data 3.0.1 on Red Hat OpenShift 4.3. New customers who want to install on Red Hat OpenShift 4.x are instructed to install Red Hat OpenShift 4.5.
19 June 2020 (Version 1.1.4)
- Version 1.1.4 is available
-
Speech to Text for IBM Cloud Pak for Data version 1.1.4 is now available. Installation and administration of the service include many changes. This version supports IBM Cloud Pak for Data versions 2.5 and 3.0.1, and Red Hat OpenShift versions 3.11 and 4.3. For more information about installing and managing the service, see Installing and managing Speech to Text for IBM Cloud Pak for Data.
- New parameters to control the level of speech activity detection
-
The service now offers two new optional parameters for controlling the level of speech activity detection. The parameters can help ensure that only relevant audio is processed for speech recognition.
- The speech_detector_sensitivity parameter adjusts the sensitivity of speech activity detection. You can use the parameter to suppress word insertions from music, coughing, and other non-speech events.
- The background_audio_suppression parameter suppresses background audio based on its volume to prevent it from being transcribed or otherwise interfering with speech recognition. You can use the parameter to suppress side conversations or background noise.
You can use the parameters individually or together. They are available for all interfaces and for most language models. For more information about the parameters, their allowable values, and their effect on the quality and latency of speech recognition, see Speech activity detection.
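The two parameters can be collected into request parameters with a helper like the following sketch. The release note defers the allowable values to the Speech activity detection topic, so the 0.0 to 1.0 range that is enforced here is an assumption to verify against that topic. The function name is hypothetical.

```python
def speech_activity_params(speech_detector_sensitivity=None,
                           background_audio_suppression=None):
    """Return a dict holding only the parameters that were supplied.

    The 0.0 to 1.0 range check is an assumption for this sketch.
    """
    supplied = {
        "speech_detector_sensitivity": speech_detector_sensitivity,
        "background_audio_suppression": background_audio_suppression,
    }
    params = {}
    for name, value in supplied.items():
        if value is None:
            continue  # Omit the parameter so that the service default applies.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        params[name] = value
    return params


# The parameters can be used individually or together.
print(speech_activity_params(speech_detector_sensitivity=0.6))
print(speech_activity_params(0.6, 0.5))
```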
- New broadband and narrowband models for Dutch and Italian
-
The service now supports broadband and narrowband models for the Dutch and Italian languages:
- Dutch broadband model (nl-NL_BroadbandModel)
- Dutch narrowband model (nl-NL_NarrowbandModel)
- Italian broadband model (it-IT_BroadbandModel)
- Italian narrowband model (it-IT_NarrowbandModel)
Dutch and Italian language models are generally available (GA) for speech recognition and for language model and acoustic model customization. For more information about all available language models, see
- Support for speaker_labels parameter for German and Korean
-
The service now supports speaker labels (the speaker_labels parameter) for German and Korean language models. Speaker labels identify which individuals spoke which words in a multi-participant exchange. For more information, see Speaker labels.
- Improved speech recognition for Japanese narrowband model
-
The Japanese narrowband model (ja-JP_NarrowbandModel) now includes some multigram word units for digits and decimal fractions. The service returns these multigram units regardless of whether you enable smart formatting. The smart formatting feature understands and returns the multigram units that the model generates. If you apply your own post-processing to transcription results, you need to handle these units appropriately. For more information, see Japanese in the smart formatting documentation.
- Simplified backup and restore
-
The service now offers greatly improved backup and restore procedures. Utilities are now available to back up data from your datastores, so you no longer need to re-create all of your data in the event of a disaster. For more information, see Backing up and restoring Watson Speech services data.
1 April 2020 (Version 1.1.3)
- Acoustic model customization is now generally available
- Acoustic model customization is now generally available (GA) for all supported languages. For more information about support for individual language models, see Language support for customization.
28 February 2020 (Version 1.1.3)
- Version 1.1.3 is available
-
Speech to Text for IBM Cloud Pak for Data version 1.1.3 is now available.
- New end_of_phrase_silence_time parameter
-
For speech recognition, the service now supports the end_of_phrase_silence_time parameter. The parameter specifies the duration of the pause interval at which the service splits a transcript into multiple final results. Each final result indicates a pause or extended silence that exceeds the pause interval. For most languages, the default pause interval is 0.8 seconds; for Chinese the default interval is 0.6 seconds.
You can use the parameter to effect a trade-off between how often a final result is produced and the accuracy of the transcription. Increase the interval when accuracy is more important than latency. Decrease the interval when the speaker is expected to say short phrases or single words.
For more information, see End of phrase silence time.
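The defaults described for end_of_phrase_silence_time can be sketched as a lookup that a client might use to show the effective pause interval. Identifying Chinese models by a zh- prefix on the model name is an assumption for this example, as are the function names.

```python
def default_pause_interval(model_name):
    """Return the default pause interval in seconds for a model.

    Assumes that Chinese models are identified by a "zh-" name prefix.
    """
    return 0.6 if model_name.startswith("zh-") else 0.8


def effective_pause_interval(model_name, end_of_phrase_silence_time=None):
    """Return the requested interval if one was given, otherwise the default."""
    if end_of_phrase_silence_time is None:
        return default_pause_interval(model_name)
    return end_of_phrase_silence_time


print(effective_pause_interval("en-US_BroadbandModel"))
print(effective_pause_interval("zh-CN_BroadbandModel"))
print(effective_pause_interval("en-US_BroadbandModel", 1.5))
```

A longer interval such as 1.5 seconds favors accuracy over latency, as the entry notes. A shorter one suits short phrases or single words.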
- New split_transcript_at_phrase_end parameter
-
For speech recognition, the service now supports the split_transcript_at_phrase_end parameter. The parameter directs the service to split the transcript into multiple final results based on semantic features of the input, such as at the conclusion of sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.
The parameter causes the service to add an end_of_utterance field to each final result to indicate the motivation for the split: full_stop, silence, end_of_data, or reset.
For more information, see Split transcript at phrase end.
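To show how the end_of_utterance field might be consumed, the following sketch pairs each final transcript with the reason for its split. The results, final, alternatives, and transcript keys are assumptions about the recognition response layout, and the sample data is invented for illustration.

```python
def final_results_with_reason(recognition_response):
    """Return (transcript, end_of_utterance) pairs for final results only."""
    pairs = []
    for result in recognition_response.get("results", []):
        if not result.get("final"):
            continue  # Interim results carry no end_of_utterance field.
        alternatives = result.get("alternatives", [])
        transcript = alternatives[0]["transcript"].strip() if alternatives else ""
        pairs.append((transcript, result.get("end_of_utterance")))
    return pairs


# Hypothetical response body, already decoded from JSON.
sample_response = {
    "results": [
        {"final": True, "end_of_utterance": "full_stop",
         "alternatives": [{"transcript": "hello world "}]},
        {"final": False,
         "alternatives": [{"transcript": "how are "}]},
        {"final": True, "end_of_utterance": "end_of_data",
         "alternatives": [{"transcript": "how are you "}]},
    ]
}

for transcript, reason in final_results_with_reason(sample_response):
    print(reason, transcript)
```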
- Improved speaker_labels parameter
-
For speech recognition, the speaker_labels parameter has been updated to improve the identification of individual speakers for further analysis of your audio sample. For more information about the speaker labels feature, see Speaker labels. For more information about the improvements to the feature, see IBM Research AI Advances Speaker Diarization in Real Use Cases.
27 November 2019 (Version 1.1.2)
- Version 1.1.2 is available
- Speech to Text for IBM Cloud Pak for Data version 1.1.2 is now available.
- Maximum number of custom models
- You can create no more than 1024 custom language models and no more than 1024 custom acoustic models per owning credentials. For more information, see Maximum number of custom models.
30 August 2019 (Version 1.0.1)
- Version 1.0.1 is available
-
Speech to Text for IBM Cloud Pak for Data version 1.0.1 is now available. The service now works with IBM Cloud Pak for Data 2.1.0.1. The service now supports installing IBM Cloud Pak for Data with Red Hat OpenShift.
- New broadband and narrowband models for Spanish dialects
-
The service now offers broadband and narrowband language models in six Spanish dialects:
- Argentinian Spanish (es-AR_BroadbandModel and es-AR_NarrowbandModel)
- Castilian Spanish (es-ES_BroadbandModel and es-ES_NarrowbandModel)
- Chilean Spanish (es-CL_BroadbandModel and es-CL_NarrowbandModel)
- Colombian Spanish (es-CO_BroadbandModel and es-CO_NarrowbandModel)
- Mexican Spanish (es-MX_BroadbandModel and es-MX_NarrowbandModel)
- Peruvian Spanish (es-PE_BroadbandModel and es-PE_NarrowbandModel)
The Castilian Spanish models are not new. They are generally available for speech recognition and language model customization, and beta for acoustic model customization.
The models for the other five dialects are new and are beta for all uses. Because they are beta, these additional dialects might not be ready for production use and are subject to change. They are initial offerings that are expected to improve in quality with time and usage.
For more information, see the following sections:
- FISMA support
-
Federal Information Security Management Act (FISMA) support is now available for Speech to Text for IBM Cloud Pak for Data. The service is FISMA High Ready.
28 June 2019 (Version 1.0.0)
- Version 1.0.0 is available
-
Version 1.0.0, the initial release of the service, is now available. Speech to Text for IBM Cloud Pak for Data is based on the IBM Watson® Speech to Text service on the public IBM Cloud. Speech to Text for IBM Cloud Pak for Data differs from the public Speech to Text service in the following ways. You might find this information helpful if you are already familiar with the Speech to Text service on the public IBM Cloud.
- Speech to Text for IBM Cloud Pak for Data uses access tokens for authentication. For more information, see the API & SDK reference.
- The endpoints for Speech to Text for IBM Cloud Pak for Data are specific to your IBM Cloud Pak for Data cluster. For more information, see the API & SDK reference.
- Speech to Text for IBM Cloud Pak for Data does not perform any request logging. You do not need to use the X-Watson-Learning-Opt-Out request header.
- Speech to Text for IBM Cloud Pak for Data does not support Watson tokens. You cannot use the X-Watson-Authorization-Token request header to authenticate with the service.