Migrating to next-generation models
Starting August 1, 2023, all previous-generation models are discontinued from the service. New clients must use only next-generation models, and all existing clients must migrate to the equivalent next-generation model. For more information, consult the information on this page.
You must migrate any use of deprecated previous-generation models to the equivalent next-generation models by the 31 July 2023 end-of-service date. Next-generation models provide appreciably better transcription accuracy and throughput, but they currently provide slightly fewer features than previous-generation models.
This topic provides an overview of the steps that you need to take to migrate from previous- to next-generation models. For more information about migrating, you can also see Watson Speech to Text: How to Plan Your Migration to the Next-Generation Models.
Step 1: Identify the next-generation model to which to migrate
The following topics describe all previous- and next-generation models. The tables in Supported previous-generation language models list the recommended next-generation model to which to migrate from each previous-generation model. Use the indicated next-generation model in your speech recognition requests.
The service continues to make new next-generation models available. All new models are identified in the release notes and in the tables that describe the available models.
Optimally, you migrate from a narrowband model to a next-generation telephony model, and from a broadband model to a next-generation multimedia model. However, not all broadband models have equivalent multimedia models. In such cases, you can migrate from a broadband model to a telephony model. The service downsamples the audio that you send to the rate of the model that you use. So sending broadband audio to a telephony model might prove a sufficient alternative in cases where no equivalent multimedia model is currently available.
For example, the following speech recognition request uses the previous-generation en-US_NarrowbandModel:
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path}audio-file.flac \
"{url}/v1/recognize?model=en-US_NarrowbandModel"
To use the equivalent next-generation en-US_Telephony model, you simply change the value that you pass with the model query parameter:
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path}audio-file.flac \
"{url}/v1/recognize?model=en-US_Telephony"
Step 2: Identify the features that are available with next-generation models
Next-generation models support slightly fewer features and parameters than previous-generation models. However, although they lack full parity, most features are available with both types of models. And where a feature is limited to a subset of languages, the limitations apply equally to both types of models.
For information about the features that are supported with the different model types, see:
- Supported features for previous-generation models
- Supported features for next-generation models
- Parameter summary
The service continues to make new features available with next-generation models. All updates to feature support are documented in the release notes and in the documentation for the model types.
To migrate to next-generation models, you must remove features that aren't supported by next-generation models from your speech recognition requests. You can also consider using features such as low latency and character insertion bias that are available only with next-generation models.
For example, the following speech recognition request uses the profanity_filter, redaction, and word_alternatives_threshold parameters with the previous-generation en-US_NarrowbandModel:
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path_to_file}audio-file.flac \
"{url}/v1/recognize?model=en-US_NarrowbandModel&profanity_filter=true&redaction=true&word_alternatives_threshold=0.50"
Only the word_alternatives_threshold parameter is not supported by next-generation models. To use the equivalent next-generation en-US_Telephony model, you simply change the value that you pass with the model query parameter and eliminate the word_alternatives_threshold parameter:
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path_to_file}audio-file.flac \
"{url}/v1/recognize?model=en-US_Telephony&profanity_filter=true&redaction=true"
Step 3: Re-create any custom language models that you use
You must re-create any custom language models that are based on previous-generation models by basing them on the equivalent next-generation models. This requires that you create a new custom language model and add your corpora, grammars, and custom words from the old model to the new model.
In general, next-generation models do not rely as heavily on custom language models. They use a different approach to transcription that minimizes the need for language model customization.
Next-generation models do not support custom acoustic models. Because of how the models transcribe audio, acoustic model customization is not necessary.
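The re-creation workflow uses the service's language model customization interface. The commands below are an illustrative sketch, not a complete procedure: the model name, description, corpus file, and corpus name are placeholder assumptions, and you would repeat the corpus, grammar, and custom-word steps for each resource from the old model, then train the new model.

```shell
# Create a new custom language model based on the next-generation base model.
# The name and description are placeholders; the response returns the new customization_id.
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: application/json" \
--data "{\"name\": \"Telephony custom model\", \"base_model_name\": \"en-US_Telephony\", \"description\": \"Re-created from narrowband custom model\"}" \
"{url}/v1/customizations"

# Add a corpus from the old custom model to the new one (corpus1.txt is a placeholder file).
curl -X POST -u "apikey:{apikey}" \
--data-binary @corpus1.txt \
"{url}/v1/customizations/{customization_id}/corpora/corpus1"

# Train the new custom model on its words, corpora, and grammars.
curl -X POST -u "apikey:{apikey}" \
"{url}/v1/customizations/{customization_id}/train"
```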
For example, the following speech recognition request uses a custom language model that is based on the en-US_NarrowbandModel. In this example, the custom model has the identifier 8acf31fa-0aa2-4ecc-a805-1f527f342dba.
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @audio-file.flac \
"{url}/v1/recognize?model=en-US_NarrowbandModel&language_customization_id=8acf31fa-0aa2-4ecc-a805-1f527f342dba"
After you re-create the custom language model with the equivalent en-US_Telephony model, simply update the model name to en-US_Telephony and set the language_customization_id parameter to the identifier of the new custom model, 636d8494-7e53-436a-8557-30d6b2a63cd7:
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @audio-file.flac \
"{url}/v1/recognize?model=en-US_Telephony&language_customization_id=636d8494-7e53-436a-8557-30d6b2a63cd7"
Step 4: Evaluate the results of the next-generation model
Once you have updated your speech recognition requests to use next-generation models, eliminated unsupported parameters, and re-created any custom language models, you can experiment with speech recognition based on previous- and next-generation models. Compare the resulting transcripts to determine whether the next-generation model produces equivalent or better results. Also consider the performance of requests that use next-generation models to determine how much faster you receive results.
You can also compare the word error rate of the previous- and next-generation results. The open-source Word Error Rate (WER) utility, which is available in Python, can help you measure and compare the accuracy of your results.
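For a rough sense of the comparison, word error rate is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The following minimal Python sketch (an illustrative implementation, not the open-source utility itself; the transcripts are invented examples) scores two hypothetical model outputs against the same reference:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Score each model's transcript against the same human reference.
reference = "the quick brown fox jumps over the lazy dog"
previous_gen = "the quick brown fox jumped over a lazy dog"
next_gen = "the quick brown fox jumps over the lazy dog"
print(word_error_rate(reference, previous_gen))  # 2 word errors out of 9
print(word_error_rate(reference, next_gen))      # 0.0
```

A lower WER indicates a more accurate transcript; comparing the two scores on the same audio and reference shows which model generation performs better for your data.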