Previous-generation languages and models

As of August 1, 2023, all previous-generation models are discontinued from the service. New clients must use only the next-generation models. All existing clients must migrate to the equivalent next-generation model. For more information, see Migrating to next-generation models.

The IBM Watson® Speech to Text service supports speech recognition with previous-generation models in many languages. The model indicates the language in which the audio is spoken and the rate at which it is sampled.

The models described on this page are referred to as previous-generation models. The service also offers next-generation models with enhanced qualities for improved speech recognition. For more information, see Next-generation languages and models.

Previous-generation model types

For most languages, the service makes available two types of previous-generation models:

  • Narrowband models are intended for audio that has a minimum sampling rate of 8 kHz. Use narrowband models for offline decoding of telephone speech, which is the typical use for this sampling rate.
  • Broadband models are intended for audio that has a minimum sampling rate of 16 kHz. Use broadband models for responsive, real-time applications, for example, for live-speech applications.

Choosing the correct model for your application is important. Use the model that matches the sampling rate (and language) of your audio. The service automatically adjusts the sampling rate of your audio to match the model that you specify. To achieve the best recognition accuracy, you also need to consider the frequency content of your audio. For more information, see Sampling rate and Audio frequency.
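The sampling-rate rule above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, the thresholds come from the minimum rates listed above, and the sketch does not check whether a given language actually offers both model types or consider the frequency content of the audio.

```python
# Minimum sampling rates for each previous-generation model type (from the list above).
NARROWBAND_MIN_HZ = 8_000
BROADBAND_MIN_HZ = 16_000


def pick_model_type(sample_rate_hz: int) -> str:
    """Return the model-type suffix that matches the audio's sampling rate."""
    if sample_rate_hz >= BROADBAND_MIN_HZ:
        return "BroadbandModel"
    if sample_rate_hz >= NARROWBAND_MIN_HZ:
        return "NarrowbandModel"
    raise ValueError(
        f"{sample_rate_hz} Hz is below the 8 kHz minimum for narrowband models"
    )


def model_name(language: str, sample_rate_hz: int) -> str:
    """Build a model name such as 'en-US_NarrowbandModel' (hypothetical helper)."""
    return f"{language}_{pick_model_type(sample_rate_hz)}"


print(model_name("en-US", 8_000))   # en-US_NarrowbandModel
print(model_name("en-US", 16_000))  # en-US_BroadbandModel
```

Audio sampled between 8 kHz and 16 kHz falls to the narrowband type here because it does not meet the broadband minimum.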

Supported previous-generation language models

The following sections list the previous-generation models of each type that are available for each language. The tables in the sections provide the following information:

  • The Model name column indicates the name of the model.

  • The Status column indicates whether the model is generally available (GA) or Beta.

  • The Recommended next-generation model column identifies the next-generation model that you can use instead of a deprecated model.

    Currently, not all broadband models have equivalent multimedia models. In such cases, consider using the telephony model for that language. The service downsamples the audio to the rate of the model that you use. So sending broadband audio to a telephony model might prove a sufficient alternative in cases where no equivalent multimedia model is currently available.

All models are available for both product versions, IBM Cloud and IBM Cloud Pak for Data.

Narrowband models

Table 1 lists the previous-generation narrowband models that are available.

Table 1. Supported previous-generation narrowband models
| Language | Model name | Status | Recommended next-generation model |
|---|---|---|---|
| Chinese (Mandarin) | zh-CN_NarrowbandModel | GA, Discontinued | zh-CN_Telephony |
| Dutch (Netherlands) | nl-NL_NarrowbandModel | GA, Discontinued | nl-NL_Telephony |
| English (Australian) | en-AU_NarrowbandModel | GA, Discontinued | en-AU_Telephony |
| English (United Kingdom) | en-GB_NarrowbandModel | GA, Discontinued | en-GB_Telephony |
| English (United States) | en-US_NarrowbandModel | GA, Discontinued | en-US_Telephony |
| English (United States) | en-US_ShortForm_NarrowbandModel | GA, Discontinued | en-US_Telephony |
| French (Canadian) | fr-CA_NarrowbandModel | GA, Discontinued | fr-CA_Telephony |
| French (France) | fr-FR_NarrowbandModel | GA, Discontinued | fr-FR_Telephony |
| German | de-DE_NarrowbandModel | GA, Discontinued | de-DE_Telephony |
| Italian | it-IT_NarrowbandModel | GA, Discontinued | it-IT_Telephony |
| Japanese | ja-JP_NarrowbandModel | GA, Discontinued | ja-JP_Telephony (IBM Cloud) |
| Korean | ko-KR_NarrowbandModel | GA, Discontinued | ko-KR_Telephony |
| Portuguese (Brazilian) | pt-BR_NarrowbandModel | GA, Discontinued | pt-BR_Telephony |
| Spanish (Argentinian, Beta) | es-AR_NarrowbandModel | Beta, Discontinued | es-LA_Telephony |
| Spanish (Castilian) | es-ES_NarrowbandModel | GA, Discontinued | es-ES_Telephony |
| Spanish (Chilean, Beta) | es-CL_NarrowbandModel | Beta, Discontinued | es-LA_Telephony |
| Spanish (Colombian, Beta) | es-CO_NarrowbandModel | Beta, Discontinued | es-LA_Telephony |
| Spanish (Mexican, Beta) | es-MX_NarrowbandModel | Beta, Discontinued | es-LA_Telephony |
| Spanish (Peruvian, Beta) | es-PE_NarrowbandModel | Beta, Discontinued | es-LA_Telephony |

Broadband models

Table 2 lists the previous-generation broadband models that are available.

Table 2. Supported previous-generation broadband models
| Language | Model name | Status | Recommended next-generation model |
|---|---|---|---|
| Arabic (Modern Standard) | ar-MS_BroadbandModel | GA, Discontinued | ar-MS_Telephony |
| Chinese (Mandarin) | zh-CN_BroadbandModel | GA, Discontinued | zh-CN_Telephony |
| Dutch (Netherlands) | nl-NL_BroadbandModel | GA, Discontinued | nl-NL_Multimedia |
| English (Australian) | en-AU_BroadbandModel | GA, Discontinued | en-AU_Multimedia |
| English (United Kingdom) | en-GB_BroadbandModel | GA, Discontinued | en-GB_Multimedia |
| English (United States) | en-US_BroadbandModel | GA, Discontinued | en-US_Multimedia |
| French (Canadian) | fr-CA_BroadbandModel | GA, Discontinued | fr-CA_Multimedia |
| French (France) | fr-FR_BroadbandModel | GA, Discontinued | fr-FR_Multimedia |
| German | de-DE_BroadbandModel | GA, Discontinued | de-DE_Multimedia |
| Italian | it-IT_BroadbandModel | GA, Discontinued | it-IT_Multimedia |
| Japanese | ja-JP_BroadbandModel | GA, Discontinued | ja-JP_Multimedia |
| Korean | ko-KR_BroadbandModel | GA, Discontinued | ko-KR_Multimedia |
| Portuguese (Brazilian) | pt-BR_BroadbandModel | GA, Discontinued | pt-BR_Multimedia |
| Spanish (Argentinian, Beta) | es-AR_BroadbandModel | Beta, Discontinued | es-LA_Telephony |
| Spanish (Castilian) | es-ES_BroadbandModel | GA, Discontinued | es-ES_Multimedia |
| Spanish (Chilean, Beta) | es-CL_BroadbandModel | Beta, Discontinued | es-LA_Telephony |
| Spanish (Colombian, Beta) | es-CO_BroadbandModel | Beta, Discontinued | es-LA_Telephony |
| Spanish (Mexican, Beta) | es-MX_BroadbandModel | Beta, Discontinued | es-LA_Telephony |
| Spanish (Peruvian, Beta) | es-PE_BroadbandModel | Beta, Discontinued | es-LA_Telephony |
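For migration scripts, the recommendations in Tables 1 and 2 can be captured as a lookup. The following is a minimal sketch: the dictionary is transcribed from the two tables, while the function name `recommended_model` is hypothetical and not part of any IBM SDK.

```python
# Previous-generation model -> recommended next-generation model (Tables 1 and 2).
RECOMMENDED = {
    # Narrowband models (Table 1)
    "zh-CN_NarrowbandModel": "zh-CN_Telephony",
    "nl-NL_NarrowbandModel": "nl-NL_Telephony",
    "en-AU_NarrowbandModel": "en-AU_Telephony",
    "en-GB_NarrowbandModel": "en-GB_Telephony",
    "en-US_NarrowbandModel": "en-US_Telephony",
    "en-US_ShortForm_NarrowbandModel": "en-US_Telephony",
    "fr-CA_NarrowbandModel": "fr-CA_Telephony",
    "fr-FR_NarrowbandModel": "fr-FR_Telephony",
    "de-DE_NarrowbandModel": "de-DE_Telephony",
    "it-IT_NarrowbandModel": "it-IT_Telephony",
    "ja-JP_NarrowbandModel": "ja-JP_Telephony",
    "ko-KR_NarrowbandModel": "ko-KR_Telephony",
    "pt-BR_NarrowbandModel": "pt-BR_Telephony",
    "es-AR_NarrowbandModel": "es-LA_Telephony",
    "es-ES_NarrowbandModel": "es-ES_Telephony",
    "es-CL_NarrowbandModel": "es-LA_Telephony",
    "es-CO_NarrowbandModel": "es-LA_Telephony",
    "es-MX_NarrowbandModel": "es-LA_Telephony",
    "es-PE_NarrowbandModel": "es-LA_Telephony",
    # Broadband models (Table 2)
    "ar-MS_BroadbandModel": "ar-MS_Telephony",
    "zh-CN_BroadbandModel": "zh-CN_Telephony",
    "nl-NL_BroadbandModel": "nl-NL_Multimedia",
    "en-AU_BroadbandModel": "en-AU_Multimedia",
    "en-GB_BroadbandModel": "en-GB_Multimedia",
    "en-US_BroadbandModel": "en-US_Multimedia",
    "fr-CA_BroadbandModel": "fr-CA_Multimedia",
    "fr-FR_BroadbandModel": "fr-FR_Multimedia",
    "de-DE_BroadbandModel": "de-DE_Multimedia",
    "it-IT_BroadbandModel": "it-IT_Multimedia",
    "ja-JP_BroadbandModel": "ja-JP_Multimedia",
    "ko-KR_BroadbandModel": "ko-KR_Multimedia",
    "pt-BR_BroadbandModel": "pt-BR_Multimedia",
    "es-AR_BroadbandModel": "es-LA_Telephony",
    "es-ES_BroadbandModel": "es-ES_Multimedia",
    "es-CL_BroadbandModel": "es-LA_Telephony",
    "es-CO_BroadbandModel": "es-LA_Telephony",
    "es-MX_BroadbandModel": "es-LA_Telephony",
    "es-PE_BroadbandModel": "es-LA_Telephony",
}


def recommended_model(previous_generation_model: str) -> str:
    """Return the recommended replacement, or raise KeyError for unknown names."""
    return RECOMMENDED[previous_generation_model]


print(recommended_model("en-US_BroadbandModel"))   # en-US_Multimedia
print(recommended_model("es-MX_NarrowbandModel"))  # es-LA_Telephony
```

An explicit table is used instead of a naming rule because several broadband models map to a telephony model rather than a multimedia model.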

The US English short-form model (Deprecated)

The US English short-form model, en-US_ShortForm_NarrowbandModel, can improve speech recognition for Interactive Voice Response (IVR) and Automated Customer Support solutions. The short-form model is trained to recognize the short utterances that are frequently expressed in customer support settings like automated support call centers. In addition to being tuned for short utterances in general, the model is also tuned for precise utterances such as digits, single-character word and name spellings, and yes-no responses.

The en-US_ShortForm_NarrowbandModel is optimal for the kinds of responses that are common to human-to-machine exchanges, such as the use case of IBM® Voice Agent with Watson. The en-US_NarrowbandModel is generally optimal for human-to-human conversations. However, depending on the use case and the nature of the exchange, some users might find the short-form model suitable for human-to-human conversations as well. Given this flexibility and overlap, you might experiment with both models to determine which works best for your application. In either case, applying a custom language model with a grammar to the short-form model can further improve recognition results.

As with all models, noisy environments can adversely impact the results. For example, background acoustic noise from airports, moving vehicles, conference rooms, and multiple speakers can reduce transcription accuracy. Audio from speaker phones can also reduce accuracy due to the echo common to such devices. Using the parameters available for speech activity detection can counteract such effects and help improve speech transcription accuracy. Applying a custom acoustic model can further fine-tune the acoustics for speech recognition, but use it only as a last resort.
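As an illustration of passing speech activity detection settings with a recognition request, the sketch below builds a query string. It assumes that the parameters are named `speech_detector_sensitivity` and `background_audio_suppression` and that each accepts a value from 0.0 to 1.0; verify both assumptions against the Parameter summary. The helper name `recognize_query` is hypothetical.

```python
from urllib.parse import urlencode


def recognize_query(
    model: str,
    speech_detector_sensitivity: float = 0.5,
    background_audio_suppression: float = 0.0,
) -> str:
    """Build a query string for a recognition request (hypothetical helper).

    The parameter names and the 0.0-1.0 range are assumptions; check them
    against the service's parameter documentation before relying on them.
    """
    settings = {
        "speech_detector_sensitivity": speech_detector_sensitivity,
        "background_audio_suppression": background_audio_suppression,
    }
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return urlencode({"model": model, **settings})


# Example: tune detection for a noisy call-center recording.
print(recognize_query("en-US_Telephony", 0.6, 0.5))
# model=en-US_Telephony&speech_detector_sensitivity=0.6&background_audio_suppression=0.5
```

The example uses the next-generation model that Table 1 recommends in place of the short-form model, because the previous-generation models are discontinued.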

Supported features for previous-generation models

Previous-generation models are supported for use with almost all of the service's features. Most features and models are generally available for production use. Where indicated, some features and models are beta functionality. Restrictions apply for some features, for example:

  • Features such as speaker labels, numeric redaction, and profanity filtering are limited to certain languages and models. Such restrictions are noted with the descriptions of the individual features. For more information about all available speech recognition parameters, see the Parameter summary.
  • The low_latency parameter is supported only for next-generation models. For more information, see Low latency.
  • For more information about previous-generation models' support for customization, see Customization support for previous-generation models.
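A client could guard against the `low_latency` restriction noted above before it sends a request. This is a minimal sketch with hypothetical helper names. It assumes that previous-generation model names always end in `_BroadbandModel` or `_NarrowbandModel`, which matches every name in Tables 1 and 2.

```python
PREVIOUS_GENERATION_SUFFIXES = ("_BroadbandModel", "_NarrowbandModel")


def is_previous_generation(model: str) -> bool:
    """Identify previous-generation models by their name suffix (assumption)."""
    return model.endswith(PREVIOUS_GENERATION_SUFFIXES)


def check_parameters(model: str, params: dict) -> None:
    """Raise ValueError if low_latency is requested with a previous-generation model."""
    if params.get("low_latency") and is_previous_generation(model):
        raise ValueError(
            f"low_latency is supported only for next-generation models, not {model}"
        )


check_parameters("en-US_Telephony", {"low_latency": True})         # passes
check_parameters("en-US_NarrowbandModel", {"low_latency": False})  # passes
```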

Otherwise, when a feature is described as being available in general or available for a specific language or languages, it supports the previous-generation models.