Previous-generation languages and models

As of August 1, 2023, all previous-generation models are discontinued from the service. New clients must use only next-generation models, and all existing clients must migrate to the equivalent next-generation model. For more information, see Migrating to next-generation models.

The IBM Watson® Speech to Text service supported speech recognition with previous-generation models in many languages. Each model indicates the language in which the audio is spoken and the rate at which the audio is sampled.
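Model names on this page follow a consistent pattern: a language code, an optional variant such as `ShortForm`, and the model type, which implies the minimum sampling rate. The following Python sketch splits a name into those parts. The helper `parse_model_name` and the `ModelInfo` type are hypothetical and are not part of any IBM SDK; they assume only the naming pattern that the tables on this page use.

```python
from typing import NamedTuple

# Minimum sampling rates stated on this page for each previous-generation model type.
MIN_SAMPLING_RATE_HZ = {"Narrowband": 8000, "Broadband": 16000}


class ModelInfo(NamedTuple):
    language: str              # for example, "en-US"
    model_type: str            # "Narrowband" or "Broadband"
    min_sampling_rate_hz: int


def parse_model_name(name: str) -> ModelInfo:
    """Split a previous-generation model name into language and model type.

    Assumes the pattern <language>_[<variant>_]<Type>Model, for example
    en-US_ShortForm_NarrowbandModel.
    """
    language, _, rest = name.partition("_")
    for model_type, rate in MIN_SAMPLING_RATE_HZ.items():
        if rest.endswith(f"{model_type}Model"):
            return ModelInfo(language, model_type, rate)
    raise ValueError(f"not a previous-generation model name: {name!r}")


print(parse_model_name("en-US_ShortForm_NarrowbandModel"))
# ModelInfo(language='en-US', model_type='Narrowband', min_sampling_rate_hz=8000)
```

Next-generation names such as `en-US_Telephony` do not match the pattern, so the sketch rejects them with a `ValueError`.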

The models described on this page are referred to as previous-generation models. The service also offers next-generation models with enhanced qualities for improved speech recognition. For more information, see Next-generation languages and models.

Previous-generation model types

For most languages, the service makes available two types of previous-generation models:

Narrowband models are intended for audio that has a minimum sampling rate of 8 kHz. Use narrowband models for offline decoding of telephone speech, which is the typical use for this sampling rate.
Broadband models are intended for audio that has a minimum sampling rate of 16 kHz. Use broadband models for responsive, real-time applications, such as live-speech applications.

Choosing the correct model for your application is important. Use the model that matches the sampling rate (and language) of your audio. The service automatically adjusts the sampling rate of your audio to match the model that you specify. To achieve the best recognition accuracy, you also need to consider the frequency content of your audio. For more information, see Sampling rate and Audio frequency.
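As a rough illustration of matching the model to the audio, the following Python sketch picks a previous-generation model type from the sampling rate alone, using the minimum rates listed above. The function names are hypothetical. The sketch deliberately ignores frequency content: audio that was recorded over a telephone and later upsampled to 16 kHz might still be better served by a narrowband model. It also does not check whether a model of that type exists for the language (Arabic, for example, has only a broadband model in the tables on this page), so treat the result as a starting point.

```python
def choose_model_type(sampling_rate_hz: int) -> str:
    """Pick a previous-generation model type from the audio's sampling rate.

    Broadband models expect at least 16 kHz; narrowband models expect at least 8 kHz.
    """
    if sampling_rate_hz >= 16_000:
        return "Broadband"
    if sampling_rate_hz >= 8_000:
        return "Narrowband"
    raise ValueError(f"{sampling_rate_hz} Hz is below the 8 kHz minimum for either model type")


def choose_model(language: str, sampling_rate_hz: int) -> str:
    """Build a model name such as 'fr-FR_NarrowbandModel' (hypothetical helper)."""
    return f"{language}_{choose_model_type(sampling_rate_hz)}Model"


print(choose_model("fr-FR", 8_000))   # fr-FR_NarrowbandModel
print(choose_model("fr-FR", 44_100))  # fr-FR_BroadbandModel
```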

Supported previous-generation language models

The following sections list the previous-generation models of each type that are available for each language. The tables in the sections provide the following information:

The Model name column indicates the name of the model.
The Status column indicates whether the model was generally available (GA) or beta, and that it is now discontinued.
The Recommended next-generation model column identifies the next-generation model that you can use instead of a discontinued model.

Not all broadband models have an equivalent multimedia model. In such cases, consider using the telephony model for that language. Because the service downsamples audio to the sampling rate of the model that you use, sending broadband audio to a telephony model can be a sufficient alternative when no equivalent multimedia model is available.

All models are available for both product versions, IBM Cloud and IBM Cloud Pak for Data.

Narrowband models

Table 1 lists the previous-generation narrowband models that are available.

Table 1. Supported previous-generation narrowband models
Language	Model name	Status	Recommended next-generation model
Chinese (Mandarin)	`zh-CN_NarrowbandModel`	GA (discontinued)	`zh-CN_Telephony`
Dutch (Netherlands)	`nl-NL_NarrowbandModel`	GA (discontinued)	`nl-NL_Telephony`
English (Australian)	`en-AU_NarrowbandModel`	GA (discontinued)	`en-AU_Telephony`
English (United Kingdom)	`en-GB_NarrowbandModel`	GA (discontinued)	`en-GB_Telephony`
English (United States)	`en-US_NarrowbandModel`	GA (discontinued)	`en-US_Telephony`
English (United States)	`en-US_ShortForm_NarrowbandModel`	GA (discontinued)	`en-US_Telephony`
French (Canadian)	`fr-CA_NarrowbandModel`	GA (discontinued)	`fr-CA_Telephony`
French (France)	`fr-FR_NarrowbandModel`	GA (discontinued)	`fr-FR_Telephony`
German	`de-DE_NarrowbandModel`	GA (discontinued)	`de-DE_Telephony`
Italian	`it-IT_NarrowbandModel`	GA (discontinued)	`it-IT_Telephony`
Japanese	`ja-JP_NarrowbandModel`	GA (discontinued)	`ja-JP_Telephony` (IBM Cloud)
Korean	`ko-KR_NarrowbandModel`	GA (discontinued)	`ko-KR_Telephony`
Portuguese (Brazilian)	`pt-BR_NarrowbandModel`	GA (discontinued)	`pt-BR_Telephony`
Spanish (Argentinian, Beta)	`es-AR_NarrowbandModel`	Beta (discontinued)	`es-LA_Telephony`
Spanish (Castilian)	`es-ES_NarrowbandModel`	GA (discontinued)	`es-ES_Telephony`
Spanish (Chilean, Beta)	`es-CL_NarrowbandModel`	Beta (discontinued)	`es-LA_Telephony`
Spanish (Colombian, Beta)	`es-CO_NarrowbandModel`	Beta (discontinued)	`es-LA_Telephony`
Spanish (Mexican, Beta)	`es-MX_NarrowbandModel`	Beta (discontinued)	`es-LA_Telephony`
Spanish (Peruvian, Beta)	`es-PE_NarrowbandModel`	Beta (discontinued)	`es-LA_Telephony`

Broadband models

Table 2 lists the previous-generation broadband models that are available.

Table 2. Supported previous-generation broadband models
Language	Model name	Status	Recommended next-generation model
Arabic (Modern Standard)	`ar-MS_BroadbandModel`	GA (discontinued)	`ar-MS_Telephony`
Chinese (Mandarin)	`zh-CN_BroadbandModel`	GA (discontinued)	`zh-CN_Telephony`
Dutch (Netherlands)	`nl-NL_BroadbandModel`	GA (discontinued)	`nl-NL_Multimedia`
English (Australian)	`en-AU_BroadbandModel`	GA (discontinued)	`en-AU_Multimedia`
English (United Kingdom)	`en-GB_BroadbandModel`	GA (discontinued)	`en-GB_Multimedia`
English (United States)	`en-US_BroadbandModel`	GA (discontinued)	`en-US_Multimedia`
French (Canadian)	`fr-CA_BroadbandModel`	GA (discontinued)	`fr-CA_Multimedia`
French (France)	`fr-FR_BroadbandModel`	GA (discontinued)	`fr-FR_Multimedia`
German	`de-DE_BroadbandModel`	GA (discontinued)	`de-DE_Multimedia`
Italian	`it-IT_BroadbandModel`	GA (discontinued)	`it-IT_Multimedia`
Japanese	`ja-JP_BroadbandModel`	GA (discontinued)	`ja-JP_Multimedia`
Korean	`ko-KR_BroadbandModel`	GA (discontinued)	`ko-KR_Multimedia`
Portuguese (Brazilian)	`pt-BR_BroadbandModel`	GA (discontinued)	`pt-BR_Multimedia`
Spanish (Argentinian, Beta)	`es-AR_BroadbandModel`	Beta (discontinued)	`es-LA_Telephony`
Spanish (Castilian)	`es-ES_BroadbandModel`	GA (discontinued)	`es-ES_Multimedia`
Spanish (Chilean, Beta)	`es-CL_BroadbandModel`	Beta (discontinued)	`es-LA_Telephony`
Spanish (Colombian, Beta)	`es-CO_BroadbandModel`	Beta (discontinued)	`es-LA_Telephony`
Spanish (Mexican, Beta)	`es-MX_BroadbandModel`	Beta (discontinued)	`es-LA_Telephony`
Spanish (Peruvian, Beta)	`es-PE_BroadbandModel`	Beta (discontinued)	`es-LA_Telephony`
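To help with migration, the following Python sketch encodes Tables 1 and 2 as a lookup from each discontinued model to its recommended next-generation model. The names `REPLACEMENTS` and `recommended_replacement` are hypothetical; the data is transcribed from the two tables and is not fetched from the service.

```python
# Languages whose narrowband model maps to <language>_Telephony (Table 1).
NARROWBAND_TO_TELEPHONY = ["zh-CN", "nl-NL", "en-AU", "en-GB", "en-US", "fr-CA", "fr-FR",
                           "de-DE", "it-IT", "ja-JP", "ko-KR", "pt-BR", "es-ES"]
# Languages whose broadband model maps to <language>_Multimedia (Table 2).
BROADBAND_TO_MULTIMEDIA = ["nl-NL", "en-AU", "en-GB", "en-US", "fr-CA", "fr-FR", "de-DE",
                           "it-IT", "ja-JP", "ko-KR", "pt-BR", "es-ES"]
# Broadband models with no multimedia equivalent, which map to <language>_Telephony.
BROADBAND_TO_TELEPHONY = ["ar-MS", "zh-CN"]
# Latin American Spanish: both model types map to es-LA_Telephony.
LATIN_AMERICAN_SPANISH = ["es-AR", "es-CL", "es-CO", "es-MX", "es-PE"]

REPLACEMENTS: dict[str, str] = {"en-US_ShortForm_NarrowbandModel": "en-US_Telephony"}
for lang in NARROWBAND_TO_TELEPHONY:
    REPLACEMENTS[f"{lang}_NarrowbandModel"] = f"{lang}_Telephony"
for lang in BROADBAND_TO_MULTIMEDIA:
    REPLACEMENTS[f"{lang}_BroadbandModel"] = f"{lang}_Multimedia"
for lang in BROADBAND_TO_TELEPHONY:
    REPLACEMENTS[f"{lang}_BroadbandModel"] = f"{lang}_Telephony"
for lang in LATIN_AMERICAN_SPANISH:
    for model_type in ("Narrowband", "Broadband"):
        REPLACEMENTS[f"{lang}_{model_type}Model"] = "es-LA_Telephony"


def recommended_replacement(model: str) -> str:
    """Return the recommended next-generation model for a discontinued model."""
    try:
        return REPLACEMENTS[model]
    except KeyError:
        raise ValueError(f"{model!r} is not listed in Table 1 or Table 2") from None


print(recommended_replacement("es-MX_BroadbandModel"))  # es-LA_Telephony
```

A client that stores a model name in its configuration could pass that value through `recommended_replacement` once during migration, rather than scattering the mapping through its code.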

The US English short-form model (Discontinued)

The US English short-form model, en-US_ShortForm_NarrowbandModel, can improve speech recognition for Interactive Voice Response (IVR) and Automated Customer Support solutions. The short-form model is trained to recognize the short utterances that are frequently expressed in customer support settings like automated support call centers. In addition to being tuned for short utterances in general, the model is also tuned for precise utterances such as digits, single-character word and name spellings, and yes-no responses.

The en-US_ShortForm_NarrowbandModel is optimal for the kinds of responses that are common to human-to-machine exchanges, such as the use case of IBM® Voice Agent with Watson. The en-US_NarrowbandModel is generally optimal for human-to-human conversations. However, depending on the use case and the nature of the exchange, some users might find the short-form model suitable for human-to-human conversations as well. Given this flexibility and overlap, you might experiment with both models to determine which works best for your application. In either case, applying a custom language model with a grammar to the short-form model can further improve recognition results.
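When you experiment with models, one common measure, not prescribed by this documentation, is word error rate (WER): transcribe the same test recordings with each model, then count the word substitutions, deletions, and insertions needed to turn each transcript into a hand-made reference transcript, divided by the number of reference words. The following Python sketch computes WER with a word-level edit distance. It only lowercases and splits on whitespace, so a real comparison would usually also normalize punctuation and numbers. The function names and the sample sentences are hypothetical and are not real model output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("the reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref_word
                          curr[j - 1] + 1,     # insertion of hyp_word
                          prev[j - 1] + cost)  # substitution, or a match when cost is 0
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(references: list[str], hypotheses: list[str]) -> float:
    """Average WER over paired reference and model transcripts."""
    if len(references) != len(hypotheses) or not references:
        raise ValueError("need the same, nonzero number of references and hypotheses")
    return sum(map(word_error_rate, references, hypotheses)) / len(references)


# Hypothetical transcripts, for illustration only.
references = ["yes", "my account number is four two seven"]
transcripts_a = ["yes", "my account number is four two seven"]
transcripts_b = ["yes", "my account number is for two seven"]
print(mean_wer(references, transcripts_a))  # 0.0
print(mean_wer(references, transcripts_b))
```

The model with the lower average WER on recordings that resemble your traffic is the better candidate for that application.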

As with all models, noisy environments can adversely affect the results. For example, background acoustic noise from airports, moving vehicles, conference rooms, and multiple speakers can reduce transcription accuracy. Audio from speakerphones can also reduce accuracy because of the echo that is common to such devices. Using the parameters available for speech activity detection can counteract such effects and help improve transcription accuracy. Applying a custom acoustic model can further fine-tune the acoustics for speech recognition, but use it only as a last resort.
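Speech activity detection settings are passed as parameters of a recognition request. The following Python sketch builds a query string for such a request and checks that each value lies in the range 0.0 to 1.0. It assumes the query parameter names `speech_detector_sensitivity` and `background_audio_suppression` and their defaults (0.5 and 0.0) as described in Speech activity detection; confirm them there, along with which models support the parameters. The helper name and the example values for a noisy speakerphone recording are hypothetical, and the example uses `en-US_Telephony`, the recommended replacement for the short-form model.

```python
from urllib.parse import urlencode


def sad_query(model: str,
              speech_detector_sensitivity: float = 0.5,
              background_audio_suppression: float = 0.0) -> str:
    """Build a query string with speech activity detection settings (hypothetical helper).

    Parameter names and defaults are assumed from the service documentation.
    """
    for name, value in (("speech_detector_sensitivity", speech_detector_sensitivity),
                        ("background_audio_suppression", background_audio_suppression)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return urlencode({
        "model": model,
        # Lower values make the detector less likely to treat noise as speech.
        "speech_detector_sensitivity": speech_detector_sensitivity,
        # Higher values suppress more background audio, such as side conversations.
        "background_audio_suppression": background_audio_suppression,
    })


# Hypothetical settings for a noisy speakerphone recording.
print(sad_query("en-US_Telephony", speech_detector_sensitivity=0.4,
                background_audio_suppression=0.5))
# model=en-US_Telephony&speech_detector_sensitivity=0.4&background_audio_suppression=0.5
```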

For more information about language model and acoustic model customization, see Understanding customization.
For more information about grammars, see Using grammars with custom language models.
For more information about speech activity detection parameters, see Speech activity detection.

Supported features for previous-generation models

Previous-generation models are supported for use with almost all of the service's features. Most features and models are generally available for production use; where indicated, some features and models are beta functionality. Restrictions apply to some features, for example:

Features such as speaker labels, numeric redaction, and profanity filtering are limited to certain languages and models. Such restrictions are noted with the descriptions of the individual features. For more information about all available speech recognition parameters, see the Parameter summary.
The low_latency parameter is supported only for next-generation models. For more information, see Low latency.
For more information about previous-generation models' support for customization, see Customization support for previous-generation models.
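Because `low_latency` is supported only for next-generation models, a client that is still migrating might guard against sending it with a previous-generation model. The following Python sketch is one such guard. It assumes, as in Tables 1 and 2, that next-generation model names end in `_Telephony` or `_Multimedia`. The helper name is hypothetical, and passing the check does not guarantee that a particular next-generation model supports low latency; see Low latency.

```python
# Assumed naming convention for next-generation models, taken from Tables 1 and 2.
NEXT_GENERATION_SUFFIXES = ("_Telephony", "_Multimedia")


def recognize_params(model: str, low_latency: bool = False) -> dict[str, str]:
    """Return request parameters, rejecting low_latency for previous-generation models."""
    params = {"model": model}
    if low_latency:
        if not model.endswith(NEXT_GENERATION_SUFFIXES):
            raise ValueError(
                f"low_latency is supported only for next-generation models, not {model!r}")
        params["low_latency"] = "true"
    return params


print(recognize_params("en-US_Telephony", low_latency=True))
# {'model': 'en-US_Telephony', 'low_latency': 'true'}
```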

Otherwise, when a feature is described as available for all languages or for one or more specific languages, it supports the previous-generation models.