Text to Speech

Synthesizes natural-sounding speech from text.

  • Date of last update: 12/12/2024
  • Service
  • IBM
  • AI / Machine Learning
  • EU Supported
  • HIPAA Enabled
  • IAM-enabled

Summary

The Text to Speech service converts written text to natural-sounding speech and streams the synthesized audio back with minimal delay. The audio uses cadence and intonation appropriate to its language and dialect, so the voices sound smooth and natural. Typical uses include voice-automated chatbots and other voice-driven or screenless applications, such as assistive tools for people with disabilities or visual impairments, video narration and voice-over, and educational and home-automation solutions.

Features and capabilities

Available languages

Arabic, Brazilian Portuguese, Chinese (Mandarin dialect), Dutch, English (US and UK dialects), French, German, Italian, Japanese, Korean, and Spanish (Castilian, Latin American, and North American dialects).

Available voices

Choose from a variety of male and female voices for different languages. Most languages provide both Neural and Standard voices, although some provide only one type. Neural voices generate audio by using deep neural networks to predict the acoustic features of the requested speech. Standard voices assemble audio by concatenating segments of recorded speech.

Interfaces and SDKs

Request synthesis with HTTP REST or WebSocket APIs. For languages other than Japanese, WebSockets also allow you to obtain timing information for words of the resulting audio. Use SDKs for simplified rapid development in Node, Java, Python, Swift, and many other languages.
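As a rough illustration of the HTTP REST interface, the sketch below builds (but does not send) a synthesis request using only the Python standard library. The `/v1/synthesize` path, the `voice` query parameter, Basic authentication with the user name `apikey`, and the voice name `en-US_MichaelV3Voice` are assumptions that do not appear on this page; check them against the API docs before use. The function name `build_synthesize_request` is hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_synthesize_request(service_url, api_key, text,
                             voice="en-US_MichaelV3Voice",
                             audio_format="audio/wav"):
    """Build a POST request for speech synthesis without sending it.

    Assumed (not stated on this page): the endpoint path, the `voice`
    query parameter, and Basic auth with the literal user name "apikey".
    """
    query = urllib.parse.urlencode({"voice": voice})
    url = f"{service_url.rstrip('/')}/v1/synthesize?{query}"

    # Encode "apikey:<key>" for the Authorization header.
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
        "Accept": audio_format,  # requested audio format of the response
    }

    # The text to synthesize travels as a small JSON document.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To send the request, pass the returned object to `urllib.request.urlopen` with a real service URL and API key and write the response bytes to a file. The SDKs wrap the same steps in a single call, which is usually the more convenient route.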

SSML

Annotate input text with the Speech Synthesis Markup Language (SSML), a standard XML-based notation for speech-synthesis applications. Use SSML to control aspects of speech synthesis such as pronunciation, volume, pitch, speed, and other attributes.
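To make the SSML description concrete, here is a minimal sketch that wraps plain text in standard SSML elements (`speak`, `prosody`, `break`) and escapes any XML-special characters. The helper name `build_ssml` is hypothetical, and which elements and attribute values a given voice honors is an assumption to verify in the service documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, rate=None, pitch=None, pause_after=None):
    """Return an SSML string for `text`.

    rate / pitch: optional values for the standard <prosody> element,
                  for example "slow" or "+10%".
    pause_after:  optional duration for a trailing <break>, e.g. "500ms".
    """
    speak = ET.Element("speak")

    # Only add <prosody> when at least one attribute was requested.
    attrs = {name: value
             for name, value in (("rate", rate), ("pitch", pitch))
             if value}
    target = ET.SubElement(speak, "prosody", attrs) if attrs else speak

    # ElementTree escapes characters such as & and < when serializing.
    target.text = text

    if pause_after:
        ET.SubElement(speak, "break", {"time": pause_after})

    return ET.tostring(speak, encoding="unicode")
```

Building the markup with an XML library rather than string concatenation keeps the output well-formed even when the input text contains characters such as `&` or `<`.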

Voice Customization

Use voice customization to refine the service's language-dependent rules for pronunciation. Define custom dictionaries for domain-specific terms, words with foreign origins, personal or geographic names, and abbreviations or acronyms in your application's lexicon. Define pronunciations based on other words, or create pronunciations based on phoneme symbols in the International Phonetic Alphabet (IPA) or IBM Symbolic Phonetic Representation (SPR).
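The two kinds of custom pronunciation described above (based on other words, or based on phoneme symbols) can be sketched as dictionary entries. The entry shape (`word` and `translation` keys), the `<phoneme>` wrapper, and the alphabet value `ibm` for SPR are assumptions not confirmed by this page, and the helper names are hypothetical. The IPA string for "tomato" is only illustrative.

```python
import json
from xml.sax.saxutils import quoteattr

# Assumed alphabet identifiers: "ipa" for IPA, "ibm" for IBM SPR.
SUPPORTED_ALPHABETS = ("ipa", "ibm")


def sounds_like(word, replacement):
    """Entry that pronounces `word` as if it were written `replacement`."""
    return {"word": word, "translation": replacement}


def phonetic(word, phonemes, alphabet="ipa"):
    """Entry that pronounces `word` from explicit phoneme symbols."""
    if alphabet not in SUPPORTED_ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    # quoteattr adds the surrounding quotes and escapes special characters.
    translation = (f"<phoneme alphabet={quoteattr(alphabet)} "
                   f"ph={quoteattr(phonemes)}></phoneme>")
    return {"word": word, "translation": translation}


def build_words_payload(entries):
    """Serialize entries as a JSON document, keeping IPA symbols readable."""
    return json.dumps({"words": list(entries)}, ensure_ascii=False)
```

For example, `build_words_payload([sounds_like("IEEE", "I triple E"), phonetic("tomato", "təmˈɑto")])` yields one JSON document holding both entries, ready to submit to a custom dictionary once the exact request format is confirmed.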

Custom Voices

Work with IBM to train a new voice for your specific use case and target market. IBM can train a new voice with as little as one hour of training data. This feature is currently available only to Premium customers.

Getting support


If you experience issues with this product, go to the IBM Cloud Support Center and create a case. Use the All products option to search for this product, then continue creating the case or find more information about getting support. Third-party and community-supported products might direct you to a support process outside of IBM Cloud.
