    Text to Speech

    • IBM
    • Date of last update: 04/12/2021
    • Docs
    • API docs


    Summary

    Text to Speech

      Type
      • Service
      Provider
      • IBM
      Category
      • AI / Machine Learning
      Compliance
      • EU Supported
      • HIPAA Enabled
      • IAM-enabled

      Summary

      The Text to Speech service converts written text to natural-sounding speech. The service streams the synthesized audio back with minimal delay. The audio uses cadence and intonation appropriate to its language and dialect, so the voices sound smooth and natural. Typical applications include voice-automated chatbots and a variety of voice-driven and screenless applications, such as assistive tools for people with disabilities or visual impairments, video narration and voice-over, and educational and home-automation solutions.

      Features

      Available languages

      Arabic, Brazilian Portuguese, Chinese (Mandarin dialect), Dutch, English (US and UK dialects), French, German, Italian, Japanese, Korean, and Spanish (Castilian, Latin American, and North American dialects).

      Available voices

      Choose from a variety of male and female voices for different languages. Most languages provide both Neural and Standard voices, although some provide only one type. Neural voices generate audio by using deep neural networks to predict the acoustic features of the requested speech. Standard voices assemble audio by concatenating segments of recorded speech.

      Interfaces and SDKs

      Request synthesis with the HTTP REST or WebSocket APIs. For languages other than Japanese, the WebSocket interface can also return timing information for the words of the resulting audio. Use the SDKs to simplify and speed up development in Node.js, Java, Python, Swift, and many other languages.
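
As a rough illustration of the HTTP REST interface described above, the following Python sketch builds a synthesis request using only the standard library. The `/v1/synthesize` path, the `voice` query parameter, the `apikey` basic-auth scheme, and the voice name `en-US_MichaelV3Voice` are assumptions that are not stated on this page, so check them against the API docs. The helper names `build_synthesize_request` and `synthesize_to_file` are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_synthesize_request(service_url, api_key, text,
                             voice="en-US_MichaelV3Voice", accept="audio/wav"):
    """Build (but do not send) a POST request asking for synthesized audio.

    Assumed, not confirmed by this page: the endpoint path, the `voice`
    query parameter, and HTTP basic auth with the literal user name "apikey".
    """
    query = urllib.parse.urlencode({"voice": voice})
    url = f"{service_url.rstrip('/')}/v1/synthesize?{query}"
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Accept": accept,  # requested audio format
        "Authorization": f"Basic {token}",
    }
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def synthesize_to_file(request, path):
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

To use it, pass the service URL and API key from your own service instance to `build_synthesize_request`, then hand the result to `synthesize_to_file`. Keeping request construction separate from sending makes the request easy to inspect before any network call is made.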

      SSML

      Annotate input text with the Speech Synthesis Markup Language (SSML), a standard XML-based notation for speech-synthesis applications. Use SSML to control aspects of speech synthesis such as pronunciation, volume, pitch, speed, and other attributes.
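
A minimal sketch of the kind of annotation described above: the Python helper below wraps text in standard SSML `<speak>`, `<prosody>`, and `<break>` elements. The helper name `build_ssml` and its defaults are hypothetical, and which attributes and values a given voice honors is an assumption to verify in the service docs.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, rate="medium", pitch="default", volume="medium", pause_ms=0):
    """Return an SSML string that applies prosody settings to `text`.

    Uses element and attribute names from the W3C SSML specification.
    Support for individual attributes by a particular voice is assumed here.
    """
    speak = ET.Element("speak", {"version": "1.0"})
    prosody = ET.SubElement(
        speak, "prosody", {"rate": rate, "pitch": pitch, "volume": volume}
    )
    prosody.text = text  # ElementTree escapes &, <, > on serialization
    if pause_ms > 0:
        # Optional pause after the spoken text
        ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello & welcome", rate="slow", pause_ms=500))
```

Building the markup with an XML library rather than string concatenation keeps the output well formed even when the input text contains characters such as `&` or `<`.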

      Voice Customization

      Use voice customization to refine the service's language-dependent rules for pronunciation. Define custom dictionaries for domain-specific terms, words with foreign origins, personal or geographic names, and abbreviations or acronyms in your application's lexicon. Define pronunciations based on other words, or create pronunciations based on phoneme symbols in the International Phonetic Alphabet (IPA) or IBM Symbolic Phonetic Representation (SPR).
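
The custom dictionary described above can be pictured as a list of word-to-pronunciation entries. The Python sketch below assembles such a list as JSON, with one entry defined by a sounds-like spelling and one by IPA phonemes. The field names `word` and `translation`, the `<phoneme alphabet="ipa" ...>` wrapper, and all helper names are assumptions for illustration; consult the API docs for the actual request format.

```python
import json
from xml.sax.saxutils import quoteattr


def sounds_like_entry(word, spoken_as):
    """Entry whose pronunciation is defined in terms of other words."""
    return {"word": word, "translation": spoken_as}


def ipa_entry(word, ipa):
    """Entry whose pronunciation is given as IPA phoneme symbols.

    The phoneme wrapper is an assumed format, not confirmed by this page.
    """
    # quoteattr adds the surrounding quotes and escapes special characters
    return {
        "word": word,
        "translation": f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}></phoneme>",
    }


def build_words_payload(entries):
    """Serialize entries to JSON, keeping the last definition of each word."""
    latest = {}
    for entry in entries:
        latest[entry["word"]] = entry
    return json.dumps({"words": list(latest.values())}, ensure_ascii=False)


if __name__ == "__main__":
    print(build_words_payload([
        sounds_like_entry("IEEE", "I triple E"),
        ipa_entry("tomato", "təmˈɑto"),
    ]))
```

Deduplicating by word before serializing means a later entry overrides an earlier one, which mirrors how a dictionary with one pronunciation per term would behave.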

      Custom Voices

      Work with IBM to train a new voice for your specific use case and target market. IBM can train a new voice with as little as one hour of training data. This feature is currently available only to Premium customers.