    Speech to Text

    • IBM
    • Date of last update: 04/13/2021

      Type
      • Service
      Provider
      • IBM
      Category
      • AI / Machine Learning
      Compliance
      • EU Supported
      • HIPAA Enabled
      • IAM-enabled

      Summary

      The Speech to Text service converts the human voice into the written word. The service uses deep-learning AI to apply knowledge of grammar, language structure, and the composition of audio and voice signals to accurately transcribe human speech. It can be used in applications such as voice-automated chatbots, analytic tools for customer-service call centers, and multimedia transcription, among many others.

      Features

      Available languages

      Brazilian Portuguese, Chinese (Mandarin dialect), Dutch, English (US and UK dialects), French, German, Italian, Japanese, Korean, Spanish (Argentinian, Castilian, Chilean, Colombian, Mexican, and Peruvian dialects), and Modern Standard Arabic (broadband model only). Base models are available for broadband audio sampled at 16 kHz and narrowband audio sampled at 8 kHz, in a wide range of audio formats.
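
As a rough illustration of the broadband and narrowband split, the sketch below reads the sampling rate from a WAV file and picks a band. The threshold rule (16 kHz and above is broadband, 8 kHz up to 16 kHz is narrowband) and the function names are assumptions made for this example, not something the listing specifies.

```python
import io
import wave


def band_for_sample_rate(sample_rate_hz: int) -> str:
    """Pick a model band from a sampling rate (assumed threshold rule)."""
    if sample_rate_hz < 8000:
        raise ValueError("sampling rate below 8 kHz fits neither band")
    return "broadband" if sample_rate_hz >= 16000 else "narrowband"


def band_for_wav(wav_bytes: bytes) -> str:
    """Read the sampling rate from WAV data and pick a band for it."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return band_for_sample_rate(wav.getframerate())


def silent_wav(sample_rate_hz: int, n_frames: int = 160) -> bytes:
    """Build a tiny mono 16-bit WAV of silence, used only for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


if __name__ == "__main__":
    print(band_for_wav(silent_wav(16000)))
    print(band_for_wav(silent_wav(8000)))
```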

      Interfaces and SDKs

      Request transcription with synchronous or asynchronous HTTP REST APIs, or use WebSockets for efficient, low-latency, high-throughput requests over a full-duplex connection. Send all audio at once or stream continuous audio for live speech recognition. Use SDKs for simplified rapid development in Node, Java, Python, Swift, and many other languages.
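
A synchronous HTTP request that sends all audio at once might be assembled roughly as follows. This is a minimal sketch using only the Python standard library rather than an SDK; the service URL and API key are placeholders, the `/v1/recognize` path, the `model` query parameter, and basic authentication with the user name `apikey` are assumed from the service's public API documentation, and `build_recognize_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request


def build_recognize_request(service_url, api_key, audio, content_type, model=None):
    """Assemble (but do not send) a POST request carrying one audio payload."""
    url = service_url.rstrip("/") + "/v1/recognize"
    if model:
        # Assumed query parameter that selects the base model.
        url += "?" + urllib.parse.urlencode({"model": model})
    # Basic auth with the literal user name "apikey" (assumption, see lead-in).
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    headers = {"Content-Type": content_type, "Authorization": "Basic " + token}
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_recognize_request(
        "https://stt.example.invalid/instances/demo",  # placeholder URL
        "secret",                                      # placeholder key
        b"RIFF....",                                   # placeholder audio bytes
        "audio/wav",
        model="en-US_BroadbandModel",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; streaming continuous audio over WebSockets needs a different client and is not covered by this sketch.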

      Language and acoustic customization

      Use language model customization to define domain-specific words that expand the service's base vocabulary; acoustic model customization to enhance recognition for the acoustic characteristics of your audio; and grammars to limit recognition to specific strings and phrases only. Create multiple models and grammars for different purposes, and combine all three capabilities to adapt recognition for your application's requirements.
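
Combining the three capabilities in one recognition request could look something like the sketch below, which only builds the query string. The parameter names `language_customization_id`, `acoustic_customization_id`, and `grammar_name` are assumed from the service's API documentation, the IDs are made up, and the rule that a grammar needs a custom language model is likewise an assumption encoded here for illustration.

```python
from urllib.parse import urlencode


def customization_query(base_model, language_customization_id=None,
                        acoustic_customization_id=None, grammar_name=None):
    """Build a query string that selects custom models and a grammar."""
    if grammar_name and not language_customization_id:
        # Assumption: a grammar is defined inside a custom language model.
        raise ValueError("grammar_name requires language_customization_id")
    params = {"model": base_model}
    if language_customization_id:
        params["language_customization_id"] = language_customization_id
    if acoustic_customization_id:
        params["acoustic_customization_id"] = acoustic_customization_id
    if grammar_name:
        params["grammar_name"] = grammar_name
    return urlencode(params)


if __name__ == "__main__":
    # "lm-123", "am-456", and "menu" are made-up identifiers.
    print(customization_query("en-US_NarrowbandModel", "lm-123", "am-456", "menu"))
```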

      Keyword spotting and speaker labels

      Identify specific keyword strings from the audio with a user-defined level of confidence. Identify different speakers from a multi-participant conversation.
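
To show how speaker labels might be used, the sketch below groups transcribed words into per-speaker turns. It assumes a response shape in which word timestamps arrive as `[word, start, end]` triples and each speaker label carries `from`, `to`, and `speaker` fields; the sample data is invented, and `speaker_turns` is a hypothetical helper.

```python
def speaker_turns(timestamps, speaker_labels):
    """Group consecutive words by speaker, matching on word start time."""
    speaker_at = {label["from"]: label["speaker"] for label in speaker_labels}
    turns = []
    for word, start, _end in timestamps:
        speaker = speaker_at.get(start)  # None if no label matches the word
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word)
        else:
            turns.append((speaker, [word]))
    return [(speaker, " ".join(words)) for speaker, words in turns]


# Invented sample data in the assumed response shape.
TIMESTAMPS = [["hello", 0.0, 0.4], ["there", 0.4, 0.8], ["hi", 1.1, 1.3]]
LABELS = [
    {"from": 0.0, "to": 0.4, "speaker": 0},
    {"from": 0.4, "to": 0.8, "speaker": 0},
    {"from": 1.1, "to": 1.3, "speaker": 1},
]

if __name__ == "__main__":
    for speaker, text in speaker_turns(TIMESTAMPS, LABELS):
        print(f"speaker {speaker}: {text}")
```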

      Transcript metadata

      Receive a JSON response that includes confidence scores, start and end times, and multiple possible alternatives. Split a transcript into multiple results based on semantic features such as sentences.
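
Reading such a response could be sketched as below. The field names (`results`, `alternatives`, `transcript`, `confidence`, `timestamps`, `final`) are assumed from the service's API documentation, the sample payload is invented, and `summarize` is a hypothetical helper that treats the first alternative as the preferred one.

```python
import json

# Invented payload in the assumed response shape.
SAMPLE_JSON = """
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "hello world ",
          "confidence": 0.94,
          "timestamps": [["hello", 0.0, 0.4], ["world", 0.5, 0.9]]
        },
        {"transcript": "hello word "}
      ]
    }
  ]
}
"""


def summarize(response):
    """Return one summary dict per result, built from its first alternative."""
    summaries = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best, others = alternatives[0], alternatives[1:]
        summaries.append({
            "text": best["transcript"].strip(),
            "confidence": best.get("confidence"),
            "final": result.get("final", False),
            "words": [tuple(entry) for entry in best.get("timestamps", [])],
            "alternatives": [alt["transcript"].strip() for alt in others],
        })
    return summaries


if __name__ == "__main__":
    for summary in summarize(json.loads(SAMPLE_JSON)):
        print(summary["text"], summary["confidence"], summary["words"])
```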

      Transcript refinement

      Apply smart formatting to convert dates, times, numbers, currency values, phone numbers, and more to conventional written forms in final transcripts. Redact sensitive personal information such as credit card numbers from transcripts. Censor profanity from US English transcripts and metadata.
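
The refinement options are typically switched on per request. The sketch below serializes three boolean flags into a query string; the parameter names `smart_formatting`, `redaction`, and `profanity_filter` are assumed from the service's API documentation, and the defaults chosen here are illustrative assumptions rather than documented behavior.

```python
from urllib.parse import urlencode


def refinement_query(smart_formatting=False, redaction=False, profanity_filter=True):
    """Serialize refinement flags as lowercase true/false query values."""
    flags = {
        "smart_formatting": smart_formatting,
        "redaction": redaction,
        "profanity_filter": profanity_filter,
    }
    return urlencode({name: str(value).lower() for name, value in flags.items()})


if __name__ == "__main__":
    print(refinement_query(smart_formatting=True, redaction=True))
```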

      Processing and audio metrics

      Request processing metrics for detailed information about the service's analysis of your audio, or audio metrics for details about the precise signal characteristics of your audio.