The Speech to Text service converts the human voice into the written word. It can be used anywhere there is a need to bridge the gap between the spoken word and its written form, including voice control of embedded systems, transcription of meetings and conference calls, and dictation of email and notes. This easy-to-use service uses machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal to generate an accurate transcription. The following languages and features are currently available:
English (US), English (UK), Japanese, Arabic (MSA, Broadband model only), Mandarin, Portuguese (Brazil), Spanish, French (Broadband model only), Korean
Receive a metadata object in the JSON response that includes a confidence score and start/end times for each word, and alternate hypotheses (N-best) for each phrase. A new option returns word alternatives for each sequential time interval.
Mobile SDKs are now available to enable native interaction on iOS and Android devices.
Optionally search for one or more keywords in the audio stream. The returned metadata includes the start time, end time, and confidence score for each instance of a keyword found. Keyword Spotting is currently available at no additional charge.
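The metadata described above (per-word confidence and timing, N-best hypotheses, and keyword matches) can be read from the JSON response with a few lines of code. The following is a minimal Python sketch, not a definitive client. The sample response is hypothetical, and the field names (`results`, `alternatives`, `timestamps`, `word_confidence`, `keywords_result`) are assumptions made for illustration; check them against the service's API reference before relying on them.

```python
import json

# Hypothetical sample response. The field names and values are assumptions
# for illustration and are not taken from the service description.
SAMPLE_RESPONSE = json.loads("""
{
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "hello world",
          "confidence": 0.94,
          "timestamps": [["hello", 0.10, 0.52], ["world", 0.58, 1.03]],
          "word_confidence": [["hello", 0.97], ["world", 0.91]]
        },
        {"transcript": "hollow world"}
      ],
      "keywords_result": {
        "world": [
          {"normalized_text": "world", "start_time": 0.58,
           "end_time": 1.03, "confidence": 0.91}
        ]
      }
    }
  ]
}
""")


def word_details(response):
    """Return per-word text, start/end time, and confidence for the top hypothesis."""
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]
        # Pair entries by position so repeated words stay distinct.
        pairs = zip(best.get("timestamps", []), best.get("word_confidence", []))
        for (word, start, end), (_, confidence) in pairs:
            words.append({"word": word, "start": start,
                          "end": end, "confidence": confidence})
    return words


def n_best(response):
    """Return the list of alternate transcripts (N-best) for each phrase."""
    return [[alt["transcript"] for alt in result.get("alternatives", [])]
            for result in response.get("results", [])]


def keyword_hits(response, min_confidence=0.0):
    """Return (keyword, start, end, confidence) for each keyword instance found."""
    hits = []
    for result in response.get("results", []):
        for keyword, matches in result.get("keywords_result", {}).items():
            for match in matches:
                if match["confidence"] >= min_confidence:
                    hits.append((keyword, match["start_time"],
                                 match["end_time"], match["confidence"]))
    return hits


if __name__ == "__main__":
    for entry in word_details(SAMPLE_RESPONSE):
        print(entry)
    print(n_best(SAMPLE_RESPONSE))
    print(keyword_hits(SAMPLE_RESPONSE, min_confidence=0.5))
```

Pairing timestamps and confidences by position, rather than through a word-keyed dictionary, keeps repeated words in a phrase from overwriting each other. The `min_confidence` parameter is a hypothetical client-side filter and not a service option.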
A localized version of this Watson service is available in Japan. Visit the following link for details: http://www.softbank.jp/biz/watson