Getting started with Speech to Text

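The IBM Watson® Speech to Text service transcribes audio to text, enabling speech transcription for a wide range of applications. This curl-based tutorial helps you get started quickly. Its examples show how to call the service's POST /v1/recognize method to request a transcript.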

The tutorial uses the curl command-line utility to demonstrate REST API calls. For more information about curl, see Using curl with Watson examples.

Before you begin

IBM Cloud

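Create an instance of the service:
1. Go to the Speech to Text page in the IBM Cloud catalog.
2. Sign up for a free IBM Cloud account or log in.
3. Read and agree to the terms of the license agreement.
4. Click Create.
Copy the credentials to authenticate to your service instance:
1. View the Manage page for the service instance:
  - If you are on the Getting started page for your service instance, click the Manage entry in the list of topics.
  - If you are on the Resource list page, expand the AI / Machine Learning grouping in the Name column, and click the name of your service instance.
2. On the Manage page, click Show Credentials in the Credentials box.
3. Copy the API Key and URL values for the service instance.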

This tutorial uses an API key to authenticate. In production, use an IAM token. For more information, see Authenticating to IBM Cloud.

IBM Cloud Pak for Data

The Speech to Text service must be installed and configured before you begin this tutorial. For more information, see Watson Speech services on Cloud Pak for Data.

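1. Create an instance of the service by using the web client, the API, or the command-line interface. For more information about creating a service instance on IBM Cloud Pak for Data, see Creating a Watson Speech services instance.
2. Follow the instructions in Creating a Watson Speech services instance to obtain a Bearer token for the instance. This tutorial uses a Bearer token to authenticate to the service.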

Transcribe audio with no options

Call the POST /v1/recognize method with no additional request parameters to request a basic transcript of a FLAC audio file.

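Download the sample audio file audio-file.flac.
Issue the following command to call the service's /v1/recognize method with no parameters and get a basic transcription. The example uses the Content-Type header to indicate the type of the audio, audio/flac. The example uses the default language model, en-US_BroadbandModel, for transcription.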

IBM Cloud
- Replace {apikey} and {url} with your API key and URL.
- Modify {path_to_file} to specify the location of the audio-file.flac file.
```
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path_to_file}audio-file.flac \
"{url}/v1/recognize"
```
IBM Cloud Pak for Data IBM Software Hub
- Replace {token} and {url} with the access token and URL for your service instance.
- Modify {path_to_file} to specify the location of the audio-file.flac file.
```
curl -X POST \
--header "Authorization: Bearer {token}" \
--header "Content-Type: audio/flac" \
--data-binary @{path_to_file}audio-file.flac \
"{url}/v1/recognize"
```
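For readers who want to issue the same request from a program, the following is a minimal Python sketch of how the two curl commands above map onto an HTTP request. It only builds the request object and does not send it. The function name `build_recognize_request` and the `https://example.invalid/instance` URL are placeholders invented for illustration; the only assumptions about the service are the ones visible in the curl commands (the `apikey` user name for basic authentication, the Bearer header, and the `audio/flac` content type).

```python
import base64
import urllib.request


def build_recognize_request(url, audio, apikey=None, token=None):
    """Build (but do not send) a POST /v1/recognize request for FLAC audio.

    Hypothetical helper: pass either an IBM Cloud API key or a
    Cloud Pak for Data Bearer token, not both.
    """
    if (apikey is None) == (token is None):
        raise ValueError("pass exactly one of apikey or token")
    if apikey is not None:
        # Same effect as curl's -u "apikey:{apikey}": HTTP basic authentication
        # with the literal user name "apikey" and the API key as the password.
        credentials = base64.b64encode(f"apikey:{apikey}".encode("utf-8"))
        authorization = "Basic " + credentials.decode("ascii")
    else:
        authorization = "Bearer " + token
    return urllib.request.Request(
        url.rstrip("/") + "/v1/recognize",
        data=audio,  # raw audio bytes, like curl's --data-binary
        headers={"Authorization": authorization, "Content-Type": "audio/flac"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL, key, and audio bytes; substitute your own values.
    request = build_recognize_request(
        "https://example.invalid/instance", b"placeholder-audio", apikey="my-key"
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; reading the audio with `open(path, "rb").read()` corresponds to the `@{path_to_file}audio-file.flac` argument.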

The service returns the following transcription results:

{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ]
}
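A response of this shape can be reduced to plain text with a few lines of code. The sketch below is one possible way to do it in Python; the helper name `best_transcripts` is invented for illustration, and it assumes that the first entry in each `alternatives` list is the one to keep and that only results marked `final` are of interest.

```python
import json

# Stand-in for the response body shown in this tutorial.
SAMPLE_RESPONSE = """
{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ]
}
"""


def best_transcripts(response):
    """Return (transcript, confidence) for the first alternative of each final result."""
    pairs = []
    for result in response.get("results", []):
        if not result.get("final") or not result.get("alternatives"):
            continue
        best = result["alternatives"][0]  # assumption: first alternative is the one to keep
        # Transcripts in the sample end with a trailing space, so strip it.
        pairs.append((best["transcript"].strip(), best.get("confidence")))
    return pairs


if __name__ == "__main__":
    for transcript, confidence in best_transcripts(json.loads(SAMPLE_RESPONSE)):
        print(confidence, transcript)
```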

Transcribe audio with options

Call the POST /v1/recognize method to transcribe the same FLAC audio file, but this time specify two transcription parameters.

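If necessary, download the sample audio file audio-file.flac.
Issue the following command to call the service's /v1/recognize method with two extra parameters. Set the timestamps parameter to true to indicate the beginning and end of each word in the audio stream. Set the max_alternatives parameter to 3 to receive the three most likely alternatives for the transcription. The example uses the Content-Type header to indicate the type of the audio, audio/flac, and the request uses the default model, en-US_BroadbandModel.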

IBM Cloud
- Replace {apikey} and {url} with your API key and URL.
- Modify {path_to_file} to specify the location of the audio-file.flac file.
```
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path_to_file}audio-file.flac \
"{url}/v1/recognize?timestamps=true&max_alternatives=3"
```
IBM Cloud Pak for Data IBM Software Hub
- Replace {token} and {url} with the access token and URL for your service instance.
- Modify {path_to_file} to specify the location of the audio-file.flac file.
```
curl -X POST \
--header "Authorization: Bearer {token}" \
--header "Content-Type: audio/flac" \
--data-binary @{path_to_file}audio-file.flac \
"{url}/v1/recognize?timestamps=true&max_alternatives=3"
```
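The two parameters travel in the query string of the request URL. When the URL is assembled in code rather than typed by hand, one detail is easy to miss: the curl commands spell the boolean as lowercase `true`, whereas Python's `str(True)` yields `True`. The sketch below, with the invented helper name `recognize_url` and a placeholder base URL, writes booleans in the lowercase form used by the commands above.

```python
from urllib.parse import urlencode


def recognize_url(base_url, **params):
    """Append query parameters to the /v1/recognize endpoint (hypothetical helper)."""

    def encode(value):
        # Write booleans as lowercase true/false, matching the curl examples.
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    endpoint = base_url.rstrip("/") + "/v1/recognize"
    if not params:
        return endpoint
    return endpoint + "?" + urlencode({name: encode(value) for name, value in params.items()})


if __name__ == "__main__":
    # Placeholder base URL; substitute the URL of your service instance.
    print(recognize_url("https://example.invalid/instance", timestamps=True, max_alternatives=3))
```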

The service returns the following results, which include timestamps and three alternative transcriptions:

{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [
            ["several", 1.0, 1.51],
            ["tornadoes", 1.51, 2.15],
            ["touch", 2.15, 2.5],
            . . .
          ],
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado and Sunday "
        }
      ],
      "final": true
    }
  ]
}
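To work with the word timings and the alternative transcripts, the response can be flattened into one record per alternative. The following Python sketch shows one way to do that. The helper names are invented, the sample is trimmed to the three word timings that the tutorial shows, and it assumes that each timestamp entry is a `[word, start, end]` triple and that the timestamps and confidence score belong to the same alternative as the first transcript.

```python
import json

# Trimmed stand-in for the response: only the three word timings shown in the tutorial.
SAMPLE_RESPONSE = """
{
  "result_index": 0,
  "results": [
    {
      "alternatives": [
        {
          "timestamps": [
            ["several", 1.0, 1.51],
            ["tornadoes", 1.51, 2.15],
            ["touch", 2.15, 2.5]
          ],
          "confidence": 0.96,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado and Sunday "
        }
      ],
      "final": true
    }
  ]
}
"""


def word_timings(alternative):
    """Return (word, start_seconds, end_seconds) tuples, or an empty list if none are present."""
    return [(word, start, end) for word, start, end in alternative.get("timestamps", [])]


def summarize(response):
    """Flatten every alternative of every result into a small dictionary."""
    summary = []
    for result in response.get("results", []):
        for alternative in result.get("alternatives", []):
            summary.append({
                "transcript": alternative["transcript"].strip(),
                "confidence": alternative.get("confidence"),  # None when absent
                "words": word_timings(alternative),
            })
    return summary


if __name__ == "__main__":
    for entry in summarize(json.loads(SAMPLE_RESPONSE)):
        print(entry["confidence"], len(entry["words"]), entry["transcript"])
```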

Next steps

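To try a sample application that transcribes text from streaming audio input or from an uploaded file, see the Speech to Text demo.
For more information about the service's interfaces and features, see Service features.
For details on all methods of the service's interface, see the API and SDK reference.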