Response formatting and filtering
The IBM Watson® Speech to Text service provides three features that you can use to parse transcription results. You can format a final transcript to include more conventional representations of certain strings and to include punctuation. You can redact sensitive numeric information from a final transcript, and you can filter profanity from most transcription results. All of these features are beta functionality and are restricted to certain languages.
Smart formatting Version 2
The new version of smart formatting feature is available for US English, Brazilian Portuguese, French, German, Castilian Spanish, Spanish Latin American, and French Canadian. It is also available for the en-WW_Medical_Telephony model when US English audio is recognized.
The new version:
- provides more flexibility in adding new languages and patterns compared to older smart formatting.
- uses a more sophisticated machine learning technique (Weighted Finite State Transducers) to identify entities in texts compared to the older version, which was rule-based approach.
- provides more accurate entity classification and formatting and also adds capability to define hierarchies using weights when same text can be identified as two different entity type.
The smart_formatting
feature directs the service to convert the following strings into more conventional representations:
- Dates and times
- Integers, decimals, ordinals
- Alphanumeric sequences (of length > 2)
- Phone numbers
- Currency values
- Measures (
/km²
,kg
,mph
,m³
, etc) - Emails, URLs and IP addresses
- Credit Card Numbers (formatted as groups of 4 digits)
- Punctuations (as spoken in dictations)
To use the new smart formatting feature for US English, Brazilian Portuguese, French and German; set the parameter smart_formatting=true and smart_formatting_version=2.
Entity Patterns and Examples
US English
- Different spoken forms of dates are accepted, including dates just as numbers or names of months and use of
the
andof
(the twenty fifth of july twenty twelve
). The dates are formatted asm/d/yyyy
. - Times are identified by keywords or suffix, for example, timezones (for example,
est
,eastern
),am
,pm
,hours
,o'clock
,minutes past hour
. - Phone numbers must be either
911
or a number that contains 10 digits and/or starts with the number[+]1
. - Currency symbols are substituted for strings in appropriate contextsfor, for example,
dollar
,cent
,euro
,yen
.cent
is optional afterdollar
, for example,twelve dollars twenty five
andtwelve dollars twenty five cents
formatted as$12.25
. - Internet email addresses with common format (for example,
[alphanumeric+symbols]+ at [alphanumeric dot]+ domainname
) are smart formatted. - Web URLs, both short and long form are formatted. It includes protocol (
http/s
), subdomain (www
), ports (443,80
) and paths (/help/abc
). - Most large integers are formatted as numeric sequences. When large numbers (million, billions) are spoken with a single group integers, quantity word
million/billion
is not converted for readability, for example,fifty nine million
->59 million
but when the number is more complex, it is formatted as numeric digits, for example,fifty nine million and one
->59000001
. - Numbers less than 10 are not converted to digit to avoid odd formatting, for example,
You are one of them
->You are 1 of them
. But in other context like expressing currency, they are converted appropriately, for example,Give me one dollar
->Give me $1
. - Most of the punctuation symbols are added for special keywords that occur in appropriate places. When you use smart formatting, the service substitutes spoken or dictated punctuation symbols for the keyword strings.
comma
(,
),period
(.
),question mark
(?
),exclamation point
(!
),semicolon
(;
),hyphen
(-
).
Smart formatting Examples
The following table shows examples of final transcripts both with and without smart formatting. The transcripts are based on US English audio.
Entity Type | Without smart formatting | With smart formatting |
---|---|---|
Dates | july twenty fifth two thousand twelve | 7/25/2012 |
the twenty fifth of july twenty twelve | 7/25/2012 | |
january the thirty first two thousand | 1/31/2000 | |
zero five zero five nineteen eighty three | 6/20/2014 | |
second quarter of twenty twenty two | Q2 2022 | |
Times | it is two eleven eastern | it is 02:11 est |
we begin at oh seven hundred hours | we begin at 07:00 | |
quarter past one | 01:15 | |
three o'clock | 03:00 | |
Numbers | The quantity is one million one hundred and one | The quantity is 1000101 |
One point five is between one and two | 1.5 is between 1 and 2 | |
It would cost five point two million | It would cost 5.2 million | |
Its one hundred twenty first trial | Its 121st trial | |
Phone numbers | nine one four five five six eight three three one | 914-556-8331 |
plus one nine two three one two three five six seven eight | +1 923-123-5678 | |
Currency values | You owe me four united states dollars and sixty nine cents | You owe me $4.69 |
seventy five dollars sixty three | $75.63 | |
Dollar rose to one hundred and nine point seven nine yen | Dollar rose to ¥109.79 | |
Email, URL, IP | I saw the story on w w w dot yahoo dot com | I saw the story on www.yahoo.com |
a b three hyphen s d d dash three at g mail dot com | ab3-sdd-3@gmail.com | |
h t t p colon slash slash w w w dot c o m d a i l y n e w s dot a b slash s m | http://www.comdailynews.ab/sm | |
two two five dot double five dot o dot forty five | 225.55.0.45 | |
Measures | two hundred kilometers per hour | 200 km/h |
two kilo watt hours | 2 kWh | |
Sequences | H F H nine nine three dot seven B | HFH993.7B |
a ten eighty p display | a 1080p display |
Brazilian-Portuguese
- For dates,
do
andde
in the transcript is used as separators for day, month and year.primeiro
is considered as 1st of the month. The dates are formatted asDD/MM/YYYY
. - Times are identified by keywords and prefix, for example,
às
ao
,à
,da tarde
(p.m.
),da madrugada
(a.m.
),meia noite
,meio dia
. Prefixesàs
ao
,à
are optional. - Landline numbers must have 10 digits (2 digits country code and 8 digits number), mobile numbers are 9 digits with first digit as
9
with optional country code. Area codes are optional. Numbers are formatted as+NN (NN) NNNN-NNNN
and+NN (NN) 9NNNN-NNNN
. - Brazilian real currency symbol is
R$
. Other Currency symbols are substituted for strings in appropriate contexts, for example,dollar
,cent
,euro
,yen
.centavos
is optional afterreais
for example,setenta e cinco dólares e sessenta e três
andsetenta e cinco dólares e sessenta e três centavos
formatted asR$75,63
- Internet email addresses with common format (for example,
[alphanumeric+symbols]+ arroba [alphanumeric ponto]+ domainname
) are smart formatted. - Web URLs, both short and long form, are formatted. It includes protocol (
http/s
), subdomain (www
), ports (443,80
) and paths (/help/abc
). - Most large integer are formatted as numeric sequences. When large numbers (milhões, bilhões, etc) are spoken with a single group integers, quantity word
milhões/bilhões
is not converted for readability for example,doze milhões
->12 milhões
but when the number is more complex, it's formatted as numeric digits for example,doze milhões e um
->12000001
. - Numbers less than 10 are not formatted to digits to avoid odd conversions, for example,
vivo em uma casa
-->vivo em 1 casa
. - Most of the punctuation symbols are added for special keywords that occur in appropriate places. When you use smart formatting, the service substitutes spoken/dictated punctuation symbols for the keyword strings.
vírgula
(,
),ponto
(.
),ponto de interrogação
(?
),ponto de exclamação
(!
),ponto e vírgula
(;
),hífen
(-
).
Smart formatting Examples for Brazilian-Portuguese
The following table shows examples of final transcripts both with and without smart formatting. The transcripts are based on US English audio.
Entity Type | Without smart formatting | With smart formatting |
---|---|---|
Dates | trinta e um de dezembro de mil novecentos e oitenta e oito | 31/12/1988 |
um do um de mil novecentos e oitenta e sete | 01/01/1987 | |
Times | quinze pro meio dia | 11:45 |
meio dia e meia hora | 12:30 | |
ao meio dia e meio | ao 12:30 | |
às dez pras duas da madrugada | às 1:50 a.m. | |
às quinze para a meia noite | às 23:45 | |
Numbers | cento e quarenta e sete mil quatrocentos e cinquenta e um | 147451 |
um vírgula vinte e seis | 1,26 | |
décimo primeiro | 11º | |
Phone numbers | quatro cinco um dois três quatro cinco seis sete oito | (45) 1234-5678 |
onze nove nove oito meia cinco quinze zero dois | (11) 99865-1502 | |
nove vinte e sete vinte e oito trinta e sete trinta e oito | 92728-3738 | |
mais cinco cinco onze nove meia nove zero meia zero um quatro meia | +55 (11) 96906-0146 | |
Currency values | vinte e cinco centavos | R$ 0,25 |
vinte e nove dólares e cinquenta centavos | $ 29,50 | |
vinte e cinco centavos | R$ 0,25 | |
Email, URL, IP | a ponto b c arroba g mail ponto com | a.bc@gmail.com |
dáblio dáblio dáblio ponto a b c ponto es barra e f g | www.abc.es/efg | |
w w w ponto nvidia ponto com | www.nvidia.com | |
noventa e oito ponto setenta e seis ponto noventa e oito ponto dezesseis | 98.76.98.16 | |
Measures | duzentos e quarenta e cinco quilômetros por hora | 245 kph |
duzentos e quarenta e cinco metros por segundo | 245 m/s | |
Sequences | d dezesseis três nove c hífen f noventa e oito | d1639c-f98 |
Modelo f t doze x | Modelo ft12x |
French
- In dates, the ordinal
premier
is considered 1st of the month. The dates are formatted asDD/MM/YYYY
. - Times are identified by keywords and prefix, for example,
heures
,de l'après-midi
ordu soir
,du matin
,midi
. Times are formatted as 24H clock :HH h MM
- Telephone numbers must have 9 or 10 digits (5 pairs of two digits). In cases where only one digit of the first pairing is admitted, assumes that the 0 was skipped. Numbers are formatted as
NN NN NN NN NN
. - When the
de
ord'
preposition is used to express currency, the currency symbol is not used to format. This usually occurs with large round numbers, for example,un milliard d'euro
formatted as1 milliard d'euro
. - Internet email addresses with common format ( for example,
[alphanumeric+symbols]+ arobase [alphanumeric point]+ domainname
) are smart formatted.@
can be represented by any of these:arobase
,chez
,at
,à
. - Cardinals less than nine are not converted (in order to avoid
j'ai un pomme
->j'ai 1 pomme
and any other odd conversions. - For ordinals, 'siècles' are rendered in roman numerals when given an ordinal adjective. e
dix-neuvième siècle
->XIXᵉ siècle
. - Formatting of Fractions is supported. For example,
un onzième
->1/11
. - Most of the punctuation symbols are added for special keywords that occur in appropriate places. When you use smart formatting, the service substitutes spoken/dictated punctuation symbols for the keyword strings.
virgule
(,
),point
(.
),point d'interrogation
(?
),point d'exclamation
(!
),point-virgule
(;
),trait d'union
(-
).
Smart formatting Examples for French
The following table shows examples of final transcripts both with and without smart formatting. The transcripts are based on US English audio.
Entity Type | Without smart formatting | With smart formatting |
---|---|---|
Dates | vingt-quatre juillet deux-mille-treize | 24/7/2013 |
dix-huit mai dix-neuf cent trente | 18/5/1930 | |
Times | huit heures du matin | 8 h |
onze heures cinquante-sept | 11 h 57 | |
deux heures de l'après-midi | 14 h | |
Numbers | cent quarante-sept mille quatre cent cinquante et une | 147451 |
moins vingt-cinq-mille-trente-sept | 25037 | |
vingt-troisièmes | 23ᵉˢ | |
quatre et deux quatrièmes | 4 2/4 | |
Phone numbers | double neuf douze trente-deux trente trente | 99 12 32 30 30 |
deux douze trente-deux trente trente | 02 12 32 30 30 | |
Currency values | deux dollars vingt | 2,20 $ |
cinq euro et soixante | 5,60 € | |
quatre virgule quatre-vingt milliards d'euros | 4,80 milliards d'euros | |
Email, URL, IP | a b trois point s d d point trois arobase g mail point com | ab3.sdd.3@gmail.com |
w w w point web point c o point f r | www.web.co.fr | |
double neuf dot trente-deux dot trente dot trente | 99.32.30.30 | |
Measures | quarante-deux-mille-deux-cent-cinquante-neuf par mètre carré | 42 259 /m² |
deux cents kilomètres heure | 200 km/h | |
Sequences | le document numéro zéro deux trente-six vingt-quatre | le document numéro 023624 |
r t x dix-huit t i | rtx18ti |
French-Canadian
- In dates, the ordinal
premier
is considered 1st of the month. The dates are formatted asDD/MM/YYYY
. - Times are identified by keywords and prefix e.g.
heures
,de l'après-midi
ordu soir
,du matin
,midi
. Times are formatted as 24H clock :HH h MM
- Phone numbers must be either
911
or a number that contains 10 digits and/or starts with the number[+]1
. - Internet email addresses with common format ( e.g.
[alphanumeric+symbols]+ arobase [alphanumeric point]+ domainname
) are smart formatted.@
can be represented by any of these:arobase
,chez
,at
,à
. - Cardinals below nine are not converted if they occur in midst of other text(in order to avoid
j'ai un pomme
->j'ai 1 pomme
and any other odd conversions). They are still formatted if they occur in isolation with no other text. - Formatting of Fractions is supported. e.g.
un onzième
->1/11
- Most of the punctuation symbols are added for special keywords that occur in appropriate places. When you use smart formatting, the service substitutes spoken/dictated punctuation symbols for the keyword strings.
virgule
(,
),point
(.
),point d'interrogation
(?
),point d'exclamation
(!
),point-virgule
(;
),trait d'union
(-
), etc.
Smart formatting Examples for French-Canadian
The following table shows examples of final transcripts both with and without smart formatting. The transcripts are based on US English audio.
Entity Type | Without smart formatting | With smart formatting |
---|---|---|
Dates | vingt-quatre juillet deux-mille-treize | 24/7/2013 |
dix-huit mai dix-neuf cent trente | 18/5/1930 | |
Times | huit heures du matin | 8 h |
onze heures cinquante-sept | 11 h 57 | |
deux heures de l'après-midi | 14 h | |
Numbers | cent quarante-sept mille quatre cent cinquante et une | 147451 |
moins vingt-cinq-mille-trente-sept | 25037 | |
vingt-troisièmes | 23es | |
quatre et deux quatrièmes | 4 2/4 | |
Phone numbers | plus un cinq un quatre cinq cinq cinq un deux trois quatre | +1 (514) 555-1234 |
cinq un quatre quatre six neuf deux un zéro zéro | 02 12 32 30 30 | |
Currency values | deux dollars vingt | 2,20 $ |
vingt dollars cinq | 20,05 $ | |
quatre virgule quatre-vingt milliards d'euros | 4,80 milliards d'euros | |
Email, URL, IP | a b trois point s d d point trois arobase g mail point com | ab3.sdd.3@gmail.com |
w w w point web point c o point f r | www.web.co.fr | |
double neuf dot trente-deux dot trente dot trente | (514) 469-210 | |
Measures | quarante-deux-mille-deux-cent-cinquante-neuf par mètre carré | 42 259 /m² |
deux cents kilomètres heure | 200 km/h | |
Sequences | le document numéro zéro deux trente-six vingt-quatre | le document numéro 023624 |
r t x dix-huit t i | rtx18ti |
Spanish
- In dates, the ordinal
primero
is considered 1st of the month. The dates are formatted asDD/MM/YYYY
. - Times on the hour or time without an article followed by a suffix (indicating 'a.m.' or 'p.m.'), it will be converted .e.g
las dos pe eme
. Times are formatted as 24H clock :HH h MM
or as 12H clock with a.m./p.m. - Telephone numbers must have 8, 9 or 10 digits. Numbers are formatted as
NNNN NNNN
orNNN NNN NNN
orNNN NNN NNNN
- Internet email addresses with common format ( e.g.
[alphanumeric+symbols]+ arroba [alphanumeric punto]+ domainname
) are smart formatted. - Cardinals below nine are not converted if they occur in midst of other text(in order to avoid
un gato en el camino
->1 gato en el camino
and any other odd conversions). They are still formatted if they occur in isolation with no other text. - Formatting of Fractions is supported. e.g.
un décimo
->1/10
- Most of the punctuation symbols are added for special keywords that occur in appropriate places. When you use smart formatting, the service substitutes spoken/dictated punctuation symbols for the keyword strings.
punto
(.
),interrogación
(?
),exclamación
(!
),punto y coma
(;
),guion medio
(-
), etc.
Smart formatting Examples for Spanish
The following table shows examples of final transcripts both with and without smart formatting. The transcripts are based on US English audio.
Entity Type | Without smart formatting | With smart formatting |
---|---|---|
Dates | treinta y uno de diciembre de mil novecientos noventa y dos | 31/12/1992 |
dieciséis de septiembre dos mil dieciocho | 16/09/2018 | |
Times | las dieciséis cincuenta | las 16:50 |
las dos a eme | las 2:00 a.m. | |
Numbers | mil novecientos cincuenta y ocho | 1958 |
once mil novecientos cincuenta y ocho | 11958 | |
décima primera | 11ª | |
un cuarentiunavo | 1/41 | |
Phone numbers | nueve uno cuatro cinco cinco seis ocho tres tres uno | 914 556 8331 |
uno dos tres cuatro cinco seis siete ocho | 1234 5678 | |
Currency values | dos euros noventa centavos | €2,90 |
doce euros y cinco centavos | €12,05 | |
nueve punto cinco millones de pesos | $9.5 millones | |
Email, URL | a b c arroba g mail punto a b c | abc@gmail.abc |
doble uve doble uve doble uve punto nvidia punto com | www.nvidia.com | |
Measures | tres metros cúbicos | 3 m³ |
dos kilómetros por hora | 2 kph | |
Sequences | cero dos tres seis dos cuatro | 023624 |
r t x cero dos tres w | rtx023w |
German
- Date formatting has support for both numeric and name for months (for example,
zweiter
is same asfebruar
). The dates are formatted asDD.MM.YYYY
. - Times are identified by keywords, for example,
nach
uhr
,vor
,minuten
. Time is formatted as 24-hr clock :HH:MM:SS
. - Telephone numbers should have 3-4 digit area code starting with
0
followed by 8-digit number. Country code (+49) is optional. Area code should not start with0
if country code is used. Numbers are formatted as+49 [N]NN NNNNNNNN
or0[N]NN NNNNNNNN
. - Most Currency symbols are substituted for strings in appropriate contexts, for example,
dollar
,cent
,euro
,yen
. - Internet email addresses with common format ( for example,
[alphanumeric+symbols]+ ät [alphanumeric punkt]+ domainname
) are formatted. - Web URLs, both short and long form, are formatted. It includes protocol (
http/s
), subdomain (www
), ports (443,80
) and paths (/help/abc
) - Cardinals less than nine are not converted in order to avoid odd or ambiguous conversions.
- Ordinals and fractions formatting are supported.
- Most of the punctuation symbols are added for special keywords that occur in appropriate places. When you use smart formatting, the service substitutes spoken/dictated punctuation symbols for the keyword strings.
komma
(,
),punkt
(.
),fragezeichen
(?
),ausrufezeichen
(!
),semikolon
(;
),bindestrich
(-
).
Smart formatting Examples for German
The following table shows examples of final transcripts both with and without smart formatting. The transcripts are based on US English audio.
Entity Type | Without smart formatting | With smart formatting |
---|---|---|
Dates | vierundzwanzigster juli zwei tausend dreizehn | 24.07.2013 |
dreizehnter zweiter zwei tausend zwanzig | 13.02.2020 | |
Times | vierundzwanziguhrzweiundzwanzig | 24:22 Uhr |
acht uhr sieben | 08:07 Uhr | |
ein uhr eine minute eine sekunde | 01:01:01 Uhr | |
Numbers | minus fünf und zwanzig tausend sieben und dreißig | -25037 |
acht hundert achtzehn komma drei null drei | 818,303 | |
fünfundzwanzigtausendeinhundertelftem |
|
|
drei zwei ein hundertstel | 3 2/100 | |
Phone numbers | null vier eins eins eins zwei drei vier eins zwei drei vier | 0411 12341234 |
plus vier neun vier eins eins eins zwei drei vier eins zwei drei vier | +49 411 12341234 | |
Currency values | zwei komma null null null eins dollar | 2,0001 $ |
zweiundzwanzig cent | 0,22 € | |
Email, URL, IP | a b drei bindestrich s d d bindestrich drei ät g mail punkt com | ab3-sdd-3@gmail.com |
h t t p s doppelpunkt slash slash w w w punkt a b c punkt com slash a b | https://www.abc.com/ab | |
drei fünf punkt eins drei fünf punkt zwei vier punkt zwei vier | 35.135.24.24 | |
Measures | zwei kilometer pro stunde | 2 km/h |
vier hundert vierzig milliliter | 440 ml | |
Sequences | c b vier drei bindestrich fünf drei fünf zwei vier zwei punkt vier drei fünf | cb43-535242.435 |
teilenummer f t strich zwölf p | teilenummer ft-12p |
Smart formatting V2 examples
The following example requests smart formatting with a recognition request by setting the smart_formatting
parameter to true
. The following sections show the effects of smart formatting on the results of a request.
IBM Cloud
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path}audio-file.flac \
"{url}/v1/recognize?model=en-US_Telephony&smart_formatting=true&smart_formatting_version=2"
IBM Cloud Pak for Data
curl -X POST \
--header "Authorization: Bearer {token}" \
--header "Content-Type: audio/flac" \
--data-binary @{path}audio-file.flac \
"{url}/v1/recognize?model=en-US_Telephony&smart_formatting=true&smart_formatting_version=2"
Smart formatting
The smart formatting feature is beta functionality that is available for US English, Japanese, and Spanish (all dialects). It also also available for the en-WW_Medical_Telephony
model when US English audio is recognized.
The smart_formatting
parameter directs the service to convert the following strings into more conventional representations:
- Dates
- Times
- Series of digits and numbers
- Phone numbers
- Currency values (for US English and Spanish)
- Internet email and web addresses (for US English and Spanish)
You set the smart_formatting
parameter to true
to enable smart formatting. By default, the service does not perform smart formatting. The service applies smart formatting just before it returns the final results to
the client, when text normalization is complete. The conversion makes the transcript more readable and promotes better post-processing of transcription results by representing these artifacts as they would normally be written.
What results does smart formatting affect?
Smart formatting impacts some transcription results and not others:
-
Smart formatting affects only words in the
transcript
field of final results, those results for which thefinal
field istrue
. It does not affect interim results, for whichfinal
isfalse
. -
Smart formatting does not affect words in other fields of the response. For example, smart formatting is not applied to response data in the
timestamps
oralternatives
fields. -
Speech hesitations, such as "uhm" and "uh", can adversely impact the conversion of phrases and strings by smart formatting for some languages. Previous-generation models produce hesitation markers to replace such hesitations in a transcript. Smart formatting has the following effect on hesitation markers for previous-generation models:
- For US English, smart formatting suppresses hesitation markers from the
transcript
field for final results. - For Japanese, hesitation markers continue to appear in final results.
- For both US English and Japanese, hesitation markers continue to appear in interim results.
- For Spanish, the service does not produce hesitation markers for any results.
Next-generation models do not produce hesitation markers. They instead include the actual hesitations in transcription results. Smart formatting has no effect on hesitations that are included by next-generation models. For more information, see Speech hesitations and hesitation markers.
- For US English, smart formatting suppresses hesitation markers from the
Language differences
Smart formatting is based on the presence of obvious keywords in the transcript. Because of differences among the supported languages, smart formatting works slightly differently for each language. The following sections describe the strings and content that trigger smart formatting changes for US English and Spanish and for Japanese.
US English and Spanish
-
Times are identified by keywords such as
AM
,PM
, orEST
. -
Military times are converted if they are identified by the keyword
hours
(US English) orhoras
(Spanish). -
Phone numbers must be either
911
or a number that contains 10 or 11 digits and starts with the number1
. -
Currency symbols are substituted for the following strings in appropriate contexts:
- For US English, dollar, cent, and euro.
- For Spanish, dolar, peso, peseta, libras esterlinas, libra, and euro.
-
Internet email addresses are converted in some cases. Specifically, the service converts email addresses if the input audio uses the phrasing
email address ... {address}
. The following examples show a correct conversion of spoken phrases:My email address is j dot d o e at i b m dot com
becomesMy email address is j.doe@ibm.com
.Mi correo electronico es j punto d o e arroba i b m punto com
becomesMi correo electronico es j.doe@ibm.com
.
-
Internet web addresses are converted in their short forms. Fully qualified web addresses are not converted. The following examples show complete conversions:
I saw the story on yahoo dot com
becomesI saw the story on yahoo.com
.Vi la historia en yahoo punto com
becomesVi la historia en yahoo.com
.
The following examples show incomplete conversions:
I saw the story on w w w dot yahoo dot com
becomesI saw the story on w w w .yahoo.com
.Vi la historia en w w w punto yahoo punto com
becomesVi la historia en w w w .yahoo.com
.
-
Conversion of large numbers and currency values can be challenging. The service converts digits and many numbers well. But larger and more complex numbers and currency values work best with more precise phrasing. For example, the service correctly converts the following transcripts because of their precise wording:
sixty nine thousand five hundred sixty dollars and twenty five cents
becomes$69560.25
,sixty nine thousand five hundred sixty dollars point twenty five
becomes$69560.25
.
But the service cannot correctly convert the following transcripts due to their looser phrasing:
sixty nine thousand five sixty dollars and twenty five cents
becomes60 9000 $560.25
.sixty nine thousand five sixty dollars point twenty five
becomes60 9000 $560.25
.
To correctly convert a greater possible variety of complex numbers, you need to experiment with the results of smart formatting and customize your own post-processing utilities.
-
For US English, certain punctuation symbols are added for special keywords that occur in appropriate places. When you use smart formatting, the service substitutes punctuation symbols for the following keyword strings based on where it finds them in a transcript:
Comma
(,
)Period
(.
)Question mark
(?
)Exclamation point
(!
)
The service converts these keyword strings to symbols only in appropriate positions of a transcript. In the following example, the speaker says the word
period
at the end of the sentence:the warranty period is short period
becomesthe warranty period is short.
The service correctly differentiates between the noun that appears earlier in the sentence and the concluding punctuation.
Japanese
-
Phone numbers must be 10 or 11 digits and begin with valid prefixes for telephone numbers in Japan. For example, valid prefixes include
03
and090
. -
English words are converted to ASCII (hankaku) characters. For example,
IBM
is converted toIBM
. -
Ambiguous terms might not be converted if sufficient context is unavailable. For example, it is unclear whether
一時
and十分
refer to times. -
Punctuation is handled in the same way with or without smart formatting. For example, based on probability calculations, one of
カンマ
or,
is selected. -
Strings that describe yen values are not replaced with the yen currency symbol.
-
Internet email and web addresses in any form are not converted.
-
The Japanese narrowband model (
ja-JP_NarrowbandModel
) includes some multigram word units for digits and decimal fractions. The service returns these multigram units regardless of whether you enable smart formatting. The following examples show the units that the service returns. The numbers in parentheses show the equivalent Arabic numeric expression for each unit.- Digits:
〇一
(01), ...,〇九
(09),一〇
(10), ...,九〇
(90) - Decimal fractions:
〇・
(0.),一・
(1.), ...,十・
(10.)
The smart formatting feature understands and returns the multigram units that the model generates. If you apply your own post-processing to transcription results, you need to handle these units appropriately.
- Digits:
Smart formatting results
The following table shows examples of final transcripts both with and without smart formatting. The transcripts are based on US English audio.
Information | Without smart formatting | With smart formatting |
---|---|---|
Dates | I was born on ten oh six nineteen seventy | I was born on 10/6/1970 |
I was born on the ninth of December nineteen hundred | I was born on 12/9/1900 | |
Today is June sixth | Today is June 6 | |
Times | The meeting starts at nine thirty AM | The meeting starts at 9:30 AM |
I am available at seven EST | I am available at 7:00 EST | |
We meet at oh seven hundred hours | We meet at 0700 hours | |
Numbers | The quantity is one million one hundred and one | The quantity is 1000101 |
One point five is between one and two | 1.5 is between 1 and 2 | |
Phone numbers | Call me at nine one four two three seven one thousand | Call me at 914-237-1000 |
Call me at one nine one four nine oh nine twenty six forty five | Call me at 1-914-909-2645 | |
Currency values | You owe me three thousand two hundred two dollars and sixty six | You owe me $3202.66 |
The dollar rose to one hundred and nine point seven nine yen from one hundred and nine point seven two yen | The dollar rose to 109.79 yen from 109.72 yen | |
Internet email and web addresses | My email address is john dot doe at foo dot com | My email address is john.doe@foo.com |
I saw the story on yahoo dot com | I saw the story on yahoo.com | |
Combinations | The code is zero two four eight one and the date of service is May fifth two thousand and one | The code is 02481 and the date of service is 5/5/2001 |
There are forty seven links on Yahoo dot com now | There are 47 links on Yahoo.com now |
Smart formatting results for long pauses
In cases where an utterance contains long enough pauses of silence, the service can split the transcript into two or more final results. This affects the contents of the response, as shown in the following examples.
Audio speech | Formatted transcription results |
---|---|
My phone number is nine one four five five seven three three nine two | "My phone number is 914-557-3392" |
My phone number is nine one four ...pause... five five seven three three nine two | "My phone number is 914" "5573392" |
For more information about specifying a pause interval that affects the service's response, see End of phrase silence time.
Smart formatting example
The following example requests smart formatting with a recognition request by setting the smart_formatting
parameter to true
. The following sections show the effects of smart formatting on the results of a request.
IBM Cloud
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path}audio-file.flac \
"{url}/v1/recognize?smart_formatting=true"
IBM Cloud Pak for Data
curl -X POST \
--header "Authorization: Bearer {token}" \
--header "Content-Type: audio/flac" \
--data-binary @{path}audio-file.flac \
"{url}/v1/recognize?smart_formatting=true"
Numeric redaction
The numeric redaction feature is beta functionality that is available for US English, Japanese, and Korean.
The redaction
parameter directs the service to redact, or mask, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X
character.
It is intended to redact sensitive numeric data, such as credit card numbers.
By default, the service does not redact numeric data. Set the redaction
parameter to true
to enable numeric redaction. When you enable redaction, the service automatically enables smart formatting by setting the smart_formatting
parameter to true
, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables the following parameters when you enable redaction:
- The service disables keyword spotting, regardless of whether you specify values for the
keywords
andkeywords_threshold
parameters. - The service disables maximum alternatives, regardless of whether you specify a value greater than 1 for the
max_alternatives
parameter. The service returns only a single final transcript. - The service disables interim results for the WebSocket interface, regardless of whether you set the
interim_results
parameter totrue
.
The design of the feature parallels the existing smart formatting feature. The service applies redaction only to the final transcript of a recognition request, just before it returns the results to the client and after text normalization is complete.
Language differences
The feature works exactly as described for US English models but has the following differences for Japanese and Korean models.
Japanese
Japanese redaction has the following differences:
-
In addition to masking strings of three or more consecutive digits, redaction also masks street addresses and numbers, regardless of whether they contain fewer than three digits.
-
Similarly, redaction also masks date information in Japanese-style birth dates. In Japanese, date information is usually presented in Common Era format but sometimes follows Japanese style, particularly for birth dates. In this case, the year and month are masked even though they contain just one or two digits.
For example, a Japanese-style birth date without redaction is
平成 30年 2月
. With redaction, the date becomes平成 XX年 X月
.
Korean
Korean redaction has the following differences:
-
The smart formatting feature is not supported. The service still performs numeric redaction for Korean, but it performs no other smart formatting.
-
Isolated digit characters are reduced, but possible digit characters that are included as part of Korean phrases are not. For example, the character
이
in the following phrase is not replaced by anX
because it is adjacent to the following character:이입니다
If the
이
character were separated from the following character by a space, it would be replaced by anX
, as described in Numeric redaction results.
Numeric redaction results
The following table shows examples of final transcripts both with and without numeric redaction in each supported language.
Language | Without redaction | With redaction |
---|---|---|
US English | my credit card number is four one four seven two | my credit card number is XXXXX |
Japanese | 私 のクレジット カード 番号 は 四 一 四 七 二です | 私 のクレジット カード 番号 は XXXXX です |
Korean | 내 신용 카드 번호는 사 일 사 칠 이 번입니다 | 내 신용 카드 번호는 XXXXX 번입니다 |
Numeric redaction example
The following example requests numeric redaction with a recognition request by setting the redaction
parameter to true
. Because the request enables redaction, the service implicitly enables smart formatting with the
request. The service effectively disables the other parameters of the request so that they have no effect: The service returns a single final transcript and recognizes no keywords.
IBM Cloud
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/wav" \
--data-binary @{path}audio-file.wav \
"{url}/v1/recognize?&redaction=true&max_alternatives=3&keywords=birth%2Cbirthday&keywords_threshold=0.5"
IBM Cloud Pak for Data
curl -X POST \
--header "Authorization: Bearer {token}" \
--header "Content-Type: audio/wav" \
--data-binary @{path}audio-file.wav \
"{url}/v1/recognize?&redaction=true&max_alternatives=3&keywords=birth%2Cbirthday&keywords_threshold=0.5"
Profanity filtering
The profanity filtering feature is generally available for US English and Japanese only.
The profanity_filter
parameter indicates whether the service is to censor profanity from its results. By default, the service obscures all profanity by replacing it with a series of asterisks in the transcript. Setting the parameter
to false
displays words in the output exactly as transcribed.
The service censors profanity from all final transcripts and from any alternative transcripts. It also censors profanity from results that are associated with word alternatives, word confidence, and word timestamps. The sole exception is keyword
spotting, for which the service returns all words as specified by the user, regardless of whether profanity_filter
is true
.
Profanity filtering example
The following example shows the results for a brief audio file that is transcribed with the default true
value for the profanity_filter
parameter. The request also sets the word_alternatives_threshold
parameter to a relatively high value of 0.99
and the word_confidence
and timestamps
parameters to true
.
IBM Cloud
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: audio/flac" \
--data-binary @{path}audio-file.flac \
"{url}/v1/recognize?word_alternatives_threshold=0.99&word_confidence=true×tamps=true"
IBM Cloud Pak for Data
curl -X POST \
--header "Authorization: Bearer {token}" \
--header "Content-Type: audio/flac" \
--data-binary @{path}audio-file.flac \
"{url}/v1/recognize?word_alternatives_threshold=0.99&word_confidence=true×tamps=true"
The service masks profanity from the response by replacing it with a series of asterisks:
{
"result_index": 0,
"results": [
{
"word_alternatives": [
{
"start_time": 0.03,
"alternatives": [
{
"confidence": 1.0,
"word": "****"
}
],
"end_time": 0.25
},
{
"start_time": 0.25,
"alternatives": [
{
"confidence": 0.99,
"word": "you"
}
],
"end_time": 0.56
}
],
"alternatives": [
{
"transcript": "**** you",
"confidence": 0.99,
"word_confidence": [
["****", 1.0],
["you", 0.99]
],
"timestamps": [
["****", 0.03, 0.25],
["you", 0.25, 0.56]
]
}
],
"final": true
}
]
}