Research references

For more information about the research behind the IBM Watson® Text to Speech service, see the following documents. IBM® researchers wrote or contributed to all of these papers.

Eide, Ellen M., and Raul Fernandez. Database Mining for Flexible Concatenative Text-to-Speech. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 4 (2007): pp. 697-700.
Eide, Ellen, Raul Fernandez, Ron Hoory, Wael Hamza, Zvi Kons, Michael Picheny, Ariel Sagi, Slava Shechtman, and Zhi Wei Shuang. The IBM Submission to the 2006 Blizzard Text-to-Speech Challenge. Blizzard Challenge Workshop 2006.
Fernandez, Raul, David Haws, Guy Lorberbom, Slava Shechtman, and Alexander Sorin. Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis. Proceedings Interspeech (2022): publication pending.
Fernandez, Raul, Asaf Rendel, Bhuvana Ramabhadran, and Ron Hoory. Using Deep Bidirectional Recurrent Neural Networks for Prosodic-Target Prediction in a Unit-Selection Text-to-Speech System. Proceedings Interspeech (2015), pp. 1606-1610.
Fernandez, Raul, Asaf Rendel, Bhuvana Ramabhadran, and Ron Hoory. Prosody Contour Prediction with Long Short-Term Memory, Bi-directional, Deep Recurrent Neural Networks. Proceedings Interspeech (2014), pp. 2268-2272.
Fernandez, Raul, Zvi Kons, Slava Shechtman, Zhi Wei Shuang, Ron Hoory, Bhuvana Ramabhadran, and Yong Qin. The IBM Submission to the 2008 Text-to-Speech Blizzard Challenge. Blizzard Challenge Workshop 2008.
Fernandez, Raul, and Bhuvana Ramabhadran. Automatic Exploration of Corpus-Specific Properties for Expressive Text-to-Speech: A Case Study in Emphasis. Proceedings of the Sixth ISCA Workshop on Speech Synthesis (August 2007): pp. 34-39.
Fernandez, Raul, Raimo Bakis, Ellen Eide, Wael Hamza, John Pitrelli, and Michael A. Picheny. The 2006 TC-STAR Evaluation of the IBM Expressive Text-to-Speech Synthesis System. Speech-to-Speech Translation Workshop, Barcelona, Spain (2006), pp. 175-180.
Kons, Zvi, Slava Shechtman, Alex Sorin, Carmel Rabinovitz, and Ron Hoory. High Quality, Lightweight and Adaptable TTS Using LPCNet. Submitted to Interspeech (2019).
Pitrelli, John F., Raimo Bakis, Ellen M. Eide, Raul Fernandez, Wael Hamza, and Michael A. Picheny. The IBM Expressive Text-to-Speech Synthesis System for American English. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14(4) (July 2006): pp. 1099-1108.
Rendel, Asaf, Raul Fernandez, Ron Hoory, and Bhuvana Ramabhadran. Using Continuous Lexical Embeddings to Improve Symbolic-Prosody Prediction in a Text-to-Speech Front End. Proceedings ICASSP (2016), pp. 5655-5659.
Shechtman, Slava. Maximum-Likelihood Dynamic Intonation Model for Concatenative Text to Speech System. Proceedings of the Sixth ISCA Workshop on Speech Synthesis (August 2007): pp. 234-239.
Shuang, Zhi-Wei, Raimo Bakis, Slava Shechtman, Dan Chazan, and Yong Qin. Frequency Warping Based on Mapping Formant Parameters. Proceedings of the Ninth International Conference on Spoken Language Processing (ICSLP), Interspeech (2006): pp. 2290-2293.