Speaker intonation adaptation for transforming text-to-speech synthesis speaker identity

Mahsa Sadat Elyasi Langarani, Jan Van Santen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

3 Scopus citations

Abstract

In this study, we propose a new intonation adaptation method that transforms the perceived identity of a Text-To-Speech system to that of a target speaker using only a small amount of training data. During training, we fit parametrized accent and phrase curves to parallel recordings of the target speaker F0 curves, and estimate the parameters of a mapping between the corresponding parameter spaces. At test time, we fit the accent and phrase curves to the source utterances, apply the mapping, and create an F0 contour from the mapped accent and phrase curves. We compare the proposed method with a baseline adaptation method in which the source F0 contour is transformed linearly such that the per-utterance mean and variance of the target F0 contour are left unaltered. In two subjective perceptual tests, assessing similarity to the target speaker and speech quality respectively, the proposed method outperformed the baseline.
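The baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes the linear transform is the common mean-variance rescaling, that it is applied to F0 values in Hz, that unvoiced frames are encoded as 0.0, and that the target utterance's mean and standard deviation are already known. The function name `mean_variance_transform` and the sample values are hypothetical.

```python
from statistics import fmean, pstdev


def mean_variance_transform(source_f0, target_mean, target_std):
    """Linearly rescale voiced F0 values to a given target mean and std.

    Assumption: unvoiced frames are encoded as 0.0 and are passed through.
    """
    voiced = [f for f in source_f0 if f > 0.0]
    if not voiced:
        return list(source_f0)
    src_mean = fmean(voiced)
    src_std = pstdev(voiced)
    # Guard against a flat contour, where the scale factor is undefined.
    scale = target_std / src_std if src_std > 0.0 else 1.0
    return [
        target_mean + scale * (f - src_mean) if f > 0.0 else 0.0
        for f in source_f0
    ]


# Hypothetical per-utterance contour: two unvoiced frames around three voiced ones.
source = [0.0, 110.0, 120.0, 130.0, 0.0]
converted = mean_variance_transform(source, target_mean=200.0, target_std=20.0)
converted_voiced = [f for f in converted if f > 0.0]
```

Under these assumptions the voiced part of `converted` has the requested mean and standard deviation, while the shape of the source contour is kept. The proposed method differs in that it maps accent and phrase curve parameters rather than rescaling the contour as a whole.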

Original language: English (US)
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 116-123
Number of pages: 8
ISBN (Print): 9781479972913
DOIs: https://doi.org/10.1109/ASRU.2015.7404783
State: Published - Feb 10 2016
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: Dec 13 2015 to Dec 17 2015

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country: United States
City: Scottsdale
Period: 12/13/15 to 12/17/15

Keywords

  • Adaptation
  • Intonation modeling
  • Prosody
  • Text-to-Speech synthesis

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition

Cite this

Langarani, M. S. E., & Van Santen, J. (2016). Speaker intonation adaptation for transforming text-to-speech synthesis speaker identity. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings (pp. 116-123). [7404783] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2015.7404783