Spectral voice conversion for text-to-speech synthesis

A. Kain, M. W. Macon

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

469 Scopus citations

Abstract

A new voice conversion algorithm is presented that modifies a source speaker's speech to sound as if it were produced by a target speaker. It is applied to a residual-excited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted to match the target speaker's average pitch. To study the effect of the amount of training data on performance, data sets of varying sizes are created by automatically selecting subsets of all available diphones with a vector quantization method. In an objective evaluation, the proposed method is found to perform more reliably for small training sets than a previous approach. Perceptual tests showed that nearly optimal spectral conversion performance was achieved even with a small amount of training data; however, speech quality improved as the training set grew.
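The spectral mapping described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy, assumes the source and target frames have already been time-aligned into pairs, fits the joint model with plain EM (random-row initialization, a fixed iteration count, and a small diagonal regularizer `reg`), and leaves out both the LPC residual pitch adjustment and the vector-quantization selection of diphones. The function names `fit_joint_gmm`, `convert`, `_posteriors`, and `_log_gauss` are hypothetical. A GMM is fitted to joint vectors z = [x; y]; at conversion time each component contributes a linear regression of the target part on the source part, and these outputs are blended by the posterior probability of each component given the source vector, which is what makes the transformation locally linear.

```python
import numpy as np


def _log_gauss(X, mean, cov):
    """Log-density of one full-covariance Gaussian at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    w = np.linalg.solve(L, (X - mean).T)  # whitened residuals, shape (d, n)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + np.sum(w**2, axis=0))


def _posteriors(X, weights, means, covs):
    """Component posteriors p(i | row) for every row of X, computed in the log domain."""
    log_p = np.stack(
        [np.log(weights[k]) + _log_gauss(X, means[k], covs[k]) for k in range(len(weights))],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_joint_gmm(X, Y, n_components=2, n_iter=50, reg=1e-6, seed=0):
    """Fit a full-covariance GMM to joint vectors z = [x; y] with plain EM.

    Assumption: rows of X (n, dx) and Y (n, dy) are already time-aligned
    source/target frame pairs.
    """
    Z = np.hstack([X, Y])
    n, D = Z.shape
    rng = np.random.default_rng(seed)
    means = Z[rng.choice(n, n_components, replace=False)]
    covs = np.array([np.cov(Z.T) + reg * np.eye(D) for _ in range(n_components)])
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        r = _posteriors(Z, weights, means, covs)  # E-step: responsibilities
        nk = r.sum(axis=0)                         # M-step: re-estimate parameters
        weights = nk / n
        means = (r.T @ Z) / nk[:, None]
        for k in range(n_components):
            diff = Z - means[k]
            covs[k] = (r[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(D)
    return weights, means, covs


def convert(X, weights, means, covs, dx):
    """Joint-density GMM regression: the conditional mean E[y | x] under the mixture."""
    mu_x, mu_y = means[:, :dx], means[:, dx:]
    Sxx, Syx = covs[:, :dx, :dx], covs[:, dx:, :dx]
    p = _posteriors(X, weights, mu_x, Sxx)  # marginal GMM on x gives p(i | x)
    Y_hat = np.zeros((X.shape[0], means.shape[1] - dx))
    for k in range(len(weights)):
        A = np.linalg.solve(Sxx[k], Syx[k].T).T  # Syx @ inv(Sxx), shape (dy, dx)
        Y_hat += p[:, k, None] * (mu_y[k] + (X - mu_x[k]) @ A.T)
    return Y_hat
```

With `n_components=1` the mapping collapses to a single global linear regression (ordinary least squares with an intercept); adding components lets the regression vary across regions of the source feature space.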

Original language: English (US)
Title of host publication: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 285-288
Number of pages: 4
ISBN (Print): 0780344286, 9780780344280
DOIs: https://doi.org/10.1109/ICASSP.1998.674423
State: Published - Jan 1 1998
Event: 1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998 - Seattle, WA, United States
Duration: May 12 1998 to May 15 1998

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
ISSN (Print): 1520-6149

Conference

Conference: 1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Country: United States
City: Seattle, WA
Period: 5/12/98 to 5/15/98

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering



    Kain, A., & Macon, M. W. (1998). Spectral voice conversion for text-to-speech synthesis. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998 (pp. 285-288). [674423] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 1). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.1998.674423