Control and prediction of the impact of pitch modification on synthetic speech quality

Esther Klabbers, Jan Van Santen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

To use speech synthesis to generate highly expressive speech convincingly, the problem of poor prosody (in both prediction and generation) needs to be overcome. In this paper we show that a simple annotation scheme based on the notion of foot structure lets us predict the shape of local pitch contours more accurately. The assumption is that a better selection mechanism reduces the amount of pitch modification required, thereby reducing speech degradation. In addition, we present a perceptual experiment that investigates the degradation introduced by pitch modification using the OGIresLPC algorithm. We correlated the weighted perceptual score with different pitch and delta pitch distances. The best combination of distance measures explains 63% of the variance in the perceptual scores. Decreasing the pitch is shown to have a greater impact on perception than increasing it.
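
The abstract does not specify the distance measures or the fitting procedure. As a rough illustration only, the sketch below shows one common way that "variance explained" by a combination of measures can be computed: an ordinary least-squares fit followed by R². The function name `r_squared`, the use of ordinary least squares, and all numbers are assumptions made for this example; none of them come from the paper.

```python
from statistics import fmean


def r_squared(y, predictors):
    """Fraction of variance in y explained by an ordinary least-squares
    fit (with intercept) on the given predictor columns."""
    n = len(y)
    # Design matrix rows: [1, x1, x2, ...]
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix
    a = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda idx: abs(a[idx][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    coeffs = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coeffs, row)) for row in rows]
    mean_y = fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up toy values, NOT data from the paper.
pitch_dist = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # hypothetical pitch distance
delta_dist = [0.2, 0.1, 0.4, 0.3, 0.6, 0.5]   # hypothetical delta pitch distance
score = [4.6, 4.1, 3.2, 3.0, 1.9, 1.8]        # hypothetical perceptual score

r2_pitch_only = r_squared(score, [pitch_dist])
r2_combined = r_squared(score, [pitch_dist, delta_dist])
print(f"pitch only: {r2_pitch_only:.3f}, combined: {r2_combined:.3f}")
```

With a least-squares fit, adding a second measure can never lower R² on the same data, so comparing combinations in this way favors larger sets of measures unless an adjustment for the number of predictors is applied.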

Original language: English (US)
Title of host publication: EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
Publisher: International Speech Communication Association
Pages: 317-320
Number of pages: 4
State: Published - 2003
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: Sep 1 2003 to Sep 4 2003

Fingerprint

Degradation
Speech synthesis
Experiments

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication

Cite this

Klabbers, E., & Van Santen, J. (2003). Control and prediction of the impact of pitch modification on synthetic speech quality. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology (pp. 317-320). International Speech Communication Association.
