Evaluation of speaker mimic technology for personalizing SGD voices

Esther Klabbers, Alexander Kain, Jan Van Santen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this paper, we demonstrate the use of state-of-the-art speech technology to transform speech from a source speaker to mimic a particular target speaker with the intention of providing personalized voices to users of Speech Generating Devices (SGDs). This speaker mimicry (SM) capability allows us to use high-quality acoustic inventories from professional speakers and transform them to a different target speaker using a very limited set of sentences from that speaker. This technology targets future SGD users who still have a limited vocabulary or available previous recordings. The results of a perceptual study show that listeners can identify which SM voices most resemble their respective target voices.

Original language: English (US)
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Pages: 2154-2157
Number of pages: 4
State: Published - 2010
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: Sep 26 2010 to Sep 30 2010

Other

Other: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
Country: Japan
City: Makuhari, Chiba
Period: 9/26/10 to 9/30/10

Keywords

  • Perceptual evaluation
  • Prosody modeling
  • Speech synthesis
  • Voice transformation

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Cite this

Klabbers, E., Kain, A., & Van Santen, J. (2010). Evaluation of speaker mimic technology for personalizing SGD voices. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 (pp. 2154-2157)

@inproceedings{42344b9ec6574f0796879c01f4cf98fe,
title = "Evaluation of speaker mimic technology for personalizing SGD voices",
abstract = "In this paper, we demonstrate the use of state-of-the-art speech technology to transform speech from a source speaker to mimic a particular target speaker with the intention of providing personalized voices to users of Speech Generating Devices (SGDs). This speaker mimicry (SM) capability allows us to use high-quality acoustic inventories from professional speakers and transform them to a different target speaker using a very limited set of sentences from that speaker. This technology targets future SGD users who still have a limited vocabulary or available previous recordings. The results of a perceptual study show that listeners can identify which SM voices most resemble their respective target voices.",
keywords = "Perceptual evaluation, Prosody modeling, Speech synthesis, Voice transformation",
author = "Esther Klabbers and Alexander Kain and {Van Santen}, Jan",
year = "2010",
language = "English (US)",
pages = "2154--2157",
booktitle = "Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010",

}

TY - GEN
T1 - Evaluation of speaker mimic technology for personalizing SGD voices
AU - Klabbers, Esther
AU - Kain, Alexander
AU - Van Santen, Jan
PY - 2010
Y1 - 2010
N2 - In this paper, we demonstrate the use of state-of-the-art speech technology to transform speech from a source speaker to mimic a particular target speaker with the intention of providing personalized voices to users of Speech Generating Devices (SGDs). This speaker mimicry (SM) capability allows us to use high-quality acoustic inventories from professional speakers and transform them to a different target speaker using a very limited set of sentences from that speaker. This technology targets future SGD users who still have a limited vocabulary or available previous recordings. The results of a perceptual study show that listeners can identify which SM voices most resemble their respective target voices.
AB - In this paper, we demonstrate the use of state-of-the-art speech technology to transform speech from a source speaker to mimic a particular target speaker with the intention of providing personalized voices to users of Speech Generating Devices (SGDs). This speaker mimicry (SM) capability allows us to use high-quality acoustic inventories from professional speakers and transform them to a different target speaker using a very limited set of sentences from that speaker. This technology targets future SGD users who still have a limited vocabulary or available previous recordings. The results of a perceptual study show that listeners can identify which SM voices most resemble their respective target voices.
KW - Perceptual evaluation
KW - Prosody modeling
KW - Speech synthesis
KW - Voice transformation
UR - http://www.scopus.com/inward/record.url?scp=79959840813&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959840813&partnerID=8YFLogxK
M3 - Conference contribution
SP - 2154
EP - 2157
BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
ER -