Synthetic F0 can effectively convey speaker ID in delexicalized speech

Eric Morley, Esther Klabbers, Jan Van Santen, Alexander Kain, Seyed Hamidreza Mohammadi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Scopus citations

Abstract

We investigate the extent to which F0 can convey speaker ID in the absence of spectral, segmental, and durational information. We propose two methods of F0 synthesis based on the Linear Alignment Model (LAM) [2]: one parametric, the other corpus-based. Through a perceptual experiment, we show that F0 alone is able to convey information about speaker ID. We find that F0 synthesized with either LAM-based method conveys speaker ID almost as effectively as natural F0.

Original language: English (US)
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 434-437
Number of pages: 4
Volume: 1
Publication status: Published - 2012
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: Sep 9 2012 to Sep 13 2012

Keywords

  • F0
  • Prosody
  • Recombinant synthesis
  • Speaker identity
  • Speech synthesis

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication

Cite this

Morley, E., Klabbers, E., Van Santen, J., Kain, A., & Mohammadi, S. H. (2012). Synthetic F0 can effectively convey speaker ID in delexicalized speech. In 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 (Vol. 1, pp. 434-437).