F0 range and peak alignment across speakers and emotions

Eric Morley, Jan Van Santen, Esther Klabbers, Alexander Kain

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We present an analysis of F0 range and peak alignment in emotional speech from a heterogeneous group of speakers varying in age and gender. Both speaker and emotion had a strong effect on F0 range. Despite these large changes in the F0 trajectory, peak alignment was remarkably stable. Using the Linear Alignment Model (LAM) [1], we show that the effects of emotion and speaker differences on alignment, although statistically significant, are small. This stability leads to the conclusion that peak alignment, unlike F0 range, does not appear to carry much information about speaker identity or emotional state. The LAM is effective in that it explains 42% of the variance in peak location on average, and it predicts the time of F0 peaks with an average RMS error of 12 ms.
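
The abstract closes with two figures of merit: the share of variance in peak location that the model explains, and the RMS error of the predicted peak times. As a rough illustration of how such figures are conventionally computed for a linear predictor, the sketch below fits a one-predictor least-squares line and scores it. It is not the authors' LAM implementation: the single predictor (a syllable duration), the function names, and all numbers are hypothetical and chosen only for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def rms_error(observed, predicted):
    """Root-mean-square difference between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def variance_explained(observed, predicted):
    """Proportion of variance explained: 1 - SS_residual / SS_total."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up data (assumption): syllable durations and F0 peak times, both in ms.
syllable_ms = [180.0, 220.0, 260.0, 300.0, 340.0]
peak_ms = [95.0, 105.0, 130.0, 140.0, 170.0]

a, b = fit_line(syllable_ms, peak_ms)
predicted_ms = [a * d + b for d in syllable_ms]

print("slope:", a, "intercept:", b)
print("RMS error (ms):", rms_error(peak_ms, predicted_ms))
print("variance explained:", variance_explained(peak_ms, predicted_ms))
```

Under this reading, a model can show a small RMS error in milliseconds while explaining well under all of the variance, if the peak locations themselves vary little across conditions.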

Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 4952-4955
Number of pages: 4
DOI: https://doi.org/10.1109/ICASSP.2011.5947467
State: Published - 2011
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: May 22 2011 to May 27 2011

Other

Other: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country: Czech Republic
City: Prague
Period: 5/22/11 to 5/27/11

Keywords

  • emotion recognition
  • human voice
  • speech analysis
  • speech synthesis

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Morley, E., Van Santen, J., Klabbers, E., & Kain, A. (2011). F0 range and peak alignment across speakers and emotions. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 4952-4955). [5947467] https://doi.org/10.1109/ICASSP.2011.5947467

@inproceedings{611087d4aa714a5cb78658e94095496a,
title = "F0 range and peak alignment across speakers and emotions",
abstract = "We present an analysis of F0 range and peak alignment in emotional speech from a heterogeneous group of speakers varying in age and gender. Both speaker and emotion had a strong effect on F0 range. Despite these large changes in the F0 trajectory, peak alignment was remarkably stable. Using the Linear Alignment Model (LAM) [1], we show that the effects on alignment of emotion and speaker differences, although statistically significant, are small. This stability results in a conclusion that peak alignment, unlike F0 range, does not appear to carry much information about speaker identity or emotional state. The LAM is effective in that it explains 42{\%} of the variance in peak location on average, and furthermore it predicts the time of F0 peaks with an average RMS error of 12ms.",
keywords = "emotion recognition, human voice, speech analysis, speech synthesis",
author = "Eric Morley and {Van Santen}, Jan and Esther Klabbers and Alexander Kain",
year = "2011",
doi = "10.1109/ICASSP.2011.5947467",
language = "English (US)",
isbn = "9781457705397",
pages = "4952--4955",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY - GEN

T1 - F0 range and peak alignment across speakers and emotions

AU - Morley, Eric

AU - Van Santen, Jan

AU - Klabbers, Esther

AU - Kain, Alexander

PY - 2011

Y1 - 2011

AB - We present an analysis of F0 range and peak alignment in emotional speech from a heterogeneous group of speakers varying in age and gender. Both speaker and emotion had a strong effect on F0 range. Despite these large changes in the F0 trajectory, peak alignment was remarkably stable. Using the Linear Alignment Model (LAM) [1], we show that the effects on alignment of emotion and speaker differences, although statistically significant, are small. This stability results in a conclusion that peak alignment, unlike F0 range, does not appear to carry much information about speaker identity or emotional state. The LAM is effective in that it explains 42% of the variance in peak location on average, and furthermore it predicts the time of F0 peaks with an average RMS error of 12ms.

KW - emotion recognition

KW - human voice

KW - speech analysis

KW - speech synthesis

UR - http://www.scopus.com/inward/record.url?scp=80051615146&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051615146&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2011.5947467

DO - 10.1109/ICASSP.2011.5947467

M3 - Conference contribution

AN - SCOPUS:80051615146

SN - 9781457705397

SP - 4952

EP - 4955

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ER -