Automated vocal emotion recognition using phoneme class specific features

Géza Kiss, Jan Van Santen

Research output: Contribution to conference › Paper

Abstract

Methods for automated vocal emotion recognition often use acoustic feature vectors that are computed for each frame in an utterance, together with global statistics based on these acoustic feature vectors. However, at least two considerations argue for the use of phoneme class specific features in emotion recognition. First, there are well-known effects of phoneme class on some of these features. Second, it is plausible that emotion influences the speech signal in ways that differ between phoneme classes. A new method based on the concept of phoneme class specific features is proposed, in which different features are selected for regions associated with different phoneme classes and then optimally combined using machine learning algorithms. A small but significant improvement was found when this method was compared with an otherwise identical method in which features were used uniformly over different phoneme classes.
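The abstract gives no implementation details, so the following is only a rough, hypothetical sketch of the general idea rather than the authors' system: utterance-level statistics are computed separately for a few illustrative phoneme classes, feature selection is applied per class, and the selected features are concatenated into one vector for a standard classifier. The phoneme class names, feature counts, scikit-learn components, and the synthetic corpus are all assumptions made for illustration.

```python
# Hypothetical sketch of phoneme-class-specific features for vocal emotion
# recognition; NOT the authors' implementation. Frame-level acoustic features
# are assumed to be grouped by phoneme class per utterance (random data here).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
PHONEME_CLASSES = ["vowel", "nasal", "fricative", "stop"]  # illustrative classes
N_FRAME_FEATURES = 12                                      # e.g. F0, energy, MFCCs
N_UTTERANCES = 200

def class_statistics(frames):
    """Utterance-level statistics (mean, std, range) over one phoneme class's frames."""
    if frames.shape[0] == 0:
        return np.zeros(3 * N_FRAME_FEATURES)
    return np.concatenate([frames.mean(axis=0),
                           frames.std(axis=0),
                           frames.max(axis=0) - frames.min(axis=0)])

# Synthetic corpus: each utterance is a dict of frame matrices keyed by phoneme
# class, plus an emotion label (0..3). Real data would come from a phoneme-aligned corpus.
utterances = [{c: rng.normal(size=(rng.integers(5, 40), N_FRAME_FEATURES))
               for c in PHONEME_CLASSES} for _ in range(N_UTTERANCES)]
labels = rng.integers(0, 4, size=N_UTTERANCES)

# Phoneme-class-specific step: select the k best statistics separately for each
# class, then concatenate the selected features across classes. (For brevity the
# selection is fit on all data; a real experiment would fit it inside CV folds.)
per_class_vectors = {c: np.vstack([class_statistics(u[c]) for u in utterances])
                     for c in PHONEME_CLASSES}
selected = [SelectKBest(f_classif, k=10).fit_transform(per_class_vectors[c], labels)
            for c in PHONEME_CLASSES]
X = np.hstack(selected)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print("CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```

The baseline the paper compares against would correspond to computing the same statistics and a single feature selection over all frames regardless of phoneme class, rather than per class as sketched above.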

Original language: English (US)
Pages: 1161-1164
Number of pages: 4
State: Published - Dec 1 2010
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: Sep 26 2010 - Sep 30 2010


Keywords

  • Biomedical application
  • Emotion recognition
  • Phoneme class specific features

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing


Cite this

Kiss, G., & Van Santen, J. (2010). Automated vocal emotion recognition using phoneme class specific features. 1161-1164. Paper presented at the 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, Makuhari, Chiba, Japan.