Automated vocal emotion recognition using phoneme class specific features

Géza Kiss, Jan Van Santen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Methods for automated vocal emotion recognition often use acoustic feature vectors computed for each frame in an utterance, together with global statistics based on these acoustic feature vectors. However, at least two considerations argue for the use of phoneme class specific features for emotion recognition. First, there are well-known effects of phoneme class on some of these features. Second, it is plausible that emotion influences the speech signal in ways that differ between phoneme classes. A new method based on the concept of phoneme class specific features is proposed, in which different features are selected for regions associated with different phoneme classes and then optimally combined using machine learning algorithms. A small but significant improvement was found when this method was compared with an otherwise identical method in which features were used uniformly over different phoneme classes.

Original language: English (US)
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 1161-1164
Number of pages: 4
State: Published - 2010

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Keywords

  • Biomedical application
  • Emotion recognition
  • Phoneme class specific features

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

