Automatic measurement of affective valence and arousal in speech

Meysam Asgari; Geza Kiss; Jan Van Santen; Izhak Shafran; Xubo Song

doi:10.1109/ICASSP.2014.6853740

Institute on Development and Disability

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

Methods are proposed for measuring affective valence and arousal in speech. The methods apply support vector regression to prosodic and text features to predict human valence and arousal ratings of three stimulus types: speech, delexicalized speech, and text transcripts. Text features are extracted from transcripts by looking up per-word valence and arousal values in a lookup table and computing per-utterance statistics from the per-word values. Prediction of arousal ratings of delexicalized speech and of speech from prosodic features was successful, with accuracy levels not far from the limits set by the reliability of the human ratings. Prediction of valence for these stimulus types, as well as prediction of both dimensions for text stimuli, proved more difficult, even though the corresponding human ratings were as reliable. Text-based features did, however, add to the accuracy of valence prediction for speech stimuli. We conclude that arousal of speech can be measured reliably, but not valence, and that improving the latter requires better lexical features.
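The text-feature step described in the abstract (per-word lookup followed by per-utterance statistics) might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the lexicon entries and their values are invented, the function name `utterance_features` is hypothetical, and the choice of statistics (mean, minimum, maximum, standard deviation, lexicon coverage) is an assumption, since the abstract does not say which statistics were computed.

```python
from statistics import mean, pstdev

# Hypothetical per-word (valence, arousal) lookup table. The values are
# invented for illustration and do not come from the paper or any lexicon.
LEXICON = {
    "happy": (8.0, 6.0),
    "calm": (7.0, 2.0),
    "angry": (2.0, 8.0),
    "sad": (2.0, 3.0),
}


def utterance_features(transcript, lexicon=LEXICON):
    """Return per-utterance statistics of per-word valence and arousal.

    Returns None when no word of the transcript is found in the lexicon.
    """
    # Lowercase each token and strip surrounding punctuation before lookup.
    words = [w.strip(".,!?;:").lower() for w in transcript.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return None

    # Fraction of tokens that the lexicon covers (an assumed extra feature).
    features = {"coverage": len(hits) / len(words)}
    for name, values in (
        ("valence", [h[0] for h in hits]),
        ("arousal", [h[1] for h in hits]),
    ):
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
        features[f"{name}_std"] = pstdev(values)  # population std. deviation
    return features


if __name__ == "__main__":
    print(utterance_features("I am happy, not angry!"))
```

A feature dictionary of this kind could then be flattened into a vector and passed, alone or together with prosodic features, to a regression model such as support vector regression, which is the model family named in the abstract.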

Original language	English (US)
Title of host publication	2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	965-969
Number of pages	5
ISBN (Print)	9781479928927
DOIs	https://doi.org/10.1109/ICASSP.2014.6853740
State	Published - 2014
Event	2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy
Duration	May 4 2014 → May 9 2014

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)	1520-6149

Other

Other	2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Country/Territory	Italy
City	Florence
Period	5/4/14 → 5/9/14

Keywords

affect
arousal
valence

ASJC Scopus subject areas

Software
Signal Processing
Electrical and Electronic Engineering

Access to Document

10.1109/ICASSP.2014.6853740

Cite this

Asgari, M., Kiss, G., Van Santen, J., Shafran, I., & Song, X. (2014). Automatic measurement of affective valence and arousal in speech. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 (pp. 965-969). Article 6853740 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2014.6853740

Automatic measurement of affective valence and arousal in speech. / Asgari, Meysam; Kiss, Geza; Van Santen, Jan et al.
2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc., 2014. p. 965-969 6853740 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

Asgari, M, Kiss, G, Van Santen, J, Shafran, I & Song, X 2014, Automatic measurement of affective valence and arousal in speech. in 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014., 6853740, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 965-969, 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014, Florence, Italy, 5/4/14. https://doi.org/10.1109/ICASSP.2014.6853740

Asgari M, Kiss G, Van Santen J, Shafran I, Song X. Automatic measurement of affective valence and arousal in speech. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 965-969. 6853740. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2014.6853740

Asgari, Meysam ; Kiss, Geza ; Van Santen, Jan et al. / Automatic measurement of affective valence and arousal in speech. 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 965-969 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{f0a1f9230ca44abfb6a09541ba8908c8,

title = "Automatic measurement of affective valence and arousal in speech",

abstract = "Methods are proposed for measuring affective valence and arousal in speech. The methods apply support vector regression to prosodic and text features to predict human valence and arousal ratings of three stimulus types: speech, delexicalized speech, and text transcripts. Text features are extracted from transcripts via a lookup table listing per-word valence and arousal values and computing per-utterance statistics from the per-word values. Prediction of arousal ratings of delexicalized speech and of speech from prosodic features was successful, with accuracy levels not far from limits set by the reliability of the human ratings. Prediction of valence for these stimulus types as well as prediction of both dimensions for text stimuli proved more difficult, even though the corresponding human ratings were as reliable. Text based features did add, however, to the accuracy of prediction of valence for speech stimuli. We conclude that arousal of speech can be measured reliably, but not valence, and that improving the latter requires better lexical features.",

keywords = "affect, arousal, valence",

author = "Meysam Asgari and Geza Kiss and {Van Santen}, Jan and Izhak Shafran and Xubo Song",

year = "2014",

doi = "10.1109/ICASSP.2014.6853740",

language = "English (US)",

isbn = "9781479928927",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "965--969",

booktitle = "2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014",

note = "2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 ; Conference date: 04-05-2014 Through 09-05-2014",

}

TY - GEN

T1 - Automatic measurement of affective valence and arousal in speech

AU - Asgari, Meysam

AU - Kiss, Geza

AU - Van Santen, Jan

AU - Shafran, Izhak

AU - Song, Xubo

PY - 2014

Y1 - 2014

AB - Methods are proposed for measuring affective valence and arousal in speech. The methods apply support vector regression to prosodic and text features to predict human valence and arousal ratings of three stimulus types: speech, delexicalized speech, and text transcripts. Text features are extracted from transcripts via a lookup table listing per-word valence and arousal values and computing per-utterance statistics from the per-word values. Prediction of arousal ratings of delexicalized speech and of speech from prosodic features was successful, with accuracy levels not far from limits set by the reliability of the human ratings. Prediction of valence for these stimulus types as well as prediction of both dimensions for text stimuli proved more difficult, even though the corresponding human ratings were as reliable. Text based features did add, however, to the accuracy of prediction of valence for speech stimuli. We conclude that arousal of speech can be measured reliably, but not valence, and that improving the latter requires better lexical features.

KW - affect

KW - arousal

KW - valence

UR - http://www.scopus.com/inward/record.url?scp=84905254292&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905254292&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2014.6853740

DO - 10.1109/ICASSP.2014.6853740

M3 - Conference contribution

AN - SCOPUS:84905254292

SN - 9781479928927

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 965

EP - 969

BT - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014

Y2 - 4 May 2014 through 9 May 2014

ER -
