Inferring clinical depression from speech and spoken utterances

Meysam Asgari; Izhak Shafran; Lisa B. Sheeber

doi:10.1109/MLSP.2014.6958856

Institute on Development and Disability

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Scopus citations

Abstract

In this paper, we investigate the problem of detecting depression from recordings of subjects' speech using speech processing and machine learning. There has been considerable interest in this problem in recent years due to the potential for developing objective assessments from real-world behaviors, which may provide valuable supplementary clinical information or may be useful in screening. Cues for depression may be present in 'what is said' (content) and in 'how it is said' (prosody). Given the limited amount of text data, even in this relatively large study, it is difficult to employ the standard method of learning models from n-gram features. Instead, we learn models using word representations in an alternative feature space of valence and arousal. This is akin to embedding words into a real vector space, albeit with manual ratings instead of representations learned with deep neural networks [1]. For extracting prosody, we employ standard feature extractors such as those implemented in openSMILE and compare them with features extracted from the harmonic models that we have been developing in recent years. Our experiments show that the features from our harmonic model detect depression from spoken utterances better than the other alternatives. The context features provide additional improvements, achieving an accuracy of about 74%, which is sufficient to be useful in screening applications.
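
The abstract replaces sparse n-gram counts with a two-dimensional valence/arousal representation of each word. The sketch below illustrates that idea under stated assumptions: the lexicon is supplied as a Python dict from lowercase words to (valence, arousal) ratings (any published affective-norms list could fill it; none is implied here), tokenization is naive whitespace splitting, and each utterance is summarized by the mean, standard deviation, minimum, and maximum of each dimension plus the fraction of words that were rated. The function names, the statistics, and the coverage feature are hypothetical choices, not the paper's feature set.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical lexicon format: lowercase word -> (valence, arousal) manual rating.
Lexicon = Dict[str, Tuple[float, float]]

PUNCT = ".,!?;:\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization with surrounding punctuation stripped (a simplifying assumption)."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def embed_words(tokens: List[str], lexicon: Lexicon) -> List[Tuple[float, float]]:
    """Map each rated token to its 2-D (valence, arousal) point; unrated tokens are skipped."""
    return [lexicon[t] for t in tokens if t in lexicon]


def utterance_features(text: str, lexicon: Lexicon) -> List[float]:
    """Fixed-length utterance vector: mean/std/min/max of valence, then of arousal, then lexicon coverage.

    Utterances with no rated words get zeros for the eight statistics (one possible fallback).
    """
    tokens = tokenize(text)
    points = embed_words(tokens, lexicon)
    coverage = len(points) / len(tokens) if tokens else 0.0
    if not points:
        return [0.0] * 8 + [coverage]
    feats: List[float] = []
    for dim in (0, 1):  # 0 = valence, 1 = arousal
        vals = [p[dim] for p in points]
        feats += [mean(vals), pstdev(vals), min(vals), max(vals)]
    return feats + [coverage]


if __name__ == "__main__":
    # Toy ratings for illustration only; they are NOT values from any real lexicon.
    toy = {"happy": (8.0, 6.0), "tired": (3.0, 2.0)}
    print(utterance_features("I feel tired, not happy.", toy))
```

Because every rated word lands in the same low-dimensional space, the vector length stays fixed no matter how large the vocabulary is, which is what makes this kind of representation usable with little text. This sketch ignores word order, so negation such as "not happy" is not handled.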

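The abstract does not spell out its harmonic model, so the following is only a sketch of the general idea under stated assumptions: a short voiced frame is approximated by a sum of cosines and sines at integer multiples of a candidate fundamental frequency f0, the amplitudes are fit by least squares, f0 is chosen from a grid by minimum residual energy, and the harmonic-to-noise ratio (HNR) is the ratio of fitted to residual energy in dB. The function names, the 5-harmonic default, the f0 grid, and the synthetic 200 Hz test frame are hypothetical; the code is plain Python and does not use openSMILE.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_harmonics(frame: Sequence[float], f0: float, fs: float, n_harm: int) -> List[float]:
    """Least-squares fit of sum_k a_k*cos(2*pi*k*f0*n/fs) + b_k*sin(...); returns the fitted frame."""
    cols: List[List[float]] = []
    for k in range(1, n_harm + 1):
        if k * f0 >= fs / 2:  # keep only harmonics below the Nyquist frequency
            break
        w = 2 * math.pi * k * f0 / fs
        cols.append([math.cos(w * n) for n in range(len(frame))])
        cols.append([math.sin(w * n) for n in range(len(frame))])
    # Normal equations (A^T A) coef = A^T y, with the basis vectors as the columns of A.
    ata = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    aty = [sum(p * y for p, y in zip(ci, frame)) for ci in cols]
    coef = solve(ata, aty)
    return [sum(c * col[n] for c, col in zip(coef, cols)) for n in range(len(frame))]


def harmonic_features(
    frame: Sequence[float], fs: float, f0_grid: Sequence[float], n_harm: int = 5
) -> Tuple[float, float]:
    """Return (f0, HNR in dB) for the grid f0 whose harmonic fit leaves the least residual energy."""
    best: Optional[Tuple[float, float, List[float]]] = None
    for f0 in f0_grid:
        fit = fit_harmonics(frame, f0, fs, n_harm)
        resid = sum((y - h) ** 2 for y, h in zip(frame, fit))
        if best is None or resid < best[1]:
            best = (f0, resid, fit)
    if best is None:
        raise ValueError("f0_grid must not be empty")
    f0, resid, fit = best
    harmonic_energy = sum(h * h for h in fit)
    return f0, 10 * math.log10(max(harmonic_energy, 1e-12) / max(resid, 1e-12))


if __name__ == "__main__":
    fs, true_f0 = 8000.0, 200.0
    rng = random.Random(0)
    # Synthetic 40 ms voiced frame: harmonics 1-3 of 200 Hz plus weak white noise.
    frame = [
        sum(amp * math.cos(2 * math.pi * k * true_f0 * n / fs) for k, amp in ((1, 1.0), (2, 0.5), (3, 0.25)))
        + rng.gauss(0.0, 0.01)
        for n in range(320)
    ]
    grid = [80.0 + 5.0 * i for i in range(65)]  # candidate f0 values, 80-400 Hz
    print(harmonic_features(frame, fs, grid))
```

Minimum residual alone is a weak pitch criterion. A candidate at f0/2 with the same number of harmonics includes every true harmonic up to (n_harm/2)·f0, so when the voice has little energy above that frequency the subharmonic fits at least as well and can be chosen. The demo frame avoids this by including a 600 Hz harmonic that 100 Hz with five harmonics cannot reach. A practical extractor would add a penalty or a cross-frame continuity constraint, and the frame-level f0 and HNR tracks would then be summarized per utterance before classification.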
Original language	English (US)
Title of host publication	IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Editors	Mamadou Mboup, Tulay Adali, Eric Moreau, Jan Larsen
Publisher	IEEE Computer Society
ISBN (Electronic)	9781479936946
DOIs	https://doi.org/10.1109/MLSP.2014.6958856
State	Published - Nov 14 2014
Event	2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014 - Reims, France Duration: Sep 21 2014 → Sep 24 2014

Publication series

Name	IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print)	2161-0363
ISSN (Electronic)	2161-0371

Conference

Conference	2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014
Country/Territory	France
City	Reims
Period	9/21/14 → 9/24/14

Keywords

Depression
Speech analysis
Telemedicine

ASJC Scopus subject areas

Human-Computer Interaction
Signal Processing

Cite this

Asgari, M., Shafran, I., & Sheeber, L. B. (2014). Inferring clinical depression from speech and spoken utterances. In M. Mboup, T. Adali, E. Moreau, & J. Larsen (Eds.), IEEE International Workshop on Machine Learning for Signal Processing, MLSP Article 6958856 (IEEE International Workshop on Machine Learning for Signal Processing, MLSP). IEEE Computer Society. https://doi.org/10.1109/MLSP.2014.6958856

Inferring clinical depression from speech and spoken utterances. / Asgari, Meysam; Shafran, Izhak; Sheeber, Lisa B.
IEEE International Workshop on Machine Learning for Signal Processing, MLSP. ed. / Mamadou Mboup; Tulay Adali; Eric Moreau; Jan Larsen. IEEE Computer Society, 2014. 6958856 (IEEE International Workshop on Machine Learning for Signal Processing, MLSP).

Asgari, M, Shafran, I & Sheeber, LB 2014, Inferring clinical depression from speech and spoken utterances. in M Mboup, T Adali, E Moreau & J Larsen (eds), IEEE International Workshop on Machine Learning for Signal Processing, MLSP., 6958856, IEEE International Workshop on Machine Learning for Signal Processing, MLSP, IEEE Computer Society, 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014, Reims, France, 9/21/14. https://doi.org/10.1109/MLSP.2014.6958856

Asgari M, Shafran I, Sheeber LB. Inferring clinical depression from speech and spoken utterances. In Mboup M, Adali T, Moreau E, Larsen J, editors, IEEE International Workshop on Machine Learning for Signal Processing, MLSP. IEEE Computer Society. 2014. 6958856. (IEEE International Workshop on Machine Learning for Signal Processing, MLSP). doi: 10.1109/MLSP.2014.6958856

@inproceedings{9af2e8ffc9574ba8a21bf6c6ba5b8720,

title = "Inferring clinical depression from speech and spoken utterances",

abstract = "In this paper, we investigate the problem of detecting depression from recordings of subjects' speech using speech processing and machine learning. There has been considerable interest in this problem in recent years due to the potential for developing objective assessments from real-world behaviors, which may provide valuable supplementary clinical information or may be useful in screening. Cues for depression may be present in 'what is said' (content) and in 'how it is said' (prosody). Given the limited amount of text data, even in this relatively large study, it is difficult to employ the standard method of learning models from n-gram features. Instead, we learn models using word representations in an alternative feature space of valence and arousal. This is akin to embedding words into a real vector space, albeit with manual ratings instead of representations learned with deep neural networks [1]. For extracting prosody, we employ standard feature extractors such as those implemented in openSMILE and compare them with features extracted from the harmonic models that we have been developing in recent years. Our experiments show that the features from our harmonic model detect depression from spoken utterances better than the other alternatives. The context features provide additional improvements, achieving an accuracy of about 74%, which is sufficient to be useful in screening applications.",

keywords = "Depression, Speech analysis, Telemedicine",

author = "Meysam Asgari and Izhak Shafran and Sheeber, {Lisa B.}",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.; 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014 ; Conference date: 21-09-2014 Through 24-09-2014",

year = "2014",

month = nov,

day = "14",

doi = "10.1109/MLSP.2014.6958856",

language = "English (US)",

series = "IEEE International Workshop on Machine Learning for Signal Processing, MLSP",

publisher = "IEEE Computer Society",

editor = "Mamadou Mboup and Tulay Adali and Eric Moreau and Jan Larsen",

booktitle = "IEEE International Workshop on Machine Learning for Signal Processing, MLSP",

}

TY - GEN

T1 - Inferring clinical depression from speech and spoken utterances

AU - Asgari, Meysam

AU - Shafran, Izhak

AU - Sheeber, Lisa B.

PY - 2014/11/14

Y1 - 2014/11/14

N2 - In this paper, we investigate the problem of detecting depression from recordings of subjects' speech using speech processing and machine learning. There has been considerable interest in this problem in recent years due to the potential for developing objective assessments from real-world behaviors, which may provide valuable supplementary clinical information or may be useful in screening. Cues for depression may be present in 'what is said' (content) and in 'how it is said' (prosody). Given the limited amount of text data, even in this relatively large study, it is difficult to employ the standard method of learning models from n-gram features. Instead, we learn models using word representations in an alternative feature space of valence and arousal. This is akin to embedding words into a real vector space, albeit with manual ratings instead of representations learned with deep neural networks [1]. For extracting prosody, we employ standard feature extractors such as those implemented in openSMILE and compare them with features extracted from the harmonic models that we have been developing in recent years. Our experiments show that the features from our harmonic model detect depression from spoken utterances better than the other alternatives. The context features provide additional improvements, achieving an accuracy of about 74%, which is sufficient to be useful in screening applications.

AB - In this paper, we investigate the problem of detecting depression from recordings of subjects' speech using speech processing and machine learning. There has been considerable interest in this problem in recent years due to the potential for developing objective assessments from real-world behaviors, which may provide valuable supplementary clinical information or may be useful in screening. Cues for depression may be present in 'what is said' (content) and in 'how it is said' (prosody). Given the limited amount of text data, even in this relatively large study, it is difficult to employ the standard method of learning models from n-gram features. Instead, we learn models using word representations in an alternative feature space of valence and arousal. This is akin to embedding words into a real vector space, albeit with manual ratings instead of representations learned with deep neural networks [1]. For extracting prosody, we employ standard feature extractors such as those implemented in openSMILE and compare them with features extracted from the harmonic models that we have been developing in recent years. Our experiments show that the features from our harmonic model detect depression from spoken utterances better than the other alternatives. The context features provide additional improvements, achieving an accuracy of about 74%, which is sufficient to be useful in screening applications.

KW - Depression

KW - Speech analysis

KW - Telemedicine

UR - http://www.scopus.com/inward/record.url?scp=84912535654&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84912535654&partnerID=8YFLogxK

U2 - 10.1109/MLSP.2014.6958856

DO - 10.1109/MLSP.2014.6958856

M3 - Conference contribution

AN - SCOPUS:84912535654

T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP

BT - IEEE International Workshop on Machine Learning for Signal Processing, MLSP

A2 - Mboup, Mamadou

A2 - Adali, Tulay

A2 - Moreau, Eric

A2 - Larsen, Jan

PB - IEEE Computer Society

T2 - 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014

Y2 - 21 September 2014 through 24 September 2014

ER -
