Inferring clinical depression from speech and spoken utterances

Meysam Asgari, Izhak Shafran, Lisa B. Sheeber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

In this paper, we investigate the problem of detecting depression from recordings of subjects' speech using speech processing and machine learning. There has been considerable interest in this problem in recent years due to the potential for developing objective assessments from real-world behaviors, which may provide valuable supplementary clinical information or may be useful in screening. The cues for depression may be present in 'what is said' (content) and 'how it is said' (prosody). Given the limited amount of text data, even in this relatively large study, it is difficult to employ the standard method of learning models from n-gram features. Instead, we learn models using word representations in an alternative feature space of valence and arousal. This is akin to embedding words into a real vector space, albeit with manual ratings instead of those learned with deep neural networks [1]. For extracting prosody, we employ standard feature extractors such as those implemented in openSMILE and compare them with features extracted from harmonic models that we have been developing in recent years. Our experiments show that features from our harmonic model detect depression from spoken utterances more accurately than the alternatives. The context features provide additional improvements, yielding an accuracy of about 74%, which is sufficient to be useful in screening applications.
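The valence and arousal word representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon `RATINGS`, its numeric values, the 1 to 9 scale, and the function name `utterance_features` are all hypothetical placeholders chosen for illustration, and averaging the ratings over an utterance is an assumed pooling step that the abstract does not specify.

```python
from statistics import mean

# Hypothetical (valence, arousal) ratings on an assumed 1-9 scale.
# These numbers are illustrative placeholders, not values from any
# published lexicon or from the paper.
RATINGS = {
    "happy": (8.0, 6.0),
    "sad": (2.0, 4.0),
    "tired": (3.0, 2.0),
    "fine": (6.0, 3.0),
}


def utterance_features(text, ratings=RATINGS):
    """Return (mean valence, mean arousal, coverage) for one utterance.

    Coverage is the fraction of tokens that have a rating. Returns None
    when no token in the utterance is rated.
    """
    tokens = text.lower().split()
    rated = [ratings[t] for t in tokens if t in ratings]
    if not rated:
        return None
    valence = mean(v for v, _ in rated)
    arousal = mean(a for _, a in rated)
    coverage = len(rated) / len(tokens)
    return valence, arousal, coverage


print(utterance_features("I feel sad and tired"))
```

Each utterance is reduced to a short fixed-length vector no matter how many distinct words occur in the data, which is one reason such a representation may be attractive when there is too little text to estimate n-gram models reliably.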

Original language: English (US)
Title of host publication: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
Editors: Tulay Adali, Jan Larsen, Mamadou Mboup, Eric Moreau
Publisher: IEEE Computer Society
ISBN (Electronic): 9781479936946
DOIs
State: Published - Nov 14 2014
Event: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014 - Reims, France
Duration: Sep 21 2014 → Sep 24 2014

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 2014 24th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2014
Country/Territory: France
City: Reims
Period: 9/21/14 → 9/24/14

Keywords

  • Depression
  • Speech analysis
  • Telemedicine

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
