Fully automated assessment of the severity of Parkinson's disease from speech

Alireza Bayestehtashk; Meysam Asgari; Izhak Shafran; James McNames

doi:10.1016/j.csl.2013.12.001

Institute on Development and Disability

Research output: Contribution to journal › Article › peer-review

78 Scopus citations

Abstract

For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach in which a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from three clinics. We elicited speech using three tasks: the sustained phonation task, the diadochokinetic task, and a reading task, all within a time budget of 4 min and prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and found that ridge regression performs better than lasso and support vector regression for our task. We refined the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and performing consistently well above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than the diadochokinetic or sustained phonation tasks. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in the clinical domain.
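The two headline figures in the abstract, mean absolute error and the fraction of variance explained, can be illustrated with a minimal sketch. It assumes that "explaining 61% of the variance" refers to the coefficient of determination (R^2), which the abstract does not state; the function names and the severity scores below are hypothetical and are not data from the study.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def explained_variance_fraction(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    Assumption: this is one common reading of "variance explained";
    the abstract does not say which definition the authors used.
    """
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical severity scores for four subjects (illustrative only).
true_scores = [10.0, 20.0, 30.0, 40.0]
predicted_scores = [12.0, 18.0, 33.0, 37.0]

mae = mean_absolute_error(true_scores, predicted_scores)
r2 = explained_variance_fraction(true_scores, predicted_scores)
print(f"MAE = {mae:.2f}, variance explained = {r2:.3f}")
```

A mean absolute error is expressed in the units of the severity scale itself, whereas the variance-explained figure is unitless, which is why the two are reported together.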

Original language	English (US)
Pages (from-to)	172-185
Number of pages	14
Journal	Computer Speech and Language
Volume	29
Issue number	1
DOIs	https://doi.org/10.1016/j.csl.2013.12.001
State	Published - Jan 2015

Keywords

Jitter
Parkinson's disease
Pitch estimation
Shimmer

ASJC Scopus subject areas

Software
Theoretical Computer Science
Human-Computer Interaction

Cite this

@article{7e76ad3da7764ec29c633b6594e37160,
title = "Fully automated assessment of the severity of Parkinson's disease from speech",
abstract = "For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks - the sustained phonation task, the diadochokinetic task and a reading task, all within a time budget of 4 min, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and consistently well-above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than diadochokinetic or sustained phonation task. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in clinical domain.",

keywords = "Jitter, Parkinson's disease, Pitch estimation, Shimmer",
author = "Alireza Bayestehtashk and Meysam Asgari and Izhak Shafran and James McNames",
year = "2015",
month = jan,
doi = "10.1016/j.csl.2013.12.001",
language = "English (US)",
volume = "29",
pages = "172--185",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",
number = "1",
}

TY  - JOUR
T1  - Fully automated assessment of the severity of Parkinson's disease from speech
AU  - Bayestehtashk, Alireza
AU  - Asgari, Meysam
AU  - Shafran, Izhak
AU  - McNames, James
PY  - 2015/1
Y1  - 2015/1

N2 - For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks - the sustained phonation task, the diadochokinetic task and a reading task, all within a time budget of 4 min, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and consistently well-above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than diadochokinetic or sustained phonation task. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in clinical domain.

AB - For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks - the sustained phonation task, the diadochokinetic task and a reading task, all within a time budget of 4 min, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and consistently well-above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than diadochokinetic or sustained phonation task. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in clinical domain.

KW  - Jitter
KW  - Parkinson's disease
KW  - Pitch estimation
KW  - Shimmer
UR  - http://www.scopus.com/inward/record.url?scp=84908502342&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84908502342&partnerID=8YFLogxK
U2  - 10.1016/j.csl.2013.12.001
DO  - 10.1016/j.csl.2013.12.001
M3  - Article
AN  - SCOPUS:84908502342
SN  - 0885-2308
VL  - 29
SP  - 172
EP  - 185
JO  - Computer Speech and Language
JF  - Computer Speech and Language
IS  - 1
ER  - 
