TY - JOUR
T1 - Fully automated assessment of the severity of Parkinson's disease from speech
AU - Bayestehtashk, Alireza
AU - Asgari, Meysam
AU - Shafran, Izhak
AU - McNames, James
N1 - Funding Information:
This work was supported by Kinetics Foundation and NSF awards 0964102 and 1027834, NIH award AG033723 and support from Intel, Google and IBM. We would like to thank Jan van Santen (OHSU) and Max A. Little (University of Oxford) for their comments on speech data collection and Ken Kubota (Kinetics Foundation) for facilitating the study. We are extremely grateful to our clinical collaborators Fay Horak (OHSU), Michael Aminoff (UCSF), William Marks Jr. (UCSF), Jim Tetrud (Parkinson's Institute), Grace Liang (Parkinson's Institute), and Steven Gunzler (University Hospitals Case Medical Center) for performing the clinical assessments and collecting the speech data from the subjects.
Publisher Copyright:
© 2014 Elsevier Ltd. All rights reserved.
PY - 2015/1
Y1 - 2015/1
AB - For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from three clinics. We elicited speech using three tasks (the sustained phonation task, the diadochokinetic task and a reading task), all within a time budget of 4 min and prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and found that ridge regression performs better than lasso and support vector regression for our task. We also refine the feature extraction using a time-varying harmonic model of speech so that pitch-related cues, including jitter and shimmer, are captured more accurately. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance, with performance consistently well above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than the diadochokinetic or sustained phonation tasks. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are applicable more widely to other paralinguistic tasks in the clinical domain.
KW - Jitter
KW - Parkinson's disease
KW - Pitch estimation
KW - Shimmer
UR - http://www.scopus.com/inward/record.url?scp=84908502342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908502342&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2013.12.001
DO - 10.1016/j.csl.2013.12.001
M3 - Article
AN - SCOPUS:84908502342
SN - 0885-2308
VL - 29
SP - 172
EP - 185
JO - Computer Speech and Language
JF - Computer Speech and Language
IS - 1
ER -