Big Data Analytical Approaches to the NACC Dataset: Aiding Preclinical Trial Enrichment

Ming Lin, Pinghua Gong, Tao Yang, Jieping Ye, Roger L. Albin, Hiroko Dodge

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Background: Clinical trials increasingly aim to slow disease progression during the presymptomatic phases of mild cognitive impairment (MCI), so recruiting study participants at high risk of developing MCI is critical for cost-effective prevention trials. However, accurately identifying those who are destined to develop MCI is difficult, and collecting biomarkers is often expensive.

Methods: We used only noninvasive clinical variables collected in the National Alzheimer's Coordinating Center (NACC) Uniform Data Sets version 2.0 and applied machine learning techniques to build a low-cost and accurate MCI conversion prediction calculator. Cross-validation and bootstrap were used to select as few variables as possible while still accurately predicting MCI conversion within 4 years.

Results: A total of 31,872 unique subjects, 748 clinical variables, and an additional 128 derived variables in the NACC data sets were used. About 15 noninvasive clinical variables were identified for predicting MCI, aMCI, and naMCI converters, respectively. A Receiver Operating Characteristic Area Under the Curve (ROC AUC) of over 75% was achieved. Using bootstrap, we created a simple spreadsheet calculator that estimates the probability of developing MCI within 4 years, with a 95% confidence interval.

Conclusions: We achieved reasonably high prediction accuracy using only clinical variables. The approach used here could be useful for study enrichment in preclinical trials, where enrolling participants at risk of cognitive decline is critical for proving study efficacy, and also for developing a shorter assessment battery.
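The abstract names two generic ingredients, ROC AUC and a bootstrap-based 95% confidence interval, without giving implementation details. The following is a minimal sketch of those two ideas only, not the authors' pipeline or their spreadsheet calculator: it computes ROC AUC by pairwise comparison and attaches a percentile bootstrap interval to the AUC itself rather than to an individual predicted probability. The function names `roc_auc` and `bootstrap_ci`, the toy labels and scores, and the defaults for `n_boot`, `alpha`, and `seed` are all assumptions made for illustration.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROC AUC (illustrative defaults)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample subjects with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has only one class
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data: 1 = converter, 0 = non-converter (hypothetical values).
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    s = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60]
    print("AUC:", roc_auc(y, s))
    print("95% bootstrap interval:", bootstrap_ci(y, s, n_boot=500))
```

With real data the scores would come from a fitted model evaluated on held-out subjects, for example within cross-validation folds, so that the AUC is not inflated by evaluating on the training set.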

Original language: English (US)
Pages (from-to): 18-27
Number of pages: 10
Journal: Alzheimer Disease and Associated Disorders
Volume: 32
Issue number: 1
DOI: 10.1097/WAD.0000000000000228
State: Published - Jan 1 2018

Fingerprint

Costs and Cost Analysis
Cognitive Dysfunction
Datasets
ROC Curve
Area Under Curve
Disease Progression
Biomarkers
Clinical Trials
Confidence Intervals
Machine Learning

Keywords

  • bootstrap
  • dementia
  • incidence
  • machine learning
  • mild cognitive impairment
  • National Alzheimer's Coordinating Center Uniform Data Set (NACC UDS)
  • prediction
  • ROC AUC
  • study enrichment

ASJC Scopus subject areas

  • Clinical Psychology
  • Gerontology
  • Geriatrics and Gerontology
  • Psychiatry and Mental health

Cite this

Big Data Analytical Approaches to the NACC Dataset: Aiding Preclinical Trial Enrichment. / Lin, Ming; Gong, Pinghua; Yang, Tao; Ye, Jieping; Albin, Roger L.; Dodge, Hiroko. In: Alzheimer Disease and Associated Disorders, Vol. 32, No. 1, 01.01.2018, p. 18-27.


Lin, Ming; Gong, Pinghua; Yang, Tao; Ye, Jieping; Albin, Roger L.; Dodge, Hiroko. / Big Data Analytical Approaches to the NACC Dataset: Aiding Preclinical Trial Enrichment. In: Alzheimer Disease and Associated Disorders. 2018; Vol. 32, No. 1. pp. 18-27.

@article{27543cd5054b4148b33d9728a758facb,
title = "Big Data Analytical Approaches to the NACC Dataset: Aiding Preclinical Trial Enrichment",
abstract = "Background: Clinical trials increasingly aim to retard disease progression during presymptomatic phases of Mild Cognitive Impairment (MCI) and thus recruiting study participants at high risk for developing MCI is critical for cost-effective prevention trials. However, accurately identifying those who are destined to develop MCI is difficult. Collecting biomarkers is often expensive. Methods: We used only noninvasive clinical variables collected in the National Alzheimer's Coordinating Center (NACC) Uniform Data Sets version 2.0 and applied machine learning techniques to build a low-cost and accurate Mild Cognitive Impairment (MCI) conversion prediction calculator. Cross-validation and bootstrap were used to select as few variables as possible accurately predicting MCI conversion within 4 years. Results: A total of 31,872 unique subjects, 748 clinical variables, and additional 128 derived variables in NACC data sets were used. About 15 noninvasive clinical variables are identified for predicting MCI/aMCI/naMCI converters, respectively. Over 75{\%} Receiver Operating Characteristic Area Under the Curves (ROC AUC) was achieved. By bootstrap we created a simple spreadsheet calculator which estimates the probability of developing MCI within 4 years with a 95{\%} confidence interval. Conclusions: We achieved reasonably high prediction accuracy using only clinical variables. The approach used here could be useful for study enrichment in preclinical trials where enrolling participants at risk of cognitive decline is critical for proving study efficacy, and also for developing a shorter assessment battery.",
keywords = "bootstrap, dementia, incidence, machine learning, mild cognitive impairment, National Alzheimer's Coordinating Center Uniform Data Set (NACC UDS), prediction, ROC AUC, study enrichment",
author = "Ming Lin and Pinghua Gong and Tao Yang and Jieping Ye and Albin, {Roger L.} and Hiroko Dodge",
year = "2018",
month = "1",
day = "1",
doi = "10.1097/WAD.0000000000000228",
language = "English (US)",
volume = "32",
pages = "18--27",
journal = "Alzheimer Disease and Associated Disorders",
issn = "0893-0341",
publisher = "Lippincott Williams and Wilkins",
number = "1",

}

TY - JOUR

T1 - Big Data Analytical Approaches to the NACC Dataset

T2 - Aiding Preclinical Trial Enrichment

AU - Lin, Ming

AU - Gong, Pinghua

AU - Yang, Tao

AU - Ye, Jieping

AU - Albin, Roger L.

AU - Dodge, Hiroko

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Background: Clinical trials increasingly aim to retard disease progression during presymptomatic phases of Mild Cognitive Impairment (MCI) and thus recruiting study participants at high risk for developing MCI is critical for cost-effective prevention trials. However, accurately identifying those who are destined to develop MCI is difficult. Collecting biomarkers is often expensive. Methods: We used only noninvasive clinical variables collected in the National Alzheimer's Coordinating Center (NACC) Uniform Data Sets version 2.0 and applied machine learning techniques to build a low-cost and accurate Mild Cognitive Impairment (MCI) conversion prediction calculator. Cross-validation and bootstrap were used to select as few variables as possible accurately predicting MCI conversion within 4 years. Results: A total of 31,872 unique subjects, 748 clinical variables, and additional 128 derived variables in NACC data sets were used. About 15 noninvasive clinical variables are identified for predicting MCI/aMCI/naMCI converters, respectively. Over 75% Receiver Operating Characteristic Area Under the Curves (ROC AUC) was achieved. By bootstrap we created a simple spreadsheet calculator which estimates the probability of developing MCI within 4 years with a 95% confidence interval. Conclusions: We achieved reasonably high prediction accuracy using only clinical variables. The approach used here could be useful for study enrichment in preclinical trials where enrolling participants at risk of cognitive decline is critical for proving study efficacy, and also for developing a shorter assessment battery.

AB - Background: Clinical trials increasingly aim to retard disease progression during presymptomatic phases of Mild Cognitive Impairment (MCI) and thus recruiting study participants at high risk for developing MCI is critical for cost-effective prevention trials. However, accurately identifying those who are destined to develop MCI is difficult. Collecting biomarkers is often expensive. Methods: We used only noninvasive clinical variables collected in the National Alzheimer's Coordinating Center (NACC) Uniform Data Sets version 2.0 and applied machine learning techniques to build a low-cost and accurate Mild Cognitive Impairment (MCI) conversion prediction calculator. Cross-validation and bootstrap were used to select as few variables as possible accurately predicting MCI conversion within 4 years. Results: A total of 31,872 unique subjects, 748 clinical variables, and additional 128 derived variables in NACC data sets were used. About 15 noninvasive clinical variables are identified for predicting MCI/aMCI/naMCI converters, respectively. Over 75% Receiver Operating Characteristic Area Under the Curves (ROC AUC) was achieved. By bootstrap we created a simple spreadsheet calculator which estimates the probability of developing MCI within 4 years with a 95% confidence interval. Conclusions: We achieved reasonably high prediction accuracy using only clinical variables. The approach used here could be useful for study enrichment in preclinical trials where enrolling participants at risk of cognitive decline is critical for proving study efficacy, and also for developing a shorter assessment battery.

KW - bootstrap

KW - dementia

KW - incidence

KW - machine learning

KW - mild cognitive impairment

KW - National Alzheimer's Coordinating Center Uniform Data Set (NACC UDS)

KW - prediction

KW - ROC AUC

KW - study enrichment

UR - http://www.scopus.com/inward/record.url?scp=85044323568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044323568&partnerID=8YFLogxK

U2 - 10.1097/WAD.0000000000000228

DO - 10.1097/WAD.0000000000000228

M3 - Article

AN - SCOPUS:85044323568

VL - 32

SP - 18

EP - 27

JO - Alzheimer Disease and Associated Disorders

JF - Alzheimer Disease and Associated Disorders

SN - 0893-0341

IS - 1

ER -