Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data

Alan Hubbard, Ivan Diaz Munoz, Anna Decker, John B. Holcomb, Martin Schreiber, Eileen M. Bulger, Karen Brasel, Erin E. Fox, Deborah J. Del Junco, Charles E. Wade, Mohammad H. Rahbar, Bryan A. Cotton, Herb A. Phelan, John G. Myers, Louis H. Alarcon, Peter Muskat, Mitchell J. Cohen

    Research output: Contribution to journal › Article

    15 Citations (Scopus)

    Abstract

    BACKGROUND: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time-point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. New statistical machine learning techniques such as SuperLearner (SL) exist to make superior prediction at iterative time points while evaluating the changing relative importance of each measured variable on an outcome. This can then provide continuously changing prediction of outcome and evaluation of which clinical variables likely drive a particular outcome. METHODS: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors among candidate predictors/machine learning algorithms. SL was also used to produce interval-specific robust variable importance measures (VIM), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality. RESULTS: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30–90, 90–180, 180–360, >360 minutes). The estimated VIM of mortality also changed significantly at each time point. CONCLUSION: The SL technique for prediction of outcome from a complex dynamic multivariate data set is superior at each time interval to standard models. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention. Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.
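
The cross-validated selection described in the METHODS can be illustrated with a minimal sketch. This is not the authors' code or the SuperLearner package: the two candidate learners (`fit_mean`, `fit_line`), the single synthetic covariate, the grid search over one convex weight, and all function names are assumptions made for illustration. It shows the general idea only: obtain out-of-fold predictions from each candidate, choose the convex combination with the lowest cross-validated squared error, and report the correlation of the cross-validated predictions with the outcome. In a time-dependent analysis, a procedure like this would presumably be repeated within each time interval.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Hypothetical candidate 1: ignore the covariate, predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical candidate 2: least-squares line on a single covariate."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def cv_predictions(fit, xs, ys, k=5):
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):  # indices held out in this fold
            preds[i] = model(xs[i])
    return preds

def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def combine(w, p1, p2):
    """Convex combination of two candidates' predictions."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

def best_weight(p1, p2, ys, steps=20):
    """Grid-search the weight on candidate 1 that minimizes cross-validated MSE."""
    grid = [j / steps for j in range(steps + 1)]
    return min(grid, key=lambda w: mse(combine(w, p1, p2), ys))

# Synthetic data (assumption): one covariate, binary outcome whose risk rises with x.
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(200)]
ys = [1.0 if rng.random() < 0.1 + 0.7 * x else 0.0 for x in xs]

p_mean = cv_predictions(fit_mean, xs, ys)
p_line = cv_predictions(fit_line, xs, ys)
w = best_weight(p_mean, p_line, ys)
ensemble = combine(w, p_mean, p_line)
r = pearson(ensemble, ys)
print(f"weight on mean-only learner: {w:.2f}, CV correlation with outcome: {r:.3f}")
```

Because the weight is chosen on the same out-of-fold predictions that are then scored, the correlation printed here is somewhat optimistic; an honest estimate would wrap the whole procedure in an outer layer of cross-validation.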

    Original language: English (US)
    Journal: Journal of Trauma and Acute Care Surgery
    Volume: 75
    Issue number: 1 SUPPL1
    DOI: https://doi.org/10.1097/TA.0b013e3182914553
    State: Published - 2013

    Fingerprint

    Mortality
    Uncertainty
    Logistic Models
    Hemorrhage
    Wounds and Injuries
    Machine Learning
    Datasets

    Keywords

    • Causal inference
    • Injury
    • PROMMTT
    • Statistical prediction
    • Trauma

    ASJC Scopus subject areas

    • Critical Care and Intensive Care Medicine
    • Surgery

    Cite this

    Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data. / Hubbard, Alan; Munoz, Ivan Diaz; Decker, Anna; Holcomb, John B.; Schreiber, Martin; Bulger, Eileen M.; Brasel, Karen; Fox, Erin E.; Del Junco, Deborah J.; Wade, Charles E.; Rahbar, Mohammad H.; Cotton, Bryan A.; Phelan, Herb A.; Myers, John G.; Alarcon, Louis H.; Muskat, Peter; Cohen, Mitchell J.

    In: Journal of Trauma and Acute Care Surgery, Vol. 75, No. 1 SUPPL1, 2013.

    Hubbard, A, Munoz, ID, Decker, A, Holcomb, JB, Schreiber, M, Bulger, EM, Brasel, K, Fox, EE, Del Junco, DJ, Wade, CE, Rahbar, MH, Cotton, BA, Phelan, HA, Myers, JG, Alarcon, LH, Muskat, P & Cohen, MJ 2013, 'Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data', Journal of Trauma and Acute Care Surgery, vol. 75, no. 1 SUPPL1. https://doi.org/10.1097/TA.0b013e3182914553
    Hubbard, Alan ; Munoz, Ivan Diaz ; Decker, Anna ; Holcomb, John B. ; Schreiber, Martin ; Bulger, Eileen M. ; Brasel, Karen ; Fox, Erin E. ; Del Junco, Deborah J. ; Wade, Charles E. ; Rahbar, Mohammad H. ; Cotton, Bryan A. ; Phelan, Herb A. ; Myers, John G. ; Alarcon, Louis H. ; Muskat, Peter ; Cohen, Mitchell J. / Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data. In: Journal of Trauma and Acute Care Surgery. 2013 ; Vol. 75, No. 1 SUPPL1.
    @article{dac7ecc9a74341ffa66f381db8fb4c57,
    title = "Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data",
    abstract = "BACKGROUND: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time-point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. New statistical machine learning techniques such as SuperLearner (SL) exist to make superior prediction at iterative time points while evaluating the changing relative importance of each measured variable on an outcome. This can then provide continuously changing prediction of outcome and evaluation of which clinical variables likely drive a particular outcome. METHODS: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors among candidate predictors/machine learning algorithms. SL was also used to produce interval-specific robust variable importance measures (VIM), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality. RESULTS: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30--90, 90--180, 180--360, >360 minutes). The estimated VIM of mortality also changed significantly at each time point. CONCLUSION: The SL technique for prediction of outcome from a complex dynamic multivariate data set is superior at each time interval to standard models. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention. Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.",
    keywords = "Causal inference, Injury, PROMMTT, Statistical prediction, Trauma",
    author = "Alan Hubbard and Munoz, {Ivan Diaz} and Anna Decker and Holcomb, {John B.} and Martin Schreiber and Bulger, {Eileen M.} and Karen Brasel and Fox, {Erin E.} and {Del Junco}, {Deborah J.} and Wade, {Charles E.} and Rahbar, {Mohammad H.} and Cotton, {Bryan A.} and Phelan, {Herb A.} and Myers, {John G.} and Alarcon, {Louis H.} and Peter Muskat and Cohen, {Mitchell J.}",
    year = "2013",
    doi = "10.1097/TA.0b013e3182914553",
    language = "English (US)",
    volume = "75",
    journal = "Journal of Trauma and Acute Care Surgery",
    issn = "2163-0755",
    publisher = "Lippincott Williams and Wilkins",
    number = "1 SUPPL1",

    }

    TY - JOUR

    T1 - Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data

    AU - Hubbard, Alan

    AU - Munoz, Ivan Diaz

    AU - Decker, Anna

    AU - Holcomb, John B.

    AU - Schreiber, Martin

    AU - Bulger, Eileen M.

    AU - Brasel, Karen

    AU - Fox, Erin E.

    AU - Del Junco, Deborah J.

    AU - Wade, Charles E.

    AU - Rahbar, Mohammad H.

    AU - Cotton, Bryan A.

    AU - Phelan, Herb A.

    AU - Myers, John G.

    AU - Alarcon, Louis H.

    AU - Muskat, Peter

    AU - Cohen, Mitchell J.

    PY - 2013

    Y1 - 2013

    N2 - BACKGROUND: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time-point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. New statistical machine learning techniques such as SuperLearner (SL) exist to make superior prediction at iterative time points while evaluating the changing relative importance of each measured variable on an outcome. This can then provide continuously changing prediction of outcome and evaluation of which clinical variables likely drive a particular outcome. METHODS: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors among candidate predictors/machine learning algorithms. SL was also used to produce interval-specific robust variable importance measures (VIM), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality. RESULTS: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30–90, 90–180, 180–360, >360 minutes). The estimated VIM of mortality also changed significantly at each time point. CONCLUSION: The SL technique for prediction of outcome from a complex dynamic multivariate data set is superior at each time interval to standard models. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention. Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.

    KW - Causal inference

    KW - Injury

    KW - PROMMTT

    KW - Statistical prediction

    KW - Trauma

    UR - http://www.scopus.com/inward/record.url?scp=84880413655&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84880413655&partnerID=8YFLogxK

    U2 - 10.1097/TA.0b013e3182914553

    DO - 10.1097/TA.0b013e3182914553

    M3 - Article

    C2 - 23778512

    AN - SCOPUS:84880413655

    VL - 75

    JO - Journal of Trauma and Acute Care Surgery

    JF - Journal of Trauma and Acute Care Surgery

    SN - 2163-0755

    IS - 1 SUPPL1

    ER -