Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data

Kelly D. Myers, Joshua W. Knowles, David Staszak, Michael D. Shapiro, William Howard, Mrinal Yadava, David Zuzick, Latoya Williamson, Nigam H. Shah, Juan M. Banda, Joe Leader, William C. Cromwell, Ed Trautman, Michael F. Murray, Seth J. Baum, Seth Myers, Samuel S. Gidding, Katherine Wilemon, Daniel J. Rader

    Research output: Contribution to journal › Article

    3 Citations (Scopus)

    Abstract

    Background: Cardiovascular outcomes for people with familial hypercholesterolaemia can be improved with diagnosis and medical management. However, 90% of individuals with familial hypercholesterolaemia remain undiagnosed in the USA. We aimed to accelerate early diagnosis and timely intervention for more than 1·3 million undiagnosed individuals with familial hypercholesterolaemia at high risk for early heart attacks and strokes by applying machine learning to large health-care encounter datasets.

    Methods: We trained the FIND FH machine learning model using deidentified health-care encounter data, including procedure and diagnostic codes, prescriptions, and laboratory findings, from 939 clinically diagnosed individuals with familial hypercholesterolaemia (395 of whom had a molecular diagnosis) and 83 136 individuals presumed free of familial hypercholesterolaemia, sampled from four US institutions. The model was then applied to a national health-care encounter database (170 million individuals) and an integrated health-care delivery system dataset (174 000 individuals). Individuals used in model training and those evaluated by the model were required to have at least one cardiovascular disease risk factor (eg, hypertension, hypercholesterolaemia, or hyperlipidemia). A Health Insurance Portability and Accountability Act of 1996-compliant programme was developed to allow providers to receive identification of individuals likely to have familial hypercholesterolaemia in their practice.

    Findings: Using a model with a measured precision (positive predictive value) of 0·85, recall (sensitivity) of 0·45, area under the precision–recall curve of 0·55, and area under the receiver operating characteristic curve of 0·89, we flagged 1 331 759 of 170 416 201 patients in the national database and 866 of 173 733 individuals in the health-care delivery system dataset as likely to have familial hypercholesterolaemia. Familial hypercholesterolaemia experts reviewed a sample of flagged individuals (45 from the national database and 103 from the health-care delivery system dataset) and applied clinical familial hypercholesterolaemia diagnostic criteria. Of those reviewed, 87% (95% CI 73–100) in the national database and 77% (68–86) in the health-care delivery system dataset were categorised as having a high enough clinical suspicion of familial hypercholesterolaemia to warrant guideline-based clinical evaluation and treatment.

    Interpretation: The FIND FH model successfully scans large, diverse, and disparate health-care encounter databases to identify individuals with familial hypercholesterolaemia.

    Funding: The FH Foundation funded this study. Support was received from Amgen, Sanofi, and Regeneron.
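
    The Findings figures can be put in proportion with a little arithmetic. The following is a minimal sketch, not part of the study: the function names are illustrative, and the confusion-matrix counts passed to precision() and recall() are hypothetical values chosen only to reproduce the reported 0·85 and 0·45. The flagged and screened totals are taken from the abstract.

```python
def flag_rate(flagged: int, screened: int) -> float:
    """Fraction of screened individuals that the model flagged."""
    return flagged / screened


def precision(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: share of flagged cases that are true cases."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Sensitivity: share of true cases that the model flags."""
    return true_pos / (true_pos + false_neg)


# Totals reported in the abstract.
national_rate = flag_rate(1_331_759, 170_416_201)
delivery_rate = flag_rate(866, 173_733)

# Hypothetical counts (assumption), chosen only to match the reported metrics.
example_precision = precision(true_pos=85, false_pos=15)
example_recall = recall(true_pos=45, false_neg=55)

print(f"national database flag rate: {national_rate:.2%}")
print(f"delivery system flag rate:   {delivery_rate:.2%}")
print(f"precision={example_precision:.2f}, recall={example_recall:.2f}")
```

    Under these numbers the model flags well under 1% of each screened population, which is in keeping with a high-precision, moderate-recall operating point.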

    Original language: English (US)
    Pages (from-to): e393-e402
    Journal: The Lancet Digital Health
    Volume: 1
    Issue number: 8
    DOI: https://doi.org/10.1016/S2589-7500(19)30150-5
    State: Published - Dec 2019

    Fingerprint

    Hyperlipoproteinemia Type II
    Delivery of Health Care
    Health
    Databases
    Machine Learning
    Machine learning
    Screening
    Data base
    Integrated Delivery of Health Care
    Health Insurance Portability and Accountability Act
    Health care delivery
    Healthcare
    Hypercholesterolemia
    Hyperlipidemias
    ROC Curve
    Area Under Curve
    Prescriptions
    Early Diagnosis
    Cardiovascular Diseases
    Stroke

    ASJC Scopus subject areas

    • Medicine (miscellaneous)
    • Health Informatics
    • Decision Sciences (miscellaneous)
    • Health Information Management

    Cite this

    Precision screening for familial hypercholesterolaemia : a machine learning study applied to electronic health encounter data. / Myers, Kelly D.; Knowles, Joshua W.; Staszak, David; Shapiro, Michael D.; Howard, William; Yadava, Mrinal; Zuzick, David; Williamson, Latoya; Shah, Nigam H.; Banda, Juan M.; Leader, Joe; Cromwell, William C.; Trautman, Ed; Murray, Michael F.; Baum, Seth J.; Myers, Seth; Gidding, Samuel S.; Wilemon, Katherine; Rader, Daniel J.

    In: The Lancet Digital Health, Vol. 1, No. 8, 12.2019, p. e393-e402.

    Myers, KD, Knowles, JW, Staszak, D, Shapiro, MD, Howard, W, Yadava, M, Zuzick, D, Williamson, L, Shah, NH, Banda, JM, Leader, J, Cromwell, WC, Trautman, E, Murray, MF, Baum, SJ, Myers, S, Gidding, SS, Wilemon, K & Rader, DJ 2019, 'Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data', The Lancet Digital Health, vol. 1, no. 8, pp. e393-e402. https://doi.org/10.1016/S2589-7500(19)30150-5
    Myers, Kelly D. ; Knowles, Joshua W. ; Staszak, David ; Shapiro, Michael D. ; Howard, William ; Yadava, Mrinal ; Zuzick, David ; Williamson, Latoya ; Shah, Nigam H. ; Banda, Juan M. ; Leader, Joe ; Cromwell, William C. ; Trautman, Ed ; Murray, Michael F. ; Baum, Seth J. ; Myers, Seth ; Gidding, Samuel S. ; Wilemon, Katherine ; Rader, Daniel J. / Precision screening for familial hypercholesterolaemia : a machine learning study applied to electronic health encounter data. In: The Lancet Digital Health. 2019 ; Vol. 1, No. 8. pp. e393-e402.
    @article{9267c510115f4989aa7f978ae630d002,
    title = "Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data",
    author = "Myers, {Kelly D.} and Knowles, {Joshua W.} and David Staszak and Shapiro, {Michael D.} and William Howard and Mrinal Yadava and David Zuzick and Latoya Williamson and Shah, {Nigam H.} and Banda, {Juan M.} and Joe Leader and Cromwell, {William C.} and Ed Trautman and Murray, {Michael F.} and Baum, {Seth J.} and Seth Myers and Gidding, {Samuel S.} and Katherine Wilemon and Rader, {Daniel J.}",
    year = "2019",
    month = "12",
    doi = "10.1016/S2589-7500(19)30150-5",
    language = "English (US)",
    volume = "1",
    pages = "e393--e402",
    journal = "The Lancet Digital Health",
    issn = "2589-7500",
    publisher = "Elsevier Ltd",
    number = "8",

    }

    TY - JOUR

    T1 - Precision screening for familial hypercholesterolaemia

    T2 - a machine learning study applied to electronic health encounter data

    AU - Myers, Kelly D.

    AU - Knowles, Joshua W.

    AU - Staszak, David

    AU - Shapiro, Michael D.

    AU - Howard, William

    AU - Yadava, Mrinal

    AU - Zuzick, David

    AU - Williamson, Latoya

    AU - Shah, Nigam H.

    AU - Banda, Juan M.

    AU - Leader, Joe

    AU - Cromwell, William C.

    AU - Trautman, Ed

    AU - Murray, Michael F.

    AU - Baum, Seth J.

    AU - Myers, Seth

    AU - Gidding, Samuel S.

    AU - Wilemon, Katherine

    AU - Rader, Daniel J.

    PY - 2019/12

    Y1 - 2019/12

    UR - http://www.scopus.com/inward/record.url?scp=85075522435&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85075522435&partnerID=8YFLogxK

    U2 - 10.1016/S2589-7500(19)30150-5

    DO - 10.1016/S2589-7500(19)30150-5

    M3 - Article

    AN - SCOPUS:85075522435

    VL - 1

    SP - e393-e402

    JO - The Lancet Digital Health

    JF - The Lancet Digital Health

    SN - 2589-7500

    IS - 8

    ER -