Validation of electronic health record phenotyping of bipolar disorder cases and controls

Victor M. Castro; Jessica Minnier; Shawn N. Murphy; Isaac Kohane; Susanne E. Churchill; Vivian Gainer; Tianxi Cai; Alison G. Hoffnagle; Yael Dai; Stefanie Block; Sydney R. Weill; Mireya Nadal-Vicens; Alisha R. Pollastri; J. Niels Rosenquist; Sergey Goryachev; Dost Ongur; Pamela Sklar; Roy H. Perlis; Jordan W. Smoller; Phil Hyoun Lee; Eli A. Stahl; Shaun M. Purcell; Douglas M. Ruderfer; Alexander W. Charney; Panos Roussos; Carlos Pato; Michele Pato; Helen Medeiros; Janet Sobel; Nick Craddock; Ian Jones; Liz Forty; Arianna DiFlorio; Elaine Green; Lisa Jones; Katherine Dunjewski; Mikael Landén; Christina Hultman; Anders Juréus; Sarah Bergen; Oscar Svantesson; Steven McCarroll; Jennifer Moran; Kimberly Chambert; Richard A. Belliveau

doi:10.1176/appi.ajp.2014.14030423

Research output: Contribution to journal › Article › peer-review

90 Scopus citations

Abstract

Objective: The study was designed to validate use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects. Method: EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semistructured interviews of 190 patients by trained clinicians blind to EHR diagnosis. Results: The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses. Conclusions: Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.
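As an illustration of the two metrics named in the abstract, the sketch below computes positive predictive value and specificity from confusion-matrix counts, treating the direct interview as the reference standard. The function names and the counts are hypothetical and chosen only for the arithmetic; they are not the study's data or the authors' code.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the share of algorithm-flagged subjects
    whose diagnosis is confirmed by the reference standard."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no subjects were classified as positive")
    return true_positives / flagged


def specificity(true_negatives: int, false_positives: int) -> float:
    """Specificity = TN / (TN + FP): the share of reference-standard
    negatives that the algorithm correctly leaves unflagged."""
    negatives = true_negatives + false_positives
    if negatives == 0:
        raise ValueError("no reference-standard negatives available")
    return true_negatives / negatives


# Hypothetical counts, for illustration only (not taken from the paper).
example_ppv = positive_predictive_value(true_positives=40, false_positives=10)
example_specificity = specificity(true_negatives=95, false_positives=5)
print(f"PPV = {example_ppv:.2f}, specificity = {example_specificity:.2f}")
```

PPV depends on how common the condition is among the subjects being classified, so a value estimated in one sample does not automatically carry over to a population with a different prevalence.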

Original language	English (US)
Pages (from-to)	363-372
Number of pages	10
Journal	American Journal of Psychiatry
Volume	172
Issue number	4
DOIs	https://doi.org/10.1176/appi.ajp.2014.14030423
State	Published - Apr 1 2015
Externally published	Yes

ASJC Scopus subject areas

Psychiatry and Mental health

Access to Document

10.1176/appi.ajp.2014.14030423

Cite this

Castro, V. M., Minnier, J., Murphy, S. N., Kohane, I., Churchill, S. E., Gainer, V., Cai, T., Hoffnagle, A. G., Dai, Y., Block, S., Weill, S. R., Nadal-Vicens, M., Pollastri, A. R., Rosenquist, J. N., Goryachev, S., Ongur, D., Sklar, P., Perlis, R. H., Smoller, J. W., ... Belliveau, R. A. (2015). Validation of electronic health record phenotyping of bipolar disorder cases and controls. American Journal of Psychiatry, 172(4), 363-372. https://doi.org/10.1176/appi.ajp.2014.14030423

Castro, VM, Minnier, J, Murphy, SN, Kohane, I, Churchill, SE, Gainer, V, Cai, T, Hoffnagle, AG, Dai, Y, Block, S, Weill, SR, Nadal-Vicens, M, Pollastri, AR, Rosenquist, JN, Goryachev, S, Ongur, D, Sklar, P, Perlis, RH, Smoller, JW, Lee, PH, Stahl, EA, Purcell, SM, Ruderfer, DM, Charney, AW, Roussos, P, Pato, C, Pato, M, Medeiros, H, Sobel, J, Craddock, N, Jones, I, Forty, L, DiFlorio, A, Green, E, Jones, L, Dunjewski, K, Landén, M, Hultman, C, Juréus, A, Bergen, S, Svantesson, O, McCarroll, S, Moran, J, Chambert, K & Belliveau, RA 2015, 'Validation of electronic health record phenotyping of bipolar disorder cases and controls', American Journal of Psychiatry, vol. 172, no. 4, pp. 363-372. https://doi.org/10.1176/appi.ajp.2014.14030423

@article{51a625b1974e4ac88c825e700cb63c99,

title = "Validation of electronic health record phenotyping of bipolar disorder cases and controls",

author = "Castro, {Victor M.} and Jessica Minnier and Murphy, {Shawn N.} and Isaac Kohane and Churchill, {Susanne E.} and Vivian Gainer and Tianxi Cai and Hoffnagle, {Alison G.} and Yael Dai and Stefanie Block and Weill, {Sydney R.} and Mireya Nadal-Vicens and Pollastri, {Alisha R.} and Rosenquist, {J. Niels} and Sergey Goryachev and Dost Ongur and Pamela Sklar and Perlis, {Roy H.} and Smoller, {Jordan W.} and Lee, {Phil Hyoun} and Stahl, {Eli A.} and Purcell, {Shaun M.} and Ruderfer, {Douglas M.} and Charney, {Alexander W.} and Panos Roussos and Carlos Pato and Michele Pato and Helen Medeiros and Janet Sobel and Nick Craddock and Ian Jones and Liz Forty and Arianna DiFlorio and Elaine Green and Lisa Jones and Katherine Dunjewski and Mikael Land{\'e}n and Christina Hultman and Anders Jur{\'e}us and Sarah Bergen and Oscar Svantesson and Steven McCarroll and Jennifer Moran and Kimberly Chambert and Belliveau, {Richard A.}",

year = "2015",

month = apr,

day = "1",

doi = "10.1176/appi.ajp.2014.14030423",

language = "English (US)",

volume = "172",

pages = "363--372",

journal = "American Journal of Psychiatry",

issn = "0002-953X",

publisher = "American Psychiatric Association",

number = "4",

}

TY - JOUR

T1 - Validation of electronic health record phenotyping of bipolar disorder cases and controls

AU - Castro, Victor M.

AU - Minnier, Jessica

AU - Murphy, Shawn N.

AU - Kohane, Isaac

AU - Churchill, Susanne E.

AU - Gainer, Vivian

AU - Cai, Tianxi

AU - Hoffnagle, Alison G.

AU - Dai, Yael

AU - Block, Stefanie

AU - Weill, Sydney R.

AU - Nadal-Vicens, Mireya

AU - Pollastri, Alisha R.

AU - Rosenquist, J. Niels

AU - Goryachev, Sergey

AU - Ongur, Dost

AU - Sklar, Pamela

AU - Perlis, Roy H.

AU - Smoller, Jordan W.

AU - Lee, Phil Hyoun

AU - Stahl, Eli A.

AU - Purcell, Shaun M.

AU - Ruderfer, Douglas M.

AU - Charney, Alexander W.

AU - Roussos, Panos

AU - Pato, Carlos

AU - Pato, Michele

AU - Medeiros, Helen

AU - Sobel, Janet

AU - Craddock, Nick

AU - Jones, Ian

AU - Forty, Liz

AU - DiFlorio, Arianna

AU - Green, Elaine

AU - Jones, Lisa

AU - Dunjewski, Katherine

AU - Landén, Mikael

AU - Hultman, Christina

AU - Juréus, Anders

AU - Bergen, Sarah

AU - Svantesson, Oscar

AU - McCarroll, Steven

AU - Moran, Jennifer

AU - Chambert, Kimberly

AU - Belliveau, Richard A.

PY - 2015/4/1

Y1 - 2015/4/1

UR - http://www.scopus.com/inward/record.url?scp=84961290792&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84961290792&partnerID=8YFLogxK

U2 - 10.1176/appi.ajp.2014.14030423

DO - 10.1176/appi.ajp.2014.14030423

M3 - Article

C2 - 25827034

AN - SCOPUS:84961290792

SN - 0002-953X

VL - 172

SP - 363

EP - 372

JO - American Journal of Psychiatry

JF - American Journal of Psychiatry

IS - 4

ER -
