TY - JOUR
T1 - Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records
AU - Chen, Chia Yen
AU - Lee, Phil H.
AU - Castro, Victor M.
AU - Minnier, Jessica
AU - Charney, Alexander W.
AU - Stahl, Eli A.
AU - Ruderfer, Douglas M.
AU - Murphy, Shawn N.
AU - Gainer, Vivian
AU - Cai, Tianxi
AU - Jones, Ian
AU - Pato, Carlos N.
AU - Pato, Michele T.
AU - Landén, Mikael
AU - Sklar, Pamela
AU - Perlis, Roy H.
AU - Smoller, Jordan W.
N1 - Funding Information:
This work was supported in part by NIMH grants R01MH085542 (JWS and PS), R01MH085545 (JWS), U01HG008685 (JWS), and K24MH094614 (JWS) and by support from the Demarest Lloyd, Jr. Foundation. Dr. Smoller is a Tepper Family MGH Research Scholar.
Publisher Copyright:
© 2018 The Author(s).
PY - 2018/12/1
Y1 - 2018/12/1
N2 - Bipolar disorder (BD) is a heritable mood disorder characterized by episodes of mania and depression. Although genomewide association studies (GWAS) have successfully identified genetic loci contributing to BD risk, sample size has become a rate-limiting obstacle to genetic discovery. Electronic health records (EHRs) represent a vast but relatively untapped resource for high-throughput phenotyping. As part of the International Cohort Collection for Bipolar Disorder (ICCBD), we previously validated automated EHR-based phenotyping algorithms for BD against in-person diagnostic interviews (Castro et al. Am J Psychiatry 172:363-372, 2015). Here, we establish the genetic validity of these phenotypes by determining their genetic correlation with traditionally ascertained samples. Case and control algorithms were derived from structured and narrative text in the Partners Healthcare system comprising more than 4.6 million patients over 20 years. Genomewide genotype data for 3330 BD cases and 3952 controls of European ancestry were used to estimate SNP-based heritability (h²g) and genetic correlation (rg) between EHR-based phenotype definitions and traditionally ascertained BD cases in GWAS by the ICCBD and Psychiatric Genomics Consortium (PGC) using LD score regression. We evaluated BD cases identified using four EHR-based algorithms: an NLP-based algorithm (95-NLP) and three rule-based algorithms using codified EHR with decreasing levels of stringency: "coded-strict", "coded-broad", and "coded-broad based on a single clinical encounter" (coded-broad-SV). The analytic sample comprised 862 95-NLP, 1968 coded-strict, 2581 coded-broad, and 408 coded-broad-SV BD cases, and 3952 controls. The estimated h²g were 0.24 (p = 0.015), 0.09 (p = 0.064), 0.13 (p = 0.003), and 0.00 (p = 0.591) for 95-NLP, coded-strict, coded-broad, and coded-broad-SV BD, respectively. The h²g for all EHR-based cases combined except coded-broad-SV (excluded because its h²g estimate was 0) was 0.12 (p = 0.004). These h²g were lower than or similar to the h²g observed by the ICCBD + PGCBD (0.23, p = 3.17 × 10⁻⁸⁰, total N = 33,181). However, the rg between ICCBD + PGCBD and the EHR-based cases were high for 95-NLP (0.66, p = 3.69 × 10⁻⁵), coded-strict (1.00, p = 2.40 × 10⁻⁴), and coded-broad (0.74, p = 8.11 × 10⁻⁷). The rg between EHR-based BD definitions ranged from 0.90 to 0.98. These results provide the first genetic validation of automated EHR-based phenotyping for BD and suggest that this approach identifies cases that are highly genetically correlated with those ascertained through conventional methods. High-throughput phenotyping using the large data resources available in EHRs represents a viable method for accelerating psychiatric genetic research.
AB - Bipolar disorder (BD) is a heritable mood disorder characterized by episodes of mania and depression. Although genomewide association studies (GWAS) have successfully identified genetic loci contributing to BD risk, sample size has become a rate-limiting obstacle to genetic discovery. Electronic health records (EHRs) represent a vast but relatively untapped resource for high-throughput phenotyping. As part of the International Cohort Collection for Bipolar Disorder (ICCBD), we previously validated automated EHR-based phenotyping algorithms for BD against in-person diagnostic interviews (Castro et al. Am J Psychiatry 172:363-372, 2015). Here, we establish the genetic validity of these phenotypes by determining their genetic correlation with traditionally ascertained samples. Case and control algorithms were derived from structured and narrative text in the Partners Healthcare system comprising more than 4.6 million patients over 20 years. Genomewide genotype data for 3330 BD cases and 3952 controls of European ancestry were used to estimate SNP-based heritability (h²g) and genetic correlation (rg) between EHR-based phenotype definitions and traditionally ascertained BD cases in GWAS by the ICCBD and Psychiatric Genomics Consortium (PGC) using LD score regression. We evaluated BD cases identified using four EHR-based algorithms: an NLP-based algorithm (95-NLP) and three rule-based algorithms using codified EHR with decreasing levels of stringency: "coded-strict", "coded-broad", and "coded-broad based on a single clinical encounter" (coded-broad-SV). The analytic sample comprised 862 95-NLP, 1968 coded-strict, 2581 coded-broad, and 408 coded-broad-SV BD cases, and 3952 controls. The estimated h²g were 0.24 (p = 0.015), 0.09 (p = 0.064), 0.13 (p = 0.003), and 0.00 (p = 0.591) for 95-NLP, coded-strict, coded-broad, and coded-broad-SV BD, respectively. The h²g for all EHR-based cases combined except coded-broad-SV (excluded because its h²g estimate was 0) was 0.12 (p = 0.004). These h²g were lower than or similar to the h²g observed by the ICCBD + PGCBD (0.23, p = 3.17 × 10⁻⁸⁰, total N = 33,181). However, the rg between ICCBD + PGCBD and the EHR-based cases were high for 95-NLP (0.66, p = 3.69 × 10⁻⁵), coded-strict (1.00, p = 2.40 × 10⁻⁴), and coded-broad (0.74, p = 8.11 × 10⁻⁷). The rg between EHR-based BD definitions ranged from 0.90 to 0.98. These results provide the first genetic validation of automated EHR-based phenotyping for BD and suggest that this approach identifies cases that are highly genetically correlated with those ascertained through conventional methods. High-throughput phenotyping using the large data resources available in EHRs represents a viable method for accelerating psychiatric genetic research.
UR - http://www.scopus.com/inward/record.url?scp=85045620951&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045620951&partnerID=8YFLogxK
U2 - 10.1038/s41398-018-0133-7
DO - 10.1038/s41398-018-0133-7
M3 - Article
C2 - 29666432
AN - SCOPUS:85045620951
SN - 2158-3188
VL - 8
JO - Translational Psychiatry
JF - Translational Psychiatry
IS - 1
M1 - 86
ER -