Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records

Chia Yen Chen, Phil H. Lee, Victor M. Castro, Jessica Minnier, Alexander W. Charney, Eli A. Stahl, Douglas M. Ruderfer, Shawn N. Murphy, Vivian Gainer, Tianxi Cai, Ian Jones, Carlos N. Pato, Michele T. Pato, Mikael Landén, Pamela Sklar, Roy H. Perlis, Jordan W. Smoller

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Bipolar disorder (BD) is a heritable mood disorder characterized by episodes of mania and depression. Although genome-wide association studies (GWAS) have successfully identified genetic loci contributing to BD risk, sample size has become a rate-limiting obstacle to genetic discovery. Electronic health records (EHRs) represent a vast but relatively untapped resource for high-throughput phenotyping. As part of the International Cohort Collection for Bipolar Disorder (ICCBD), we previously validated automated EHR-based phenotyping algorithms for BD against in-person diagnostic interviews (Castro et al. Am J Psychiatry 172:363-372, 2015). Here, we establish the genetic validity of these phenotypes by determining their genetic correlation with traditionally ascertained samples. Case and control algorithms were derived from structured data and narrative text in the Partners Healthcare system, which comprises more than 4.6 million patients over 20 years. Genome-wide genotype data for 3330 BD cases and 3952 controls of European ancestry were used, with linkage disequilibrium (LD) score regression, to estimate the SNP-based heritability (h²g) of each EHR-based phenotype definition and its genetic correlation (rg) with traditionally ascertained BD cases from GWAS by the ICCBD and the Psychiatric Genomics Consortium (PGC). We evaluated BD cases identified by four EHR-based algorithms: a natural language processing (NLP)-based algorithm (95-NLP) and three rule-based algorithms using codified EHR data with decreasing levels of stringency: "coded-strict", "coded-broad", and "coded-broad based on a single clinical encounter" (coded-broad-SV). The analytic sample comprised 862 95-NLP, 1968 coded-strict, 2581 coded-broad, and 408 coded-broad-SV BD cases, and 3952 controls. The estimated h²g values were 0.24 (p = 0.015), 0.09 (p = 0.064), 0.13 (p = 0.003), and 0.00 (p = 0.591) for 95-NLP, coded-strict, coded-broad, and coded-broad-SV BD, respectively. The h²g for all EHR-based cases combined, excluding coded-broad-SV (whose h²g estimate was 0), was 0.12 (p = 0.004). These estimates were lower than or similar to the h²g observed in ICCBD + PGCBD (0.23, p = 3.17 × 10⁻⁸⁰, total N = 33,181). However, the rg between ICCBD + PGCBD and the EHR-based cases was high for 95-NLP (0.66, p = 3.69 × 10⁻⁵), coded-strict (1.00, p = 2.40 × 10⁻⁴), and coded-broad (0.74, p = 8.11 × 10⁻⁷). The rg between EHR-based BD definitions ranged from 0.90 to 0.98. These results provide the first genetic validation of automated EHR-based phenotyping for BD and suggest that this approach identifies cases that are highly genetically correlated with those ascertained through conventional methods. High-throughput phenotyping using the large data resources available in EHRs represents a viable method for accelerating psychiatric genetic research.
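Both h²g and rg above come from LD score regression. As a rough illustration of the idea only (not the authors' pipeline), the sketch below assumes the standard LDSC model, in which the expected association statistic of SNP j rises linearly with its LD score ℓ_j. For one trait, E[χ²_j] = N·h²g·ℓ_j/M + N·a + 1, where N is the sample size, M the number of SNPs and a a confounding term; for two traits, E[z1_j·z2_j] = √(N1·N2)·ρg·ℓ_j/M + ρ·Ns/√(N1·N2), where Ns is the number of overlapping samples and ρ their phenotypic correlation; then rg = ρg/√(h²g,1·h²g,2). The helper names (`ldsc_h2`, `ldsc_rho_g`, `genetic_correlation`) are hypothetical, the fit is plain unweighted least squares, and the estimates stay on the observed scale. The published LDSC software additionally uses regression weights, intercept handling and block-jackknife standard errors, none of which are shown here.

```python
"""Minimal, unweighted LD score regression sketch (hypothetical helpers, illustration only)."""
from math import sqrt
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares; return (intercept, slope)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("LD scores must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ldsc_h2(chi2: Sequence[float], ld_scores: Sequence[float],
            n: float, m: float) -> Tuple[float, float]:
    """Regress chi2 on LD score: slope = N*h2/M, so h2 = slope*M/N.

    Returns (observed-scale h2 estimate, intercept); intercept ~ 1 + N*a.
    """
    intercept, slope = ols(ld_scores, chi2)
    return slope * m / n, intercept


def ldsc_rho_g(z1: Sequence[float], z2: Sequence[float], ld_scores: Sequence[float],
               n1: float, n2: float, m: float) -> Tuple[float, float]:
    """Regress z1*z2 on LD score: slope = sqrt(N1*N2)*rho_g/M.

    Returns (genetic covariance rho_g, intercept); intercept ~ rho*Ns/sqrt(N1*N2).
    """
    products = [a * b for a, b in zip(z1, z2)]
    intercept, slope = ols(ld_scores, products)
    return slope * m / sqrt(n1 * n2), intercept


def genetic_correlation(rho_g: float, h2_1: float, h2_2: float) -> float:
    """rg = rho_g / sqrt(h2_1 * h2_2); undefined unless both h2 estimates are positive."""
    if h2_1 <= 0 or h2_2 <= 0:
        raise ValueError("rg requires a positive h2 estimate for both traits")
    return rho_g / sqrt(h2_1 * h2_2)


if __name__ == "__main__":
    # Hypothetical, noise-free summary statistics for 10 SNPs (not data from the study).
    ld_scores = [float(i) for i in range(1, 11)]
    n_snps, n1, n2 = 1_000, 10_000, 10_000
    # Trait 1 simulated with h2 = 0.2 and no confounding (intercept 1).
    chi2_trait1 = [1.0 + (n1 * 0.2 / n_snps) * ell for ell in ld_scores]
    h2_1, _ = ldsc_h2(chi2_trait1, ld_scores, n1, n_snps)
    # Cross-trait products simulated with rho_g = 0.1 and no sample overlap (intercept 0).
    z1 = [1.0] * len(ld_scores)
    z2 = [(sqrt(n1 * n2) * 0.1 / n_snps) * ell for ell in ld_scores]
    rho_g, _ = ldsc_rho_g(z1, z2, ld_scores, n1, n2, n_snps)
    # Assume trait 2 also has h2 = 0.2 for the rg calculation.
    print(f"h2g = {h2_1:.3f}, rho_g = {rho_g:.3f}, "
          f"rg = {genetic_correlation(rho_g, h2_1, 0.2):.3f}")
```

Because rg divides by √(h²g,1·h²g,2), it is undefined when either heritability estimate is zero or negative, so the sketch raises an error in that case rather than returning a misleading value.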

Original language: English (US)
Article number: 86
Journal: Translational Psychiatry
Volume: 8
Issue number: 1
DOIs: https://doi.org/10.1038/s41398-018-0133-7
State: Published - Dec 1 2018

Fingerprint

Electronic Health Records
Bipolar Disorder
Psychiatry
Phenotype
Genetic Research
Genetic Loci
Genomics
Mood Disorders
Sample Size
Single Nucleotide Polymorphism

ASJC Scopus subject areas

  • Psychiatry and Mental health
  • Cellular and Molecular Neuroscience
  • Biological Psychiatry

Cite this

Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records. / Chen, Chia Yen; Lee, Phil H.; Castro, Victor M.; Minnier, Jessica; Charney, Alexander W.; Stahl, Eli A.; Ruderfer, Douglas M.; Murphy, Shawn N.; Gainer, Vivian; Cai, Tianxi; Jones, Ian; Pato, Carlos N.; Pato, Michele T.; Landén, Mikael; Sklar, Pamela; Perlis, Roy H.; Smoller, Jordan W.

In: Translational Psychiatry, Vol. 8, No. 1, 86, 01.12.2018.


Chen, CY, Lee, PH, Castro, VM, Minnier, J, Charney, AW, Stahl, EA, Ruderfer, DM, Murphy, SN, Gainer, V, Cai, T, Jones, I, Pato, CN, Pato, MT, Landén, M, Sklar, P, Perlis, RH & Smoller, JW 2018, 'Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records', Translational Psychiatry, vol. 8, no. 1, 86. https://doi.org/10.1038/s41398-018-0133-7
Chen, Chia Yen ; Lee, Phil H. ; Castro, Victor M. ; Minnier, Jessica ; Charney, Alexander W. ; Stahl, Eli A. ; Ruderfer, Douglas M. ; Murphy, Shawn N. ; Gainer, Vivian ; Cai, Tianxi ; Jones, Ian ; Pato, Carlos N. ; Pato, Michele T. ; Landén, Mikael ; Sklar, Pamela ; Perlis, Roy H. ; Smoller, Jordan W. / Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records. In: Translational Psychiatry. 2018 ; Vol. 8, No. 1.
@article{cc299dab650a42c2bc95f30dd5dffb6b,
title = "Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records",
author = "Chen, {Chia Yen} and Lee, {Phil H.} and Castro, {Victor M.} and Jessica Minnier and Charney, {Alexander W.} and Stahl, {Eli A.} and Ruderfer, {Douglas M.} and Murphy, {Shawn N.} and Vivian Gainer and Tianxi Cai and Ian Jones and Pato, {Carlos N.} and Pato, {Michele T.} and Mikael Land{\'e}n and Pamela Sklar and Perlis, {Roy H.} and Smoller, {Jordan W.}",
year = "2018",
month = "12",
day = "1",
doi = "10.1038/s41398-018-0133-7",
language = "English (US)",
volume = "8",
journal = "Translational Psychiatry",
issn = "2158-3188",
publisher = "Nature Publishing Group",
number = "1",

}

TY - JOUR

T1 - Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records

AU - Chen, Chia Yen

AU - Lee, Phil H.

AU - Castro, Victor M.

AU - Minnier, Jessica

AU - Charney, Alexander W.

AU - Stahl, Eli A.

AU - Ruderfer, Douglas M.

AU - Murphy, Shawn N.

AU - Gainer, Vivian

AU - Cai, Tianxi

AU - Jones, Ian

AU - Pato, Carlos N.

AU - Pato, Michele T.

AU - Landén, Mikael

AU - Sklar, Pamela

AU - Perlis, Roy H.

AU - Smoller, Jordan W.

PY - 2018/12/1

Y1 - 2018/12/1

UR - http://www.scopus.com/inward/record.url?scp=85045620951&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045620951&partnerID=8YFLogxK

U2 - 10.1038/s41398-018-0133-7

DO - 10.1038/s41398-018-0133-7

M3 - Article

VL - 8

JO - Translational Psychiatry

JF - Translational Psychiatry

SN - 2158-3188

IS - 1

M1 - 86

ER -