Improved exome prioritization of disease genes through cross-species phenotype comparison

Peter N. Robinson, Sebastian Köhler, Anika Oellrich, Sanger Mouse Genetics, Kai Wang, Christopher J. Mungall, Suzanna E. Lewis, Nicole Washington, Sebastian Bauer, Dominik Seelow, Peter Krawitz, Christian Gilissen, Melissa Haendel, Damian Smedley

Research output: Contribution to journal › Article

145 Citations (Scopus)

Abstract

Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.
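The abstract describes combining a variant-based score (allele frequency, pathogenicity) with a cross-species phenotype similarity score to rank candidate genes. The following is only a minimal sketch of that idea, not the Exomiser implementation: the `Candidate` type, the gene names, the score values, and the unweighted mean used to combine the two scores are all assumptions made for illustration, since the abstract does not state the actual formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene: str               # placeholder gene label (hypothetical)
    variant_score: float    # assumed 0..1 score from frequency and pathogenicity
    phenotype_score: float  # assumed 0..1 human-disease vs. mouse-model similarity


def combined_score(c: Candidate) -> float:
    # Assumption: an unweighted mean of the two scores.
    return (c.variant_score + c.phenotype_score) / 2


def rank(candidates):
    # Highest combined score first.
    return sorted(candidates, key=combined_score, reverse=True)


# Made-up example values, not data from the paper.
candidates = [
    Candidate("GENE_A", 0.95, 0.10),
    Candidate("GENE_B", 0.80, 0.90),
    Candidate("GENE_C", 0.60, 0.40),
]

ranked = rank(candidates)
variant_only = sorted(candidates, key=lambda c: c.variant_score, reverse=True)

print([c.gene for c in variant_only])  # ranking by variant score alone
print([c.gene for c in ranked])        # ranking with phenotype evidence added
```

In this toy input, the gene with the most damaging variant but a poor phenotype match drops below a gene whose mouse model resembles the patient's phenotype, which is the kind of re-ranking the abstract attributes to phenotype data.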

Original language: English (US)
Pages (from-to): 340-348
Number of pages: 9
Journal: Genome Research
Volume: 24
Issue number: 2
DOI: https://doi.org/10.1101/gr.160325.113
State: Published - Feb 2014

Fingerprint

Exome
Phenotype
Genes
Virulence
Mutation
Computational Biology
Gene Frequency
ROC Curve
Area Under Curve
Genotype
Genome

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Robinson, P. N., Köhler, S., Oellrich, A., Sanger Mouse Genetics, Wang, K., Mungall, C. J., ... Smedley, D. (2014). Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Research, 24(2), 340-348. https://doi.org/10.1101/gr.160325.113

Improved exome prioritization of disease genes through cross-species phenotype comparison. / Robinson, Peter N.; Köhler, Sebastian; Oellrich, Anika; Sanger Mouse Genetics; Wang, Kai; Mungall, Christopher J.; Lewis, Suzanna E.; Washington, Nicole; Bauer, Sebastian; Seelow, Dominik; Krawitz, Peter; Gilissen, Christian; Haendel, Melissa; Smedley, Damian.

In: Genome Research, Vol. 24, No. 2, 02.2014, p. 340-348.

Robinson, PN, Köhler, S, Oellrich, A, Sanger Mouse Genetics, Wang, K, Mungall, CJ, Lewis, SE, Washington, N, Bauer, S, Seelow, D, Krawitz, P, Gilissen, C, Haendel, M & Smedley, D 2014, 'Improved exome prioritization of disease genes through cross-species phenotype comparison', Genome Research, vol. 24, no. 2, pp. 340-348. https://doi.org/10.1101/gr.160325.113
Robinson PN, Köhler S, Oellrich A, Sanger Mouse Genetics, Wang K, Mungall CJ et al. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Research. 2014 Feb;24(2):340-348. https://doi.org/10.1101/gr.160325.113
Robinson, Peter N. ; Köhler, Sebastian ; Oellrich, Anika ; Sanger Mouse Genetics ; Wang, Kai ; Mungall, Christopher J. ; Lewis, Suzanna E. ; Washington, Nicole ; Bauer, Sebastian ; Seelow, Dominik ; Krawitz, Peter ; Gilissen, Christian ; Haendel, Melissa ; Smedley, Damian. / Improved exome prioritization of disease genes through cross-species phenotype comparison. In: Genome Research. 2014 ; Vol. 24, No. 2. pp. 340-348.
@article{ce6c70310e744422b2ad9a5339969438,
title = "Improved exome prioritization of disease genes through cross-species phenotype comparison",
abstract = "Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83{\%} of samples, corresponding to an area under the ROC curve of >95{\%}. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.",
author = "Robinson, {Peter N.} and Sebastian K{\"o}hler and Anika Oellrich and {Sanger Mouse Genetics} and Kai Wang and Mungall, {Christopher J.} and Lewis, {Suzanna E.} and Nicole Washington and Sebastian Bauer and Dominik Seelow and Peter Krawitz and Christian Gilissen and Melissa Haendel and Damian Smedley",
year = "2014",
month = "2",
doi = "10.1101/gr.160325.113",
language = "English (US)",
volume = "24",
pages = "340--348",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "2"
}

TY - JOUR

T1 - Improved exome prioritization of disease genes through cross-species phenotype comparison

AU - Robinson, Peter N.

AU - Köhler, Sebastian

AU - Oellrich, Anika

AU - Sanger Mouse Genetics

AU - Wang, Kai

AU - Mungall, Christopher J.

AU - Lewis, Suzanna E.

AU - Washington, Nicole

AU - Bauer, Sebastian

AU - Seelow, Dominik

AU - Krawitz, Peter

AU - Gilissen, Christian

AU - Haendel, Melissa

AU - Smedley, Damian

PY - 2014/2

Y1 - 2014/2

AB - Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.

UR - http://www.scopus.com/inward/record.url?scp=84892959492&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892959492&partnerID=8YFLogxK

U2 - 10.1101/gr.160325.113

DO - 10.1101/gr.160325.113

M3 - Article

C2 - 24162188

AN - SCOPUS:84892959492

VL - 24

SP - 340

EP - 348

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 2

ER -