The influence of disease categories on gene candidate predictions from model organism phenotypes

Anika Oellrich, Sebastian Koehler, Nicole Washington, Chris Mungall, Suzanna Lewis, Melissa Haendel, Peter N. Robinson, Damian Smedley, Sanger Mouse Genetic Project

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

Background: The molecular etiology of about half of the currently described Mendelian diseases in humans remains unidentified, which hinders efforts to find treatments or preventive measures. Advances such as new sequencing technologies have made increasing amounts of data available for identifying disease genes. Automated methods are therefore needed that reliably predict disease gene candidates from the available data. We recently developed Exomiser, a tool that identifies causative variants in exome analysis results by filtering and prioritising them using several criteria, including the phenotype similarity between the disease and mouse mutants involving the gene candidates. Initial investigations revealed that performance varies across medical categories of disease, due in part to a varying contribution of the phenotype scoring component.

Results: In this study, we further analyse the performance of our cross-species phenotype matching algorithm and examine in more detail why disease gene filtering based on phenotype data works better for some disease categories than for others. We found that, in addition to misleading phenotype alignments between species, some disease categories are still more amenable to automated predictions than others, and that this often ties in with community perceptions of how well the organism works as a model.

Conclusions: Our automated disease gene candidate predictions depend strongly on the organism used for the predictions and on the disease category being studied. Future work on computational disease gene prediction using phenotype data would benefit from methods that take into account the disease category and the source of the model organism data.

Original language: English (US)
Article number: S4
Journal: Journal of Biomedical Semantics
Volume: 5
DOI: https://doi.org/10.1186/2041-1480-5-S1-S4
State: Published - 2014

Fingerprint

Genes
Phenotype
Exome
Technology

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
  • Health Informatics

Cite this

Oellrich, A., Koehler, S., Washington, N., Mungall, C., Lewis, S., Haendel, M., ... Sanger Mouse Genetic Project. (2014). The influence of disease categories on gene candidate predictions from model organism phenotypes. Journal of Biomedical Semantics, 5, S4. https://doi.org/10.1186/2041-1480-5-S1-S4

@article{6d38aff5252a4fc2aff959a5322c4ac0,
title = "The influence of disease categories on gene candidate predictions from model organism phenotypes",
author = "Anika Oellrich and Sebastian Koehler and Nicole Washington and Chris Mungall and Suzanna Lewis and Melissa Haendel and Robinson, {Peter N.} and Damian Smedley and {Sanger Mouse Genetic Project}",
year = "2014",
doi = "10.1186/2041-1480-5-S1-S4",
language = "English (US)",
volume = "5",
journal = "Journal of Biomedical Semantics",
issn = "2041-1480",
publisher = "BioMed Central",

}

TY - JOUR

T1 - The influence of disease categories on gene candidate predictions from model organism phenotypes

AU - Oellrich, Anika

AU - Koehler, Sebastian

AU - Washington, Nicole

AU - Mungall, Chris

AU - Lewis, Suzanna

AU - Haendel, Melissa

AU - Robinson, Peter N.

AU - Smedley, Damian

AU - Sanger Mouse Genetic Project

PY - 2014

Y1 - 2014

UR - http://www.scopus.com/inward/record.url?scp=84938329175&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938329175&partnerID=8YFLogxK

U2 - 10.1186/2041-1480-5-S1-S4

DO - 10.1186/2041-1480-5-S1-S4

M3 - Article

AN - SCOPUS:84938329175

VL - 5

JO - Journal of Biomedical Semantics

JF - Journal of Biomedical Semantics

SN - 2041-1480

M1 - S4

ER -