Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt

Steffen Durinck, Paul T. Spellman, Ewan Birney, Wolfgang Huber

Research output: Contribution to journal › Article › peer-review

2129 Scopus citations

Abstract

Genomic experiments produce multiple views of biological systems, among them DNA sequence and copy number variation, and mRNA and protein abundance. Understanding these systems requires integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, these relationships can be biologically complex, and the content of the databases is dynamic. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems encountered in making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration.
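The gene-to-transcript-to-protein mapping problems mentioned in the abstract can be illustrated with a small sketch. It is written in Python rather than R and does not call biomaRt or any BioMart service: the mapping table, the identifiers (GENE_A, TX_A1, PROT_A1 and so on), the measurement values and the helper functions are all hypothetical, standing in for a gene/transcript/protein attribute table such as one might retrieve from a BioMart query. It shows two typical complications when joining gene-level mRNA data with protein-level data: a gene with several protein products multiplies rows, and a transcript without a protein product drops out.

```python
# Hypothetical mapping rows shaped like a gene/transcript/protein attribute
# table; identifiers are invented for illustration, not real Ensembl IDs.
MAPPING = [
    ("GENE_A", "TX_A1", "PROT_A1"),
    ("GENE_A", "TX_A2", "PROT_A2"),  # one gene, several transcripts and proteins
    ("GENE_B", "TX_B1", "PROT_B1"),
    ("GENE_C", "TX_C1", ""),         # non-coding transcript: no protein product
]

# Hypothetical measurements keyed by gene (mRNA) and by protein.
mrna_by_gene = {"GENE_A": 8.2, "GENE_B": 5.1, "GENE_C": 3.3}
protein_abundance = {"PROT_A1": 2.0, "PROT_A2": 0.7, "PROT_B1": 1.4}


def gene_to_proteins(mapping):
    """Collect the set of protein IDs reachable from each gene ID."""
    result = {}
    for gene, _transcript, protein in mapping:
        proteins = result.setdefault(gene, set())
        if protein:  # skip transcripts that have no protein product
            proteins.add(protein)
    return result


def join_mrna_protein(mrna, protein_values, mapping):
    """Pair each gene's mRNA value with every mapped protein's abundance.

    A gene with several proteins yields several rows (one-to-many), and a
    gene with no mapped protein yields none, so the number of output rows
    need not equal the number of genes.
    """
    rows = []
    for gene, proteins_for_gene in sorted(gene_to_proteins(mapping).items()):
        for protein in sorted(proteins_for_gene):
            if gene in mrna and protein in protein_values:
                rows.append((gene, protein, mrna[gene], protein_values[protein]))
    return rows


if __name__ == "__main__":
    for row in join_mrna_protein(mrna_by_gene, protein_abundance, MAPPING):
        print(row)
    # GENE_A appears twice; GENE_C is absent because it maps to no protein.
```

Because the real mapping comes from a database whose content changes between releases, recording which release a mapping table was taken from is one way to keep such a join reproducible.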

Original language: English (US)
Pages (from-to): 1184-1191
Number of pages: 8
Journal: Nature Protocols
Volume: 4
Issue number: 8
State: Published - 2009
Externally published: Yes

ASJC Scopus subject areas

  • General Biochemistry, Genetics and Molecular Biology
