Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt

Steffen Durinck, Paul Spellman, Ewan Birney, Wolfgang Huber

Research output: Contribution to journal, Article

559 Citations (Scopus)

Abstract

Genomic experiments produce multiple views of biological systems, among them DNA sequence and copy number variation, and mRNA and protein abundance. Understanding these systems needs integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, the relationships can be biologically complex and the content of the databases is dynamic. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems that are encountered in making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration.

Original language: English (US)
Pages (from-to): 1184-1191
Number of pages: 8
Journal: Nature Protocols
Volume: 4
Issue number: 8
DOI: 10.1038/nprot.2009.97
State: Published - 2009
Externally published: Yes

Fingerprint

Computational Biology
DNA Copy Number Variations
Databases
Bioinformatics
Proteins
Molecules
Data integration
DNA sequences
Biological systems
Web services
Messenger RNA
Genes
Datasets
Experiments

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)

Cite this

Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. / Durinck, Steffen; Spellman, Paul; Birney, Ewan; Huber, Wolfgang.

In: Nature Protocols, Vol. 4, No. 8, 2009, p. 1184-1191.

@article{b1ce7da8cda04041abdb0f1a2a0e7a37,
title = "Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt",
abstract = "Genomic experiments produce multiple views of biological systems, among them are DNA sequence and copy number variation, and mRNA and protein abundance. Understanding these systems needs integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, the relationships can be biologically complex and the content of the databases is dynamic. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems that are encountered in making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration.",
author = "Steffen Durinck and Paul Spellman and Ewan Birney and Wolfgang Huber",
year = "2009",
doi = "10.1038/nprot.2009.97",
language = "English (US)",
volume = "4",
pages = "1184--1191",
journal = "Nature Protocols",
issn = "1754-2189",
publisher = "Nature Publishing Group",
number = "8",

}

TY - JOUR

T1 - Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt

AU - Durinck, Steffen

AU - Spellman, Paul

AU - Birney, Ewan

AU - Huber, Wolfgang

PY - 2009

Y1 - 2009

AB - Genomic experiments produce multiple views of biological systems, among them are DNA sequence and copy number variation, and mRNA and protein abundance. Understanding these systems needs integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, the relationships can be biologically complex and the content of the databases is dynamic. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems that are encountered in making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration.

UR - http://www.scopus.com/inward/record.url?scp=68449101067&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=68449101067&partnerID=8YFLogxK

U2 - 10.1038/nprot.2009.97

DO - 10.1038/nprot.2009.97

M3 - Article

C2 - 19617889

AN - SCOPUS:68449101067

VL - 4

SP - 1184

EP - 1191

JO - Nature Protocols

JF - Nature Protocols

SN - 1754-2189

IS - 8

ER -