Rail-dbGaP

Analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce

Abhinav Nellore, Christopher Wilks, Kasper D. Hansen, Jeffrey T. Leek, Ben Langmead

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Motivation: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data.

Results: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise.

Original language: English (US)
Pages (from-to): 2551-2553
Number of pages: 3
Journal: Bioinformatics
Volume: 32
Issue number: 16
DOI: 10.1093/bioinformatics/btw177
State: Published - Aug 15 2016
Externally published: Yes

Fingerprint

MapReduce
RNA
Rails
Network protocols
Research Personnel
Running
Administrative Personnel
Web services
Software
Pipelines
Guidelines
Reproducibility
Expertise
Software Tools
Sequencing
Resources

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Medicine(all)
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Rail-dbGaP: Analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce. / Nellore, Abhinav; Wilks, Christopher; Hansen, Kasper D.; Leek, Jeffrey T.; Langmead, Ben.

In: Bioinformatics, Vol. 32, No. 16, 15.08.2016, p. 2551-2553.

Nellore, Abhinav; Wilks, Christopher; Hansen, Kasper D.; Leek, Jeffrey T.; Langmead, Ben. / Rail-dbGaP: Analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce. In: Bioinformatics. 2016; Vol. 32, No. 16. pp. 2551-2553.
@article{c206e59e45044acd9452ca4c51214c6e,
title = "Rail-dbGaP: Analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce",
abstract = "Motivation: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40{\%} of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data. Results: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise.",
author = "Abhinav Nellore and Christopher Wilks and Hansen, {Kasper D.} and Leek, {Jeffrey T.} and Ben Langmead",
year = "2016",
month = "8",
day = "15",
doi = "10.1093/bioinformatics/btw177",
language = "English (US)",
volume = "32",
pages = "2551--2553",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "16",

}

TY - JOUR

T1 - Rail-dbGaP

T2 - Analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce

AU - Nellore, Abhinav

AU - Wilks, Christopher

AU - Hansen, Kasper D.

AU - Leek, Jeffrey T.

AU - Langmead, Ben

PY - 2016/8/15

Y1 - 2016/8/15

AB - Motivation: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data. Results: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise.

UR - http://www.scopus.com/inward/record.url?scp=84983348760&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84983348760&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btw177

DO - 10.1093/bioinformatics/btw177

M3 - Article

VL - 32

SP - 2551

EP - 2553

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 16

ER -