Rail-RNA: scalable analysis of RNA-seq splicing and coverage

Abhinav Nellore, Leonardo Collado-Torres, Andrew E. Jaffe, José Alquicira-Hernández, Christopher Wilks, Jacob Pritt, James Morton, Jeffrey T. Leek, Ben Langmead

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

Motivation: RNA sequencing (RNA-seq) experiments now span hundreds to thousands of samples. Current spliced alignment software is designed to analyze each sample separately. Consequently, no information is gained from analyzing multiple samples together, and it requires extra work to obtain analysis products that incorporate data from across samples.

Results: We describe Rail-RNA, a cloud-enabled spliced aligner that analyzes many samples at once. Rail-RNA eliminates redundant work across samples, making it more efficient as samples are added. For many samples, Rail-RNA is more accurate than annotation-assisted aligners. We use Rail-RNA to align 667 RNA-seq samples from the GEUVADIS project on Amazon Web Services in under 16 h for US$0.91 per sample. Rail-RNA outputs alignments in SAM/BAM format, and it also outputs (i) base-level coverage bigWigs for each sample; (ii) coverage bigWigs encoding normalized mean and median coverages at each base across samples analyzed; and (iii) exon-exon splice junctions and indels (features) in columnar formats that juxtapose coverages in samples in which a given feature is found. These supplementary outputs are ready for use with downstream packages for reproducible statistical analysis. We use Rail-RNA to identify expressed regions in the GEUVADIS samples and show that both annotated and unannotated (novel) expressed regions exhibit consistent patterns of variation across populations and with respect to known confounding variables.

Availability and Implementation: Rail-RNA is open-source software available at http://rail.bio.

Contacts: anellore@gmail.com or langmea@cs.jhu.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
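
Illustrative sketch: the Results paragraph mentions coverage tracks that encode normalized mean and median coverage at each base across samples. The Python snippet below shows one way such a summary could be computed for toy data. It is not Rail-RNA's implementation: the abstract does not say how coverage is normalized, so scaling each sample to a common total (`target_total`) is an assumption, the function names `normalize` and `summarize` are hypothetical, and bigWig reading and writing is omitted entirely.

```python
from statistics import mean, median


def normalize(coverage, target_total):
    """Scale one sample's per-base coverage so that it sums to target_total.

    Assumption: library-size scaling. The abstract does not specify the
    normalization used by Rail-RNA.
    """
    total = sum(coverage)
    if total == 0:
        return [0.0] * len(coverage)
    factor = target_total / total
    return [depth * factor for depth in coverage]


def summarize(samples, target_total):
    """Return (means, medians) of normalized coverage at each base."""
    normalized = [normalize(cov, target_total) for cov in samples.values()]
    # zip(*rows) turns per-sample rows into per-base columns.
    per_base = list(zip(*normalized))
    means = [mean(column) for column in per_base]
    medians = [median(column) for column in per_base]
    return means, medians


# Toy per-base coverage for three hypothetical samples over four bases.
samples = {
    "sample_a": [0, 10, 20, 10],
    "sample_b": [0, 5, 10, 5],
    "sample_c": [4, 4, 8, 4],
}
means, medians = summarize(samples, target_total=40.0)
print(means)
print(medians)
```

In this toy case the median track ignores the outlying signal that only sample_c shows at the first base, while the mean track does not, which is one reason to report both summaries.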

Original language: English (US)
Pages (from-to): 4033-4040
Number of pages: 8
Journal: Bioinformatics (Oxford, England)
Volume: 33
Issue number: 24
DOI: https://doi.org/10.1093/bioinformatics/btw575
State: Published - Dec 15 2017
Externally published: Yes

Fingerprint

RNA Splicing
RNA Sequence Analysis
RNA
Sequencing
Rails
Coverage
Exons
Software
Confounding Factors (Epidemiology)
Computational Biology
Output
Bioinformatics
Alignment
Web services
Statistical methods
Open Source Software
Confounding
Availability

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Nellore, A., Collado-Torres, L., Jaffe, A. E., Alquicira-Hernández, J., Wilks, C., Pritt, J., ... Langmead, B. (2017). Rail-RNA: scalable analysis of RNA-seq splicing and coverage. Bioinformatics (Oxford, England), 33(24), 4033-4040. https://doi.org/10.1093/bioinformatics/btw575

@article{ede4ae5c378644b997c9d45387105583,
title = "Rail-RNA: scalable analysis of RNA-seq splicing and coverage",
author = "Abhinav Nellore and Leonardo Collado-Torres and Jaffe, {Andrew E.} and Jos{\'e} Alquicira-Hern{\'a}ndez and Christopher Wilks and Jacob Pritt and James Morton and Leek, {Jeffrey T.} and Ben Langmead",
year = "2017",
month = "12",
day = "15",
doi = "10.1093/bioinformatics/btw575",
language = "English (US)",
volume = "33",
pages = "4033--4040",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "24",

}
