RNA-seq mixology: Designing realistic control experiments to compare protocols and analysis methods

Aliaksei Z. Holik, Charity W. Law, Ruijie Liu, Zeya Wang, Wenyi Wang, Jaeil Ahn, Marie-Liesse Labat, Gordon K. Smyth, Matthew E. Ritchie

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.

Original language: English (US)
Article number: 1063
Journal: Nucleic acids research
Volume: 45
Issue number: 5
DOIs: https://doi.org/10.1093/nar/gkw1063
State: Published - Mar 17 2017
Externally published: Yes

Fingerprint

Benchmarking
RNA
RNA Sequence Analysis
Noise
Small Nucleolar RNA
Small Nuclear RNA
Gene Expression
Poly A
Workflow
Genomics
Introns
Libraries
Cell Culture Techniques
Weights and Measures
Cell Line
Research
Neoplasms

ASJC Scopus subject areas

  • Genetics

Cite this

Holik, A. Z., Law, C. W., Liu, R., Wang, Z., Wang, W., Ahn, J., ... Ritchie, M. E. (2017). RNA-seq mixology: Designing realistic control experiments to compare protocols and analysis methods. Nucleic acids research, 45(5), [1063]. https://doi.org/10.1093/nar/gkw1063

RNA-seq mixology: Designing realistic control experiments to compare protocols and analysis methods. / Holik, Aliaksei Z.; Law, Charity W.; Liu, Ruijie; Wang, Zeya; Wang, Wenyi; Ahn, Jaeil; Labat, Marie-Liesse; Smyth, Gordon K.; Ritchie, Matthew E.

In: Nucleic acids research, Vol. 45, No. 5, 1063, 17.03.2017.


Holik, AZ, Law, CW, Liu, R, Wang, Z, Wang, W, Ahn, J, Labat, M-L, Smyth, GK & Ritchie, ME 2017, 'RNA-seq mixology: Designing realistic control experiments to compare protocols and analysis methods', Nucleic acids research, vol. 45, no. 5, 1063. https://doi.org/10.1093/nar/gkw1063
Holik, Aliaksei Z.; Law, Charity W.; Liu, Ruijie; Wang, Zeya; Wang, Wenyi; Ahn, Jaeil; Labat, Marie-Liesse; Smyth, Gordon K.; Ritchie, Matthew E. / RNA-seq mixology: Designing realistic control experiments to compare protocols and analysis methods. In: Nucleic acids research. 2017; Vol. 45, No. 5.
@article{ccd6b67e70c24711a604a93a59f79a59,
title = "RNA-seq mixology: Designing realistic control experiments to compare protocols and analysis methods",
abstract = "Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.",
author = "Holik, {Aliaksei Z.} and Law, {Charity W.} and Ruijie Liu and Zeya Wang and Wenyi Wang and Jaeil Ahn and Marie-Liesse Labat and Smyth, {Gordon K.} and Ritchie, {Matthew E.}",
year = "2017",
month = "3",
day = "17",
doi = "10.1093/nar/gkw1063",
language = "English (US)",
volume = "45",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - RNA-seq mixology

T2 - Designing realistic control experiments to compare protocols and analysis methods

AU - Holik, Aliaksei Z.

AU - Law, Charity W.

AU - Liu, Ruijie

AU - Wang, Zeya

AU - Wang, Wenyi

AU - Ahn, Jaeil

AU - Labat, Marie-Liesse

AU - Smyth, Gordon K.

AU - Ritchie, Matthew E.

PY - 2017/3/17

Y1 - 2017/3/17

N2 - Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.

AB - Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.

UR - http://www.scopus.com/inward/record.url?scp=85018289156&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018289156&partnerID=8YFLogxK

U2 - 10.1093/nar/gkw1063

DO - 10.1093/nar/gkw1063

M3 - Article

C2 - 27899618

AN - SCOPUS:85018289156

VL - 45

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 5

M1 - 1063

ER -