Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

Ruijie Liu, Aliaksei Z. Holik, Shian Su, Natasha Jansz, Kelan Chen, Huei San Leong, Marnie E. Blewitt, Marie-Liesse Labat, Gordon K. Smyth, Matthew E. Ritchie

Research output: Contribution to journal (Article)

63 Citations (Scopus)

Abstract

Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.
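The weighting step described in the abstract can be illustrated with a small sketch. This is not the limma implementation: the function names and numbers below are hypothetical, and it assumes that a sample's weight is the reciprocal of its estimated variance factor and that "combined" means an elementwise product of sample weights and observation-level weights.

```python
import math

def sample_weights(log_variance_factors):
    """Convert per-sample log variance factors into weights (1 / variance factor)."""
    return [math.exp(-g) for g in log_variance_factors]

def combine_weights(obs_weights, samp_weights):
    """Scale each gene's observation-level weights by the matching sample weight.

    obs_weights is a genes x samples list of lists; samp_weights has one entry
    per sample. Multiplication is an assumption made for this sketch.
    """
    return [[w * s for w, s in zip(row, samp_weights)] for row in obs_weights]

def weighted_mean(values, weights):
    """Weighted average, the simplest estimator in which the weights matter."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical inputs: three samples, the third four times more variable.
log_var = [0.0, 0.0, math.log(4.0)]
# Hypothetical observation-level weights for two genes (rows) in three samples.
obs = [[2.0, 2.0, 2.0],
       [1.0, 0.5, 1.0]]

sw = sample_weights(log_var)
combined = combine_weights(obs, sw)

# Log-expression values for one gene; the noisy third sample is an outlier.
values = [5.0, 5.2, 8.0]
plain = weighted_mean(values, obs[0])
down_weighted = weighted_mean(values, combined[0])
```

In this toy case the noisy sample still contributes to the estimate, but with a quarter of the influence it would otherwise have, so `down_weighted` sits closer to the two consistent samples than `plain` does.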

Original language: English (US)
Article number: e97
Journal: Nucleic acids research
Volume: 43
Issue number: 15
DOI: https://doi.org/10.1093/nar/gkv412
State: Published - Apr 17 2015
Externally published: Yes

Fingerprint

RNA Sequence Analysis
RNA
Weights and Measures
Noise
Linear Models
Costs and Cost Analysis
Genes

ASJC Scopus subject areas

  • Genetics

Cite this

Liu, R., Holik, A. Z., Su, S., Jansz, N., Chen, K., Leong, H. S., ... Ritchie, M. E. (2015). Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic acids research, 43(15), [e97]. https://doi.org/10.1093/nar/gkv412

Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. / Liu, Ruijie; Holik, Aliaksei Z.; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E.; Labat, Marie-Liesse; Smyth, Gordon K.; Ritchie, Matthew E.

In: Nucleic acids research, Vol. 43, No. 15, e97, 17.04.2015.

Liu, R, Holik, AZ, Su, S, Jansz, N, Chen, K, Leong, HS, Blewitt, ME, Labat, M-L, Smyth, GK & Ritchie, ME 2015, 'Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses', Nucleic acids research, vol. 43, no. 15, e97. https://doi.org/10.1093/nar/gkv412

Liu, Ruijie ; Holik, Aliaksei Z. ; Su, Shian ; Jansz, Natasha ; Chen, Kelan ; Leong, Huei San ; Blewitt, Marnie E. ; Labat, Marie-Liesse ; Smyth, Gordon K. ; Ritchie, Matthew E. / Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. In: Nucleic acids research. 2015 ; Vol. 43, No. 15.

@article{fa75615bad2342fc9194d0475b660561,
title = "Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses",
abstract = "Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.",
author = "Ruijie Liu and Holik, {Aliaksei Z.} and Shian Su and Natasha Jansz and Kelan Chen and Leong, {Huei San} and Blewitt, {Marnie E.} and Marie-Liesse Labat and Smyth, {Gordon K.} and Ritchie, {Matthew E.}",
year = "2015",
month = "4",
day = "17",
doi = "10.1093/nar/gkv412",
language = "English (US)",
volume = "43",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "15",
pages = "e97",
}

TY - JOUR

T1 - Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

AU - Liu, Ruijie

AU - Holik, Aliaksei Z.

AU - Su, Shian

AU - Jansz, Natasha

AU - Chen, Kelan

AU - Leong, Huei San

AU - Blewitt, Marnie E.

AU - Labat, Marie-Liesse

AU - Smyth, Gordon K.

AU - Ritchie, Matthew E.

PY - 2015/4/17

Y1 - 2015/4/17

AB - Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.

UR - http://www.scopus.com/inward/record.url?scp=84936076693&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84936076693&partnerID=8YFLogxK

U2 - 10.1093/nar/gkv412

DO - 10.1093/nar/gkv412

M3 - Article

VL - 43

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 15

M1 - e97

ER -