Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

Ruijie Liu; Aliaksei Z. Holik; Shian Su; Natasha Jansz; Kelan Chen; Huei San Leong; Marnie E. Blewitt; Marie Liesse Asselin-Labat; Gordon K. Smyth; Matthew E. Ritchie

doi:10.1093/nar/gkv412

Knight Cancer Institute

Research output: Contribution to journal › Article › peer-review

296 Scopus citations

Abstract

Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.
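The weighting step described in the abstract can be illustrated with a small sketch. This is not the 'limma' implementation: the function names below are hypothetical, and two details are assumptions, namely that a sample's weight is the inverse of its estimated variance factor and that sample weights are combined with observational weights by multiplication.

```python
import math


def sample_weights(log_variance_factors):
    """Turn per-sample log variance factors into weights.

    Assumption: weight = 1 / variance factor = exp(-log variance factor).
    """
    return [math.exp(-g) for g in log_variance_factors]


def combine_weights(obs_weights, samp_weights):
    """Scale each gene-by-sample observational weight by its sample's weight.

    obs_weights is a list of rows (one per gene), one entry per sample.
    Assumption: the two kinds of weight are combined by multiplication.
    """
    return [[w * s for w, s in zip(row, samp_weights)] for row in obs_weights]


def weighted_mean(values, weights):
    """Weighted average of one gene's values across samples."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Three samples; the third is four times as variable as the other two.
log_factors = [0.0, 0.0, math.log(4.0)]
samp_w = sample_weights(log_factors)

# One gene with equal observational weights in every sample.
obs_w = [[2.0, 2.0, 2.0]]
combined = combine_weights(obs_w, samp_w)

# The noisy third sample pulls the weighted mean less than the plain mean.
values = [1.0, 1.0, 5.0]
plain = sum(values) / len(values)
weighted = weighted_mean(values, combined[0])
```

In this toy case the third sample stays in the analysis but contributes a quarter of the weight of the others, which is the compromise the abstract describes between discarding a noisy sample and treating it as equally reliable.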

Original language	English (US)
Article number	e97
Journal	Nucleic Acids Research
Volume	43
Issue number	15
DOIs	https://doi.org/10.1093/nar/gkv412
State	Published - Apr 17 2015

ASJC Scopus subject areas

Genetics

Cite this

@article{fa75615bad2342fc9194d0475b660561,

title = "Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses",

abstract = "Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.",

author = "Ruijie Liu and Holik, {Aliaksei Z.} and Shian Su and Natasha Jansz and Kelan Chen and Leong, {Huei San} and Blewitt, {Marnie E.} and Asselin-Labat, {Marie Liesse} and Smyth, {Gordon K.} and Ritchie, {Matthew E.}",
note = "Publisher Copyright: {\textcopyright} 2015 The Author(s).",
year = "2015",
month = apr,
day = "17",
doi = "10.1093/nar/gkv412",
language = "English (US)",
volume = "43",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "15",

}

TY - JOUR
T1 - Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses
AU - Liu, Ruijie
AU - Holik, Aliaksei Z.
AU - Su, Shian
AU - Jansz, Natasha
AU - Chen, Kelan
AU - Leong, Huei San
AU - Blewitt, Marnie E.
AU - Asselin-Labat, Marie Liesse
AU - Smyth, Gordon K.
AU - Ritchie, Matthew E.
PY - 2015/4/17
Y1 - 2015/4/17

N2 - Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.

AB - Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package.

UR - http://www.scopus.com/inward/record.url?scp=84936076693&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84936076693&partnerID=8YFLogxK
U2 - 10.1093/nar/gkv412
DO - 10.1093/nar/gkv412
M3 - Article
C2 - 25925576
AN - SCOPUS:84936076693
SN - 0305-1048
VL - 43
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 15
M1 - e97
ER -
