The stream algorithm: Computationally efficient ridge-regression via Bayesian model averaging, and applications to pharmacogenomic prediction of cancer cell line sensitivity

Elias Chaibub Neto; In Sock Jang; Stephen H. Friend; Adam A. Margolin

Research output: Contribution to journal › Conference article › peer-review

Abstract

Computational efficiency is important for learning algorithms operating in the "large p, small n" setting. In computational biology, the analysis of data sets containing tens of thousands of features ("large p"), but only a few hundred samples ("small n"), is nowadays routine, and regularized regression approaches such as ridge-regression, lasso, and elastic-net are popular choices. In this paper we propose a novel and highly efficient Bayesian inference method for fitting ridge-regression. Our method is fully analytical, and bypasses the need for expensive tuning parameter optimization, via cross-validation, by employing Bayesian model averaging over the grid of tuning parameters. Additional computational efficiency is achieved by adopting the singular value decomposition reparametrization of the ridge-regression model, replacing computationally expensive inversions of large p×p matrices by efficient inversions of small and diagonal n×n matrices. We show in simulation studies and in the analysis of two large cancer cell line data panels that our algorithm achieves slightly better predictive performance than cross-validated ridge-regression while requiring only a fraction of the computation time. Furthermore, in comparisons based on the cell line data sets, our algorithm systematically out-performs the lasso in both predictive performance and computation time, and shows equivalent predictive performance, but considerably smaller computation time, than the elastic-net.
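The two ideas in the abstract, the singular value decomposition reparametrization and the averaging over a grid of tuning parameters, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes n ≤ p, a Gaussian prior on the coefficients with variance sigma2/lambda, a uniform prior over the grid, and a known noise variance `sigma2` (a simplification made here for brevity). The function names `ridge_svd_path` and `bma_weights` and the simulated data are hypothetical.

```python
import numpy as np


def ridge_svd_path(X, y, lambdas):
    """Ridge coefficients for every lambda in the grid from a single thin SVD.

    With X = U diag(d) V^T, the ridge solution is
    beta(lambda) = V diag(d / (d^2 + lambda)) U^T y,
    so only element-wise operations on n singular values are needed per lambda.
    """
    n, p = X.shape
    if n > p:
        raise ValueError("this sketch assumes the 'large p, small n' case (n <= p)")
    U, d, Vt = np.linalg.svd(X, full_matrices=False)  # U: n x n, Vt: n x p
    z = U.T @ y                                       # rotated response, length n
    shrink = d / (d**2 + lambdas[:, None])            # shape (k, n)
    betas = (shrink * z) @ Vt                         # shape (k, p)
    return betas, d, z


def bma_weights(d, z, lambdas, sigma2=1.0):
    """Posterior weights over the lambda grid (uniform prior, sigma2 assumed known).

    Marginally y ~ N(0, sigma2 * (I + X X^T / lambda)); in the rotated basis
    the covariance is diagonal with entries sigma2 * (1 + d_i^2 / lambda).
    """
    n = z.shape[0]
    s = 1.0 + d**2 / lambdas[:, None]                 # shape (k, n)
    log_ml = -0.5 * (
        n * np.log(2.0 * np.pi * sigma2)
        + np.log(s).sum(axis=1)
        + (z**2 / s).sum(axis=1) / sigma2
    )
    w = np.exp(log_ml - log_ml.max())                 # subtract max for stability
    return w / w.sum()


# Simulated "large p, small n" data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, p = 20, 200
X = rng.standard_normal((n, p))
y = X[:, :5].sum(axis=1) + rng.standard_normal(n)

lambdas = np.logspace(-2, 3, 30)
betas, d, z = ridge_svd_path(X, y, lambdas)
weights = bma_weights(d, z, lambdas)
beta_bma = weights @ betas                            # model-averaged coefficients
```

No cross-validation loop appears in the sketch: the grid is handled by weighting each candidate fit with its marginal likelihood, and the only matrix factorization is the one SVD of the n×p design matrix.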

Original language	English (US)
Pages (from-to)	27-38
Number of pages	12
Journal	Pacific Symposium on Biocomputing
State	Published - 2014
Externally published	Yes
Event	19th Pacific Symposium on Biocomputing, PSB 2014 - Kohala Coast, United States
Duration	Jan 3 2014 → Jan 7 2014

Keywords

Bayesian model averaging
Cancer cell lines
Machine learning
Pharmacogenomic screens
Predictive modeling
Ridge-regression

ASJC Scopus subject areas

Biomedical Engineering
Computational Theory and Mathematics

Cite this

The stream algorithm: Computationally efficient ridge-regression via Bayesian model averaging, and applications to pharmacogenomic prediction of cancer cell line sensitivity. / Neto, Elias Chaibub; Jang, In Sock; Friend, Stephen H. et al.
In: Pacific Symposium on Biocomputing, 2014, p. 27-38.

@article{032dce0db7424b30938f5317f910d86c,

title = "The stream algorithm: Computationally efficient ridge-regression via Bayesian model averaging, and applications to pharmacogenomic prediction of cancer cell line sensitivity",

abstract = "Computational efficiency is important for learning algorithms operating in the {"}large p, small n{"} setting. In computational biology, the analysis of data sets containing tens of thousands of features ({"}large p{"}), but only a few hundred samples ({"}small n{"}), is nowadays routine, and regularized regression approaches such as ridge-regression, lasso, and elastic-net are popular choices. In this paper we propose a novel and highly efficient Bayesian inference method for fitting ridge-regression. Our method is fully analytical, and bypasses the need for expensive tuning parameter optimization, via cross-validation, by employing Bayesian model averaging over the grid of tuning parameters. Additional computational efficiency is achieved by adopting the singular value decomposition reparametrization of the ridge-regression model, replacing computationally expensive inversions of large p×p matrices by efficient inversions of small and diagonal n×n matrices. We show in simulation studies and in the analysis of two large cancer cell line data panels that our algorithm achieves slightly better predictive performance than cross-validated ridge-regression while requiring only a fraction of the computation time. Furthermore, in comparisons based on the cell line data sets, our algorithm systematically out-performs the lasso in both predictive performance and computation time, and shows equivalent predictive performance, but considerably smaller computation time, than the elastic-net.",

keywords = "Bayesian model averaging, Cancer cell lines, Machine learning, Pharmacogenomic screens, Predictive modeling, Ridge-regression",

author = "Neto, {Elias Chaibub} and Jang, {In Sock} and Friend, {Stephen H.} and Margolin, {Adam A.}",

year = "2014",

language = "English (US)",

pages = "27--38",

journal = "Pacific Symposium on Biocomputing",

issn = "2335-6928",

publisher = "World Scientific Publishing Co., Inc.",

note = "19th Pacific Symposium on Biocomputing, PSB 2014 ; Conference date: 03-01-2014 Through 07-01-2014",

}

TY - JOUR

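T1 - The stream algorithm: Computationally efficient ridge-regression via Bayesian model averaging, and applications to pharmacogenomic prediction of cancer cell line sensitivity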

T2 - 19th Pacific Symposium on Biocomputing, PSB 2014

AU - Neto, Elias Chaibub

AU - Jang, In Sock

AU - Friend, Stephen H.

AU - Margolin, Adam A.

PY - 2014

Y1 - 2014

AB - Computational efficiency is important for learning algorithms operating in the "large p, small n" setting. In computational biology, the analysis of data sets containing tens of thousands of features ("large p"), but only a few hundred samples ("small n"), is nowadays routine, and regularized regression approaches such as ridge-regression, lasso, and elastic-net are popular choices. In this paper we propose a novel and highly efficient Bayesian inference method for fitting ridge-regression. Our method is fully analytical, and bypasses the need for expensive tuning parameter optimization, via cross-validation, by employing Bayesian model averaging over the grid of tuning parameters. Additional computational efficiency is achieved by adopting the singular value decomposition reparametrization of the ridge-regression model, replacing computationally expensive inversions of large p×p matrices by efficient inversions of small and diagonal n×n matrices. We show in simulation studies and in the analysis of two large cancer cell line data panels that our algorithm achieves slightly better predictive performance than cross-validated ridge-regression while requiring only a fraction of the computation time. Furthermore, in comparisons based on the cell line data sets, our algorithm systematically out-performs the lasso in both predictive performance and computation time, and shows equivalent predictive performance, but considerably smaller computation time, than the elastic-net.

KW - Bayesian model averaging

KW - Cancer cell lines

KW - Machine learning

KW - Pharmacogenomic screens

KW - Predictive modeling

KW - Ridge-regression

UR - http://www.scopus.com/inward/record.url?scp=84905872972&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905872972&partnerID=8YFLogxK

M3 - Conference article

C2 - 24297531

AN - SCOPUS:84905872972

SN - 2335-6928

SP - 27

EP - 38

JO - Pacific Symposium on Biocomputing

JF - Pacific Symposium on Biocomputing

Y2 - 3 January 2014 through 7 January 2014

ER -
