The stream algorithm: Computationally efficient ridge-regression via Bayesian model averaging, and applications to pharmacogenomic prediction of cancer cell line sensitivity

Elias Chaibub Neto, In Sock Jang, Stephen H. Friend, Adam A. Margolin

Research output: Contribution to journal › Conference article

20 Scopus citations


Computational efficiency is important for learning algorithms operating in the "large p, small n" setting. In computational biology, the analysis of data sets containing tens of thousands of features ("large p"), but only a few hundred samples ("small n"), is nowadays routine, and regularized regression approaches such as ridge-regression, lasso, and elastic-net are popular choices. In this paper we propose a novel and highly efficient Bayesian inference method for fitting ridge-regression. Our method is fully analytical, and bypasses the need for expensive tuning parameter optimization via cross-validation by employing Bayesian model averaging over the grid of tuning parameters. Additional computational efficiency is achieved by adopting the singular value decomposition reparametrization of the ridge-regression model, replacing computationally expensive inversions of large p×p matrices by efficient inversions of small and diagonal n×n matrices. We show in simulation studies and in the analysis of two large cancer cell line data panels that our algorithm achieves slightly better predictive performance than cross-validated ridge-regression while requiring only a fraction of the computation time. Furthermore, in comparisons based on the cell line data sets, our algorithm systematically outperforms the lasso in both predictive performance and computation time, and matches the predictive performance of the elastic-net while requiring considerably less computation time.
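The two ideas in the abstract, the SVD reparametrization and averaging over a grid of tuning parameters instead of cross-validating, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a Gaussian model y ~ N(Xβ, σ²I) with prior β ~ N(0, (σ²/λ)I), a fixed and known noise variance σ², and a uniform prior over the λ grid; the paper's exact priors and its treatment of σ² may differ. The function name `ridge_svd_bma` and its parameters are hypothetical.

```python
import numpy as np


def ridge_svd_bma(X, y, lambdas, sigma2=1.0):
    """Ridge fits on a grid of penalties, averaged by marginal likelihood.

    Illustrative assumptions (not taken from the paper): the noise
    variance sigma2 is known, and every lambda on the grid has equal
    prior weight.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    n = X.shape[0]

    # Thin SVD, computed once: X = U diag(d) Vt, with r = min(n, p).
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    z = U.T @ y                        # y in the left singular basis
    resid = max(y @ y - z @ z, 0.0)    # part of y outside span(U)
    d2 = d ** 2

    # Ridge estimate for each lambda: V diag(d / (d^2 + lambda)) U^T y.
    # Only elementwise (diagonal) operations are needed per lambda.
    shrink = d / (d2[None, :] + lambdas[:, None])      # shape (K, r)
    betas = (shrink * z[None, :]) @ Vt                 # shape (K, p)

    # Log marginal likelihood of y given lambda, where
    # y ~ N(0, sigma2 * (I + X X^T / lambda)); diagonal in the SVD basis.
    ratio = 1.0 + d2[None, :] / lambdas[:, None]       # shape (K, r)
    quad = (z[None, :] ** 2 / ratio).sum(axis=1) + resid
    log_ml = -0.5 * (
        n * np.log(2.0 * np.pi * sigma2)
        + np.log(ratio).sum(axis=1)
        + quad / sigma2
    )

    # Posterior weights over the grid (uniform prior), computed stably.
    weights = np.exp(log_ml - log_ml.max())
    weights /= weights.sum()

    # Model-averaged coefficients: weighted mean of the ridge estimates.
    beta_bma = weights @ betas
    return beta_bma, betas, weights, log_ml
```

Calling `ridge_svd_bma(X, y, np.logspace(-2, 4, 50))` returns the averaged coefficient vector together with the per-λ estimates, the grid weights, and the log marginal likelihoods. The SVD is computed once and reused for every λ, so the cost of adding grid points is small compared with refitting the model inside a cross-validation loop.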

Original language: English (US)
Pages (from-to): 27-38
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
State: Published - Jan 1 2014
Externally published: Yes
Event: 19th Pacific Symposium on Biocomputing, PSB 2014 - Kohala Coast, United States
Duration: Jan 3 2014 → Jan 7 2014



Keywords

  • Bayesian model averaging
  • Cancer cell lines
  • Machine learning
  • Pharmacogenomic screens
  • Predictive modeling
  • Ridge-regression

ASJC Scopus subject areas

  • Biomedical Engineering
  • Computational Theory and Mathematics
