Stepwise group sparse regression (SGSR): Gene-set-based pharmacogenomic predictive models with stepwise selection of functional priors

In Sock Jang, Rodrigo Dienstmann, Adam Margolin, Justin Guinney

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Complex mechanisms involving genomic aberrations in numerous proteins and pathways are believed to be a key cause of many diseases, such as cancer. With recent advances in genomics, elucidating the molecular basis of cancer at the patient level is now feasible, and this has led to personalized treatment strategies in which a patient is treated according to his or her genomic profile. However, there is growing recognition that existing treatment modalities are overly simplistic and do not fully account for the deep genomic complexity associated with sensitivity or resistance to cancer therapies. To overcome these limitations, large-scale pharmacogenomic screens of cancer cell lines, in conjunction with modern statistical learning approaches, have been used to explore the genetic underpinnings of drug response. While these analyses have demonstrated the ability to infer genetic predictors of compound sensitivity, most modeling approaches to date have been data-driven, i.e., they do not explicitly incorporate domain-specific knowledge (priors) in the process of learning a model. A purely data-driven approach offers an unbiased perspective on the data and may yield unexpected or novel insights, but it introduces challenges for both model interpretability and accuracy. In this study, we propose a novel prior-incorporated sparse regression model in which informative predictor sets are chosen from knowledge-driven priors (gene sets) in a stepwise fashion. Under regularization in a linear regression model, our algorithm incorporates prior biological knowledge across the predictive variables, thereby improving the interpretability of the final model with no loss, and often an improvement, in predictive performance. We evaluate our algorithm against well-known regularization methods such as LASSO, ridge, and elastic net regression on the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (Sanger) pharmacogenomics datasets, demonstrating that incorporating the biological priors selected by our model confers improved predictability and interpretability over existing state-of-the-art methods, despite using far fewer predictors.
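
The idea of choosing gene sets stepwise under a regularized linear model can be illustrated with a minimal sketch. This is not the authors' SGSR implementation: it assumes a plain forward-selection loop, a ridge penalty with a fixed strength `lam`, k-fold cross-validated mean squared error as the selection criterion, and a centered response. The function names, the gene-set labels (`set_A`, `set_B`, `set_C`), and the synthetic data are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept; assumes centered y)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, n_folds=5):
    """Mean squared prediction error estimated by k-fold cross-validation."""
    n = X.shape[0]
    errors = []
    for test_idx in np.array_split(np.arange(n), n_folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[test_idx] - X[test_idx] @ beta) ** 2))
    return float(np.mean(errors))

def stepwise_gene_set_selection(X, y, gene_sets, lam=1.0, max_steps=5):
    """Greedy forward selection of gene sets (an illustrative assumption).

    gene_sets maps a set name to the column indices of its genes in X.
    At each step, the set whose genes most reduce the cross-validated
    error is added; the loop stops when no candidate improves the error.
    """
    selected, active = [], []
    best_err = float(np.mean(y ** 2))  # null model: always predict zero
    for _ in range(max_steps):
        candidates = {}
        for name, cols in gene_sets.items():
            if name in selected:
                continue
            trial = sorted(set(active) | set(cols))
            candidates[name] = cv_error(X[:, trial], y, lam)
        if not candidates:
            break
        best_name = min(candidates, key=candidates.get)
        if candidates[best_name] >= best_err:
            break
        best_err = candidates[best_name]
        selected.append(best_name)
        active = sorted(set(active) | set(gene_sets[best_name]))
    return selected, active, best_err

# Synthetic example: only the genes in the hypothetical "set_A" drive response.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 9))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.standard_normal(100)
y = y - y.mean()
gene_sets = {"set_A": [0, 1, 2], "set_B": [3, 4, 5], "set_C": [6, 7, 8]}

selected, active, err = stepwise_gene_set_selection(X, y, gene_sets)
print(selected, active, err)
```

In this sketch the final model contains only genes that belong to a selected gene set, which is the sense in which prior knowledge can make a sparse model easier to interpret. A fuller treatment would tune `lam` and compare against LASSO and elastic net baselines.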

Original language: English (US)
Title of host publication: 20th Pacific Symposium on Biocomputing, PSB 2015
Publisher: Stanford University
Pages: 32-43
Number of pages: 12
State: Published - 2015
Event: 20th Pacific Symposium on Biocomputing, PSB 2015 - Big Island, United States
Duration: Jan 4 2015 to Jan 8 2015

Fingerprint

Genes
Cells
Aberrations
Linear regression
Pharmacogenetics
Proteins
Genomics

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering

Cite this

Jang, I. S., Dienstmann, R., Margolin, A., & Guinney, J. (2015). Stepwise group sparse regression (SGSR): Gene-set-based pharmacogenomic predictive models with stepwise selection of functional priors. In 20th Pacific Symposium on Biocomputing, PSB 2015 (pp. 32-43). Stanford University.

@inproceedings{4e2372cb293f49889913bcf5b9574686,
title = "Stepwise group sparse regression (SGSR): Gene-set-based pharmacogenomic predictive models with stepwise selection of functional priors",
author = "Jang, {In Sock} and Rodrigo Dienstmann and Adam Margolin and Justin Guinney",
year = "2015",
language = "English (US)",
pages = "32--43",
booktitle = "20th Pacific Symposium on Biocomputing, PSB 2015",
publisher = "Stanford University",

}

TY - GEN

T1 - Stepwise group sparse regression (SGSR)

T2 - Gene-set-based pharmacogenomic predictive models with stepwise selection of functional priors

AU - Jang, In Sock

AU - Dienstmann, Rodrigo

AU - Margolin, Adam

AU - Guinney, Justin

PY - 2015

Y1 - 2015

UR - http://www.scopus.com/inward/record.url?scp=84971350079&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84971350079&partnerID=8YFLogxK

M3 - Conference contribution

C2 - 25592566

AN - SCOPUS:84971350079

SP - 32

EP - 43

BT - 20th Pacific Symposium on Biocomputing, PSB 2015

PB - Stanford University

ER -