Systematic assessment of analytical methods for drug sensitivity prediction from cancer cell line data

In Sock Jang; Elias Chaibub Neto; Justin Guinney; Stephen H. Friend; Adam A. Margolin


Research output: Contribution to journal › Conference article › peer-review

Abstract

Large-scale pharmacogenomic screens of cancer cell lines have emerged as an attractive pre-clinical system for identifying tumor genetic subtypes with selective sensitivity to targeted therapeutic strategies. Application of modern machine learning approaches to pharmacogenomic datasets has demonstrated the ability to infer genomic predictors of compound sensitivity. Such modeling approaches entail many analytical design choices; however, a systematic study evaluating the relative performance attributable to each design choice is not yet available. In this work, we evaluated over 110,000 different models, based on a multifactorial experimental design testing systematic combinations of modeling factors within several categories of modeling choices, including: type of algorithm, type of molecular feature data, compound being predicted, method of summarizing compound sensitivity values, and whether predictions are based on discretized or continuous response values. Our results suggest that model input data (type of molecular features and choice of compound) are the primary factors explaining model performance, followed by choice of algorithm. Our results also provide a statistically principled set of recommended modeling guidelines, including: using elastic net or ridge regression with input features from all genomic profiling platforms, most importantly, gene expression features, to predict continuous-valued sensitivity scores summarized using the area under the dose-response curve, with pathway-targeted compounds most likely to yield the most accurate predictors. In addition, our study provides a publicly available resource of all modeling results, an open source code base, and experimental design for researchers throughout the community to build on our results and assess novel methodologies or applications in related predictive modeling problems.

Original language	English (US)
Pages (from-to)	63-74
Number of pages	12
Journal	Pacific Symposium on Biocomputing
State	Published - 2014
Externally published	Yes
Event	19th Pacific Symposium on Biocomputing, PSB 2014, Kohala Coast, United States (Jan 3 2014 to Jan 7 2014)

Keywords

Cancer cell lines
Machine learning
Pharmacogenomics
Predictive modeling

ASJC Scopus subject areas

Biomedical Engineering
Computational Theory and Mathematics

Cite this

@article{a9076bc53fe6456a8b338bf9a8eace02,

title = "Systematic assessment of analytical methods for drug sensitivity prediction from cancer cell line data",

abstract = "Large-scale pharmacogenomic screens of cancer cell lines have emerged as an attractive pre-clinical system for identifying tumor genetic subtypes with selective sensitivity to targeted therapeutic strategies. Application of modern machine learning approaches to pharmacogenomic datasets has demonstrated the ability to infer genomic predictors of compound sensitivity. Such modeling approaches entail many analytical design choices; however, a systematic study evaluating the relative performance attributable to each design choice is not yet available. In this work, we evaluated over 110,000 different models, based on a multifactorial experimental design testing systematic combinations of modeling factors within several categories of modeling choices, including: type of algorithm, type of molecular feature data, compound being predicted, method of summarizing compound sensitivity values, and whether predictions are based on discretized or continuous response values. Our results suggest that model input data (type of molecular features and choice of compound) are the primary factors explaining model performance, followed by choice of algorithm. Our results also provide a statistically principled set of recommended modeling guidelines, including: using elastic net or ridge regression with input features from all genomic profiling platforms, most importantly, gene expression features, to predict continuous-valued sensitivity scores summarized using the area under the dose-response curve, with pathway-targeted compounds most likely to yield the most accurate predictors. In addition, our study provides a publicly available resource of all modeling results, an open source code base, and experimental design for researchers throughout the community to build on our results and assess novel methodologies or applications in related predictive modeling problems.",

keywords = "Cancer cell lines, Machine learning, Pharmacogenomics, Predictive modeling",

author = "Jang, {In Sock} and Neto, {Elias Chaibub} and Guinney, Justin and Friend, {Stephen H.} and Margolin, {Adam A.}",

year = "2014",

language = "English (US)",

pages = "63--74",

journal = "Pacific Symposium on Biocomputing",

issn = "2335-6928",

publisher = "World Scientific Publishing Co., Inc.",

note = "19th Pacific Symposium on Biocomputing, PSB 2014 ; Conference date: 3 January 2014 through 7 January 2014",

}

TY - JOUR

T1 - Systematic assessment of analytical methods for drug sensitivity prediction from cancer cell line data

AU - Jang, In Sock

AU - Neto, Elias Chaibub

AU - Guinney, Justin

AU - Friend, Stephen H.

AU - Margolin, Adam A.

PY - 2014

Y1 - 2014

AB - Large-scale pharmacogenomic screens of cancer cell lines have emerged as an attractive pre-clinical system for identifying tumor genetic subtypes with selective sensitivity to targeted therapeutic strategies. Application of modern machine learning approaches to pharmacogenomic datasets has demonstrated the ability to infer genomic predictors of compound sensitivity. Such modeling approaches entail many analytical design choices; however, a systematic study evaluating the relative performance attributable to each design choice is not yet available. In this work, we evaluated over 110,000 different models, based on a multifactorial experimental design testing systematic combinations of modeling factors within several categories of modeling choices, including: type of algorithm, type of molecular feature data, compound being predicted, method of summarizing compound sensitivity values, and whether predictions are based on discretized or continuous response values. Our results suggest that model input data (type of molecular features and choice of compound) are the primary factors explaining model performance, followed by choice of algorithm. Our results also provide a statistically principled set of recommended modeling guidelines, including: using elastic net or ridge regression with input features from all genomic profiling platforms, most importantly, gene expression features, to predict continuous-valued sensitivity scores summarized using the area under the dose-response curve, with pathway-targeted compounds most likely to yield the most accurate predictors. In addition, our study provides a publicly available resource of all modeling results, an open source code base, and experimental design for researchers throughout the community to build on our results and assess novel methodologies or applications in related predictive modeling problems.

KW - Cancer cell lines

KW - Machine learning

KW - Pharmacogenomics

KW - Predictive modeling

UR - http://www.scopus.com/inward/record.url?scp=84905489545&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905489545&partnerID=8YFLogxK

M3 - Conference article

C2 - 24297534

AN - SCOPUS:84905489545

SN - 2335-6928

SP - 63

EP - 74

JO - Pacific Symposium on Biocomputing

JF - Pacific Symposium on Biocomputing

T2 - 19th Pacific Symposium on Biocomputing, PSB 2014

Y2 - 3 January 2014 through 7 January 2014

ER -
