Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

Erhan Bilal, Janusz Dutkowski, Justin Guinney, In Sock Jang, Benjamin A. Logsdon, Gaurav Pandey, Benjamin A. Sauerwine, Yishai Shimoni, Hans Kristian Moen Vollan, Brigham H. Mecham, Oscar M. Rueda, Jorg Tost, Christina Curtis, Mariano J. Alvarez, Vessela N. Kristensen, Samuel Aparicio, Anne Lise Børresen-Dale, Carlos Caldas, Andrea Califano, Stephen H. Friend, Trey Ideker, Eric E. Schadt, Gustavo A. Stolovitzky, Adam Margolin

Research output: Contribution to journal › Article

47 Citations (Scopus)

Abstract

Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.
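The abstract reports that ensembles built across multiple user submissions outperformed the individual models in the ensemble. The sketch below illustrates that idea on made-up toy data. It rests on several assumptions:

* Each submission is a vector of per-patient risk scores, where higher means shorter expected survival.
* Submissions are scored with a simplified Harrell's concordance index. The abstract does not name the evaluation metric.
* The ensemble is a simple rank average. This is a swapped-in technique, not necessarily what the authors used.

The function names (`concordance_index`, `rank_normalize`, `rank_average_ensemble`) and the data are hypothetical.

```python
from typing import List, Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data (simplified).

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's recorded time. The pair is
    concordant when the model assigns i the higher risk; tied risks count 0.5.
    Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Map scores to average ranks scaled to [0, 1], so that submissions on
    different scales (hazard ratios, linear predictors, ...) can be averaged."""
    n = len(scores)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda k: scores[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied scores
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0  # tied scores share the mean rank
        i = j + 1
    return [r / (n - 1) for r in ranks]


def rank_average_ensemble(submissions: Sequence[Sequence[float]]) -> List[float]:
    """Average the rank-normalized risk scores of several submissions."""
    normalized = [rank_normalize(s) for s in submissions]
    n = len(normalized[0])
    return [sum(sub[k] for sub in normalized) / len(normalized) for k in range(n)]


# Made-up toy data for five patients (not from the study).
times = [2.0, 5.0, 3.0, 8.0, 6.0]    # follow-up time
events = [1, 1, 0, 1, 0]             # 1 = death observed, 0 = censored
model_a = [0.9, 0.2, 0.5, 0.1, 0.3]  # hypothetical submission A
model_b = [2.0, 2.5, 2.2, 0.5, 1.0]  # hypothetical submission B, other scale

ensemble = rank_average_ensemble([model_a, model_b])
for name, risks in [("model_a", model_a), ("model_b", model_b),
                    ("ensemble", ensemble)]:
    print(f"{name}: {concordance_index(times, events, risks):.3f}")
# prints model_a: 0.833, model_b: 0.667, ensemble: 0.917
```

Rank averaging is used here because independently built submissions may report risk on incompatible scales. Converting to ranks keeps only the patient ordering, which is all a concordance-type score evaluates. In this toy case the two models misorder different patient pairs, so their average corrects part of each one's errors. On real data, an ensemble is not guaranteed to beat every member.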

Original language: English (US)
Article number: e1003047
Journal: PLoS Computational Biology
Volume: 9
Issue number: 5
DOIs: https://doi.org/10.1371/journal.pcbi.1003047
State: Published - May 2013
Externally published: Yes

Fingerprint

Survival Analysis
Breast Cancer
breast neoplasms
cancer
Breast Neoplasms
Prognosis
Modeling
modeling
Ensemble
Combined Method
prognosis
Profiling
Prior Knowledge
Optimization Model
Large Data Sets
Model
Genomics
Neoplasms
Cancer
Machine Learning

ASJC Scopus subject areas

  • Cellular and Molecular Neuroscience
  • Ecology
  • Molecular Biology
  • Genetics
  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Computational Theory and Mathematics

Cite this

Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling. / Bilal, Erhan; Dutkowski, Janusz; Guinney, Justin; Jang, In Sock; Logsdon, Benjamin A.; Pandey, Gaurav; Sauerwine, Benjamin A.; Shimoni, Yishai; Moen Vollan, Hans Kristian; Mecham, Brigham H.; Rueda, Oscar M.; Tost, Jorg; Curtis, Christina; Alvarez, Mariano J.; Kristensen, Vessela N.; Aparicio, Samuel; Børresen-Dale, Anne Lise; Caldas, Carlos; Califano, Andrea; Friend, Stephen H.; Ideker, Trey; Schadt, Eric E.; Stolovitzky, Gustavo A.; Margolin, Adam.

In: PLoS Computational Biology, Vol. 9, No. 5, e1003047, 05.2013.

Bilal, E, Dutkowski, J, Guinney, J, Jang, IS, Logsdon, BA, Pandey, G, Sauerwine, BA, Shimoni, Y, Moen Vollan, HK, Mecham, BH, Rueda, OM, Tost, J, Curtis, C, Alvarez, MJ, Kristensen, VN, Aparicio, S, Børresen-Dale, AL, Caldas, C, Califano, A, Friend, SH, Ideker, T, Schadt, EE, Stolovitzky, GA & Margolin, A 2013, 'Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling', PLoS Computational Biology, vol. 9, no. 5, e1003047. https://doi.org/10.1371/journal.pcbi.1003047
Bilal, Erhan ; Dutkowski, Janusz ; Guinney, Justin ; Jang, In Sock ; Logsdon, Benjamin A. ; Pandey, Gaurav ; Sauerwine, Benjamin A. ; Shimoni, Yishai ; Moen Vollan, Hans Kristian ; Mecham, Brigham H. ; Rueda, Oscar M. ; Tost, Jorg ; Curtis, Christina ; Alvarez, Mariano J. ; Kristensen, Vessela N. ; Aparicio, Samuel ; Børresen-Dale, Anne Lise ; Caldas, Carlos ; Califano, Andrea ; Friend, Stephen H. ; Ideker, Trey ; Schadt, Eric E. ; Stolovitzky, Gustavo A. ; Margolin, Adam. / Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling. In: PLoS Computational Biology. 2013 ; Vol. 9, No. 5.
@article{24f176be7d264847a1c5c31103bbf319,
title = "Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling",
abstract = "Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.",
author = "Erhan Bilal and Janusz Dutkowski and Justin Guinney and Jang, {In Sock} and Logsdon, {Benjamin A.} and Gaurav Pandey and Sauerwine, {Benjamin A.} and Yishai Shimoni and {Moen Vollan}, {Hans Kristian} and Mecham, {Brigham H.} and Rueda, {Oscar M.} and Jorg Tost and Christina Curtis and Alvarez, {Mariano J.} and Kristensen, {Vessela N.} and Samuel Aparicio and B{\o}rresen-Dale, {Anne Lise} and Carlos Caldas and Andrea Califano and Friend, {Stephen H.} and Trey Ideker and Schadt, {Eric E.} and Stolovitzky, {Gustavo A.} and Adam Margolin",
year = "2013",
month = "5",
doi = "10.1371/journal.pcbi.1003047",
language = "English (US)",
volume = "9",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "5",

}

TY - JOUR

T1 - Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

AU - Bilal, Erhan

AU - Dutkowski, Janusz

AU - Guinney, Justin

AU - Jang, In Sock

AU - Logsdon, Benjamin A.

AU - Pandey, Gaurav

AU - Sauerwine, Benjamin A.

AU - Shimoni, Yishai

AU - Moen Vollan, Hans Kristian

AU - Mecham, Brigham H.

AU - Rueda, Oscar M.

AU - Tost, Jorg

AU - Curtis, Christina

AU - Alvarez, Mariano J.

AU - Kristensen, Vessela N.

AU - Aparicio, Samuel

AU - Børresen-Dale, Anne Lise

AU - Caldas, Carlos

AU - Califano, Andrea

AU - Friend, Stephen H.

AU - Ideker, Trey

AU - Schadt, Eric E.

AU - Stolovitzky, Gustavo A.

AU - Margolin, Adam

PY - 2013/5

Y1 - 2013/5

N2 - Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.

AB - Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.

UR - http://www.scopus.com/inward/record.url?scp=84877734926&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877734926&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1003047

DO - 10.1371/journal.pcbi.1003047

M3 - Article

VL - 9

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 5

M1 - e1003047

ER -