Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer

Adam A. Margolin; Erhan Bilal; Erich Huang; Thea C. Norman; Lars Ottestad; Brigham H. Mecham; Ben Sauerwine; Michael R. Kellen; Lara M. Mangravite; Matthew D. Furia; Hans Kristian Moen Vollan; Oscar M. Rueda; Justin Guinney; Nicole A. Deflaux; Bruce Hoff; Xavier Schildwachter; Hege G. Russnes; Daehoon Park; Veronica O. Vang; Tyler Pirtle; Lamia Youseff; Craig Citro; Christina Curtis; Vessela N. Kristensen; Joseph Hellerstein; Stephen H. Friend; Gustavo Stolovitzky; Samuel Aparicio; Carlos Caldas; Anne Lise Børresen-Dale

doi:10.1126/scitranslmed.3006112


Research output: Contribution to journal › Article › peer-review

89 Scopus citations

Abstract

Although molecular prognostics in breast cancer are among the most successful examples of translating genomic analysis to clinical applications, optimal approaches to breast cancer clinical risk prediction remain controversial. The Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge (BCC) is a crowdsourced research study for breast cancer prognostic modeling using genome-scale data. The BCC provided a community of data analysts with a common platform for data access and blinded evaluation of model accuracy in predicting breast cancer survival on the basis of gene expression data, copy number data, and clinical covariates. This approach offered the opportunity to assess whether a crowdsourced community Challenge would generate models of breast cancer prognosis commensurate with or exceeding current best-in-class approaches. The BCC comprised multiple rounds of blinded evaluations on held-out portions of data on 1981 patients, resulting in more than 1400 models submitted as open source code. Participants then retrained their models on the full data set of 1981 samples and submitted up to five models for validation in a newly generated data set of 184 breast cancer patients. Analysis of the BCC results suggests that the best-performing modeling strategy outperformed previously reported methods in blinded evaluations; model performance was consistent across several independent evaluations; and aggregating community-developed models achieved performance on par with the best-performing individual models.
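The abstract does not say how model accuracy was scored in the blinded evaluations or how the community models were aggregated. The following is a minimal sketch under two assumptions of ours: accuracy is measured with Harrell's concordance index (a common metric for survival risk scores), and the community aggregate is formed by averaging each patient's risk rank across models. The function names (`concordance_index`, `ranks`, `aggregate_by_rank`) and the toy data are hypothetical; this is not the Challenge's scoring code.

```python
from itertools import combinations

# Assumption (not stated in the abstract): accuracy = Harrell's concordance
# index; aggregation = averaging per-patient risk ranks across models.


def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair of patients is comparable when the one with the shorter observed
    time had an event (was not censored); pairs with tied times are skipped
    in this simplified version. The pair is concordant when that patient
    also received the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the patient with the shorter time
        if times[i] == times[j] or not events[i]:
            continue  # not a comparable pair
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0
        elif risks[i] == risks[j]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def ranks(values):
    """0-based ascending ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            result[k] = (start + end) / 2
        start = end + 1
    return result


def aggregate_by_rank(model_risks):
    """Hypothetical ensemble: average each patient's risk rank across models."""
    ranked = [ranks(r) for r in model_risks]
    n = len(model_risks[0])
    return [sum(r[p] for r in ranked) / len(ranked) for p in range(n)]


# Toy data for five hypothetical patients (not from the Challenge).
times = [5, 8, 3, 10, 6]   # follow-up time
events = [1, 0, 1, 1, 0]   # 1 = death observed, 0 = censored
model_a = [0.7, 0.2, 0.4, 0.1, 0.5]
model_b = [0.3, 0.5, 0.8, 0.2, 0.6]

ensemble = aggregate_by_rank([model_a, model_b])
for name, risk in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(name, round(concordance_index(times, events, risk), 3))
# model_a 0.714
# model_b 0.714
# ensemble 0.786
```

On these made-up numbers the rank-averaged ensemble scores higher than either input model; that is a property of the toy data only, not evidence about the Challenge results. Rank averaging is used here because it puts models whose risk scores live on different scales onto a common footing before combining them.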

Original language	English (US)
Article number	181re1
Journal	Science translational medicine
Volume	5
Issue number	181
DOIs	https://doi.org/10.1126/scitranslmed.3006112
State	Published - Apr 17 2013
Externally published	Yes

ASJC Scopus subject areas

General Medicine


Cite this

Margolin, A. A., Bilal, E., Huang, E., Norman, T. C., Ottestad, L., Mecham, B. H., Sauerwine, B., Kellen, M. R., Mangravite, L. M., Furia, M. D., Vollan, H. K. M., Rueda, O. M., Guinney, J., Deflaux, N. A., Hoff, B., Schildwachter, X., Russnes, H. G., Park, D., Vang, V. O., ... Børresen-Dale, A. L. (2013). Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Science translational medicine, 5(181), Article 181re1. https://doi.org/10.1126/scitranslmed.3006112

Margolin, AA, Bilal, E, Huang, E, Norman, TC, Ottestad, L, Mecham, BH, Sauerwine, B, Kellen, MR, Mangravite, LM, Furia, MD, Vollan, HKM, Rueda, OM, Guinney, J, Deflaux, NA, Hoff, B, Schildwachter, X, Russnes, HG, Park, D, Vang, VO, Pirtle, T, Youseff, L, Citro, C, Curtis, C, Kristensen, VN, Hellerstein, J, Friend, SH, Stolovitzky, G, Aparicio, S, Caldas, C & Børresen-Dale, AL 2013, 'Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer', Science translational medicine, vol. 5, no. 181, 181re1. https://doi.org/10.1126/scitranslmed.3006112

@article{b1fb7ba704644de89a07f5d15ae67a1b,

title = "Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer",

abstract = "Although molecular prognostics in breast cancer are among the most successful examples of translating genomic analysis to clinical applications, optimal approaches to breast cancer clinical risk prediction remain controversial. The Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge (BCC) is a crowdsourced research study for breast cancer prognostic modeling using genome-scale data. The BCC provided a community of data analysts with a common platform for data access and blinded evaluation of model accuracy in predicting breast cancer survival on the basis of gene expression data, copy number data, and clinical covariates. This approach offered the opportunity to assess whether a crowdsourced community Challenge would generate models of breast cancer prognosis commensurate with or exceeding current best-in-class approaches. The BCC comprised multiple rounds of blinded evaluations on held-out portions of data on 1981 patients, resulting in more than 1400 models submitted as open source code. Participants then retrained their models on the full data set of 1981 samples and submitted up to five models for validation in a newly generated data set of 184 breast cancer patients. Analysis of the BCC results suggests that the best-performing modeling strategy outperformed previously reported methods in blinded evaluations; model performance was consistent across several independent evaluations; and aggregating community-developed models achieved performance on par with the best-performing individual models.",

author = "Margolin, {Adam A.} and Erhan Bilal and Erich Huang and Norman, {Thea C.} and Lars Ottestad and Mecham, {Brigham H.} and Ben Sauerwine and Kellen, {Michael R.} and Mangravite, {Lara M.} and Furia, {Matthew D.} and Vollan, {Hans Kristian Moen} and Rueda, {Oscar M.} and Justin Guinney and Deflaux, {Nicole A.} and Bruce Hoff and Xavier Schildwachter and Russnes, {Hege G.} and Daehoon Park and Vang, {Veronica O.} and Tyler Pirtle and Lamia Youseff and Craig Citro and Christina Curtis and Kristensen, {Vessela N.} and Joseph Hellerstein and Friend, {Stephen H.} and Gustavo Stolovitzky and Samuel Aparicio and Carlos Caldas and B{\o}rresen-Dale, {Anne Lise}",

year = "2013",

month = apr,

day = "17",

doi = "10.1126/scitranslmed.3006112",

language = "English (US)",

volume = "5",

journal = "Science translational medicine",

issn = "1946-6234",

publisher = "American Association for the Advancement of Science",

number = "181",

}

TY  - JOUR
T1  - Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer
AU  - Margolin, Adam A.
AU  - Bilal, Erhan
AU  - Huang, Erich
AU  - Norman, Thea C.
AU  - Ottestad, Lars
AU  - Mecham, Brigham H.
AU  - Sauerwine, Ben
AU  - Kellen, Michael R.
AU  - Mangravite, Lara M.
AU  - Furia, Matthew D.
AU  - Vollan, Hans Kristian Moen
AU  - Rueda, Oscar M.
AU  - Guinney, Justin
AU  - Deflaux, Nicole A.
AU  - Hoff, Bruce
AU  - Schildwachter, Xavier
AU  - Russnes, Hege G.
AU  - Park, Daehoon
AU  - Vang, Veronica O.
AU  - Pirtle, Tyler
AU  - Youseff, Lamia
AU  - Citro, Craig
AU  - Curtis, Christina
AU  - Kristensen, Vessela N.
AU  - Hellerstein, Joseph
AU  - Friend, Stephen H.
AU  - Stolovitzky, Gustavo
AU  - Aparicio, Samuel
AU  - Caldas, Carlos
AU  - Børresen-Dale, Anne Lise
PY  - 2013/4/17
Y1  - 2013/4/17
N2  - Although molecular prognostics in breast cancer are among the most successful examples of translating genomic analysis to clinical applications, optimal approaches to breast cancer clinical risk prediction remain controversial. The Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge (BCC) is a crowdsourced research study for breast cancer prognostic modeling using genome-scale data. The BCC provided a community of data analysts with a common platform for data access and blinded evaluation of model accuracy in predicting breast cancer survival on the basis of gene expression data, copy number data, and clinical covariates. This approach offered the opportunity to assess whether a crowdsourced community Challenge would generate models of breast cancer prognosis commensurate with or exceeding current best-in-class approaches. The BCC comprised multiple rounds of blinded evaluations on held-out portions of data on 1981 patients, resulting in more than 1400 models submitted as open source code. Participants then retrained their models on the full data set of 1981 samples and submitted up to five models for validation in a newly generated data set of 184 breast cancer patients. Analysis of the BCC results suggests that the best-performing modeling strategy outperformed previously reported methods in blinded evaluations; model performance was consistent across several independent evaluations; and aggregating community-developed models achieved performance on par with the best-performing individual models.
AB  - Although molecular prognostics in breast cancer are among the most successful examples of translating genomic analysis to clinical applications, optimal approaches to breast cancer clinical risk prediction remain controversial. The Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge (BCC) is a crowdsourced research study for breast cancer prognostic modeling using genome-scale data. The BCC provided a community of data analysts with a common platform for data access and blinded evaluation of model accuracy in predicting breast cancer survival on the basis of gene expression data, copy number data, and clinical covariates. This approach offered the opportunity to assess whether a crowdsourced community Challenge would generate models of breast cancer prognosis commensurate with or exceeding current best-in-class approaches. The BCC comprised multiple rounds of blinded evaluations on held-out portions of data on 1981 patients, resulting in more than 1400 models submitted as open source code. Participants then retrained their models on the full data set of 1981 samples and submitted up to five models for validation in a newly generated data set of 184 breast cancer patients. Analysis of the BCC results suggests that the best-performing modeling strategy outperformed previously reported methods in blinded evaluations; model performance was consistent across several independent evaluations; and aggregating community-developed models achieved performance on par with the best-performing individual models.
UR  - http://www.scopus.com/inward/record.url?scp=84877765675&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84877765675&partnerID=8YFLogxK
U2  - 10.1126/scitranslmed.3006112
DO  - 10.1126/scitranslmed.3006112
M3  - Article
C2  - 23596205
AN  - SCOPUS:84877765675
SN  - 1946-6234
VL  - 5
JO  - Science translational medicine
JF  - Science translational medicine
IS  - 181
M1  - 181re1
ER  - 
