A robust prognostic signature for hormone-positive node-negative breast cancer

Obi L. Griffith, François Pepin, Oana M. Enache, Laura Heiser, Eric A. Collisson, Paul Spellman, Joe Gray

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Background: Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients who would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to distinguish high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment-related toxicities and costs).

Methods: We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse-free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates.

Results: Internal out-of-bag (OOB) cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients.

Conclusions: RFRS allows flexibility in both the number and identity of genes utilized, from thousands to as few as 17 or 8 genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.
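
The evaluation steps named in the abstract (ROC AUC, probability cutoffs for three risk groups, Kaplan-Meier analysis) can be illustrated with a minimal sketch. It assumes each patient already has a predicted relapse probability; it does not reproduce the random forests model itself. The function names, the cutoffs 0.3 and 0.6, and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], relapsed: Sequence[bool]) -> float:
    """ROC AUC as the chance that a relapsed patient outscores a
    non-relapsed one; tied scores count as half a win."""
    pos = [s for s, r in zip(scores, relapsed) if r]
    neg = [s for s, r in zip(scores, relapsed) if not r]
    if not pos or not neg:
        raise ValueError("need at least one patient in each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def risk_group(score: float, low_cut: float = 0.3, high_cut: float = 0.6) -> str:
    """Map a relapse probability to a risk group.
    The default cutoffs are illustrative assumptions only."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "intermediate"
    return "high"


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of relapse-free survival.
    events[i] is True for an observed relapse, False for censoring.
    Returns (time, survival) at each distinct relapse time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    # Toy values, not study data.
    scores = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
    relapsed = [False, False, True, False, True, True]
    print("AUC:", roc_auc(scores, relapsed))
    print("Groups:", [risk_group(s) for s in scores])
    print("KM:", kaplan_meier([2, 3, 3, 5, 8, 10],
                              [True, False, True, True, False, False]))
```

In practice the survival curve would be computed separately for each risk group and the groups compared, which is the kind of comparison the abstract reports.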

Original language: English (US)
Article number: 92
Journal: Genome Medicine
Volume: 5
Issue number: 10
DOI: 10.1186/gm496
State: Published - Oct 11 2013

Fingerprint

Hormones
Breast Neoplasms
Recurrence
Genes
Survival Analysis
Drug Therapy
Kaplan-Meier Estimate
Adjuvant Chemotherapy
ROC Curve
Area Under Curve
Cluster Analysis
Databases
Costs and Cost Analysis
Messenger RNA
Survival
Datasets
Therapeutics
Forests

ASJC Scopus subject areas

  • Genetics (clinical)
  • Genetics
  • Molecular Biology
  • Molecular Medicine

Cite this

A robust prognostic signature for hormone-positive node-negative breast cancer. / Griffith, Obi L.; Pepin, François; Enache, Oana M.; Heiser, Laura; Collisson, Eric A.; Spellman, Paul; Gray, Joe.

In: Genome Medicine, Vol. 5, No. 10, 92, 11.10.2013.

@article{1b5dd69961e54cec9119e2d5f4017d76,
title = "A robust prognostic signature for hormone-positive node-negative breast cancer",
author = "Griffith, {Obi L.} and Fran{\c{c}}ois Pepin and Enache, {Oana M.} and Laura Heiser and Collisson, {Eric A.} and Paul Spellman and Joe Gray",
year = "2013",
month = "10",
day = "11",
doi = "10.1186/gm496",
language = "English (US)",
volume = "5",
journal = "Genome Medicine",
issn = "1756-994X",
publisher = "BioMed Central",
number = "10",

}

TY - JOUR

T1 - A robust prognostic signature for hormone-positive node-negative breast cancer

AU - Griffith, Obi L.

AU - Pepin, François

AU - Enache, Oana M.

AU - Heiser, Laura

AU - Collisson, Eric A.

AU - Spellman, Paul

AU - Gray, Joe

PY - 2013/10/11

Y1 - 2013/10/11

UR - http://www.scopus.com/inward/record.url?scp=84885369269&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84885369269&partnerID=8YFLogxK

U2 - 10.1186/gm496

DO - 10.1186/gm496

M3 - Article

AN - SCOPUS:84885369269

VL - 5

JO - Genome Medicine

JF - Genome Medicine

SN - 1756-994X

IS - 10

M1 - 92

ER -