Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection

Adam D. Ewing, Kathleen E. Houlahan, Yin Hu, Kyle Ellrott, Cristian Caloian, Takafumi N. Yamaguchi, J. Christopher Bare, Christine P'Ng, Daryl Waggott, Veronica Y. Sabelnykova, Michael R. Kellen, Thea C. Norman, David Haussler, Stephen H. Friend, Gustavo Stolovitzky, Adam Margolin, Joshua M. Stuart, Paul C. Boutros

Research output: Contribution to journal > Article

109 Citations (Scopus)

Abstract

The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.

Original language: English (US)
Pages (from-to): 623-630
Number of pages: 8
Journal: Nature Methods
Volume: 12
Issue number: 7
DOIs: https://doi.org/10.1038/nmeth.3407
State: Published - Jun 30 2015

Fingerprint

Crowdsourcing
Benchmarking
Tumors
Nucleotides
Genes
Genome
Pipelines
Neoplasms
Mutation
Contamination
Inborn Genetic Diseases
Computer Simulation
Disease Progression
Survival

ASJC Scopus subject areas

  • Biotechnology
  • Molecular Biology
  • Biochemistry
  • Cell Biology

Cite this

Ewing, A. D., Houlahan, K. E., Hu, Y., Ellrott, K., Caloian, C., Yamaguchi, T. N., ... Boutros, P. C. (2015). Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nature Methods, 12(7), 623-630. https://doi.org/10.1038/nmeth.3407

Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. / Ewing, Adam D.; Houlahan, Kathleen E.; Hu, Yin; Ellrott, Kyle; Caloian, Cristian; Yamaguchi, Takafumi N.; Bare, J. Christopher; P'Ng, Christine; Waggott, Daryl; Sabelnykova, Veronica Y.; Kellen, Michael R.; Norman, Thea C.; Haussler, David; Friend, Stephen H.; Stolovitzky, Gustavo; Margolin, Adam; Stuart, Joshua M.; Boutros, Paul C.

In: Nature Methods, Vol. 12, No. 7, 30.06.2015, p. 623-630.

Research output: Contribution to journal > Article

Ewing, AD, Houlahan, KE, Hu, Y, Ellrott, K, Caloian, C, Yamaguchi, TN, Bare, JC, P'Ng, C, Waggott, D, Sabelnykova, VY, Kellen, MR, Norman, TC, Haussler, D, Friend, SH, Stolovitzky, G, Margolin, A, Stuart, JM & Boutros, PC 2015, 'Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection', Nature Methods, vol. 12, no. 7, pp. 623-630. https://doi.org/10.1038/nmeth.3407
Ewing, Adam D.; Houlahan, Kathleen E.; Hu, Yin; Ellrott, Kyle; Caloian, Cristian; Yamaguchi, Takafumi N.; Bare, J. Christopher; P'Ng, Christine; Waggott, Daryl; Sabelnykova, Veronica Y.; Kellen, Michael R.; Norman, Thea C.; Haussler, David; Friend, Stephen H.; Stolovitzky, Gustavo; Margolin, Adam; Stuart, Joshua M.; Boutros, Paul C. / Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. In: Nature Methods. 2015; Vol. 12, No. 7. pp. 623-630.

@article{e33331b1ff0c4b788ab82040762b67ef,
title = "Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection",
abstract = "The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.",
author = "Ewing, {Adam D.} and Houlahan, {Kathleen E.} and Yin Hu and Kyle Ellrott and Cristian Caloian and Yamaguchi, {Takafumi N.} and Bare, {J. Christopher} and Christine P'Ng and Daryl Waggott and Sabelnykova, {Veronica Y.} and Kellen, {Michael R.} and Norman, {Thea C.} and David Haussler and Friend, {Stephen H.} and Gustavo Stolovitzky and Adam Margolin and Stuart, {Joshua M.} and Boutros, {Paul C.}",
year = "2015",
month = "6",
day = "30",
doi = "10.1038/nmeth.3407",
language = "English (US)",
volume = "12",
pages = "623--630",
journal = "Nature Methods",
issn = "1548-7091",
publisher = "Nature Publishing Group",
number = "7",

}

TY - JOUR

T1 - Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection

AU - Ewing, Adam D.

AU - Houlahan, Kathleen E.

AU - Hu, Yin

AU - Ellrott, Kyle

AU - Caloian, Cristian

AU - Yamaguchi, Takafumi N.

AU - Bare, J. Christopher

AU - P'Ng, Christine

AU - Waggott, Daryl

AU - Sabelnykova, Veronica Y.

AU - Kellen, Michael R.

AU - Norman, Thea C.

AU - Haussler, David

AU - Friend, Stephen H.

AU - Stolovitzky, Gustavo

AU - Margolin, Adam

AU - Stuart, Joshua M.

AU - Boutros, Paul C.

PY - 2015/6/30

Y1 - 2015/6/30

N2 - The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.

AB - The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.

UR - http://www.scopus.com/inward/record.url?scp=84937191337&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937191337&partnerID=8YFLogxK

U2 - 10.1038/nmeth.3407

DO - 10.1038/nmeth.3407

M3 - Article

C2 - 25984700

AN - SCOPUS:84937191337

VL - 12

SP - 623

EP - 630

JO - Nature Methods

JF - Nature Methods

SN - 1548-7091

IS - 7

ER -