Bias from removing read duplication in ultra-deep sequencing experiments

Wanding Zhou, Tenghui Chen, Hao Zhao, Agda Karina Eterovic, Funda Meric-Bernstam, Gordon B. Mills, Ken Chen

    Research output: Contribution to journal › Article

    19 Scopus citations

    Abstract

    Motivation: Identifying subclonal mutations and their implications requires accurate estimation of mutant allele fractions from possibly duplicated sequencing reads. Removing duplicate reads assumes that polymerase chain reaction amplification during library construction is the primary source of duplication. The alternative, sampling coincidence from DNA fragmentation, has not been systematically investigated.

    Results: With sufficiently high sequencing depth, sampling-induced read duplication is non-negligible, and removing duplicate reads can overcorrect read counts, causing systemic biases in variant allele fraction and copy number variation estimates. Overcorrection is minimal when duplicate reads are identified accounting for their mate reads, inserts are of a variety of lengths and samples are sequenced in separate batches. We investigate sampling-induced read duplication in deep sequencing data with 500× to 2000× duplicates-removed sequence coverage. We provide a quantitative solution to overcorrection and guidance for effective designs of deep sequencing platforms that facilitate accurate estimation of variant allele fraction and copy number variation.
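The claim that sampling coincidence becomes non-negligible at high depth can be illustrated with a simple occupancy calculation. The sketch below is not the authors' quantitative solution; it assumes, purely for illustration, that reads land uniformly and independently on a fixed number of distinguishable fragment positions and that no PCR duplication occurs. The function names and the parameter `n_positions` are hypothetical.

```python
def expected_unique_reads(n_reads: int, n_positions: int) -> float:
    """Expected number of distinct positions hit by n_reads uniform draws.

    Assumption (illustrative only): every read is an independent draw from
    n_positions equally likely fragment positions, with no PCR duplicates.
    """
    if n_positions <= 0:
        raise ValueError("n_positions must be positive")
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    # Probability that a given position is missed by all reads.
    p_missed = (1.0 - 1.0 / n_positions) ** n_reads
    return n_positions * (1.0 - p_missed)


def sampling_duplicate_fraction(n_reads: int, n_positions: int) -> float:
    """Expected fraction of reads that position-based deduplication would drop
    even though, under the assumption above, none is a PCR duplicate."""
    if n_reads == 0:
        return 0.0
    return 1.0 - expected_unique_reads(n_reads, n_positions) / n_reads


if __name__ == "__main__":
    # Hypothetical numbers: 200 distinguishable positions, increasing depth.
    for depth in (50, 500, 2000):
        frac = sampling_duplicate_fraction(depth, 200)
        print(f"depth={depth:5d}  removed by deduplication: {frac:.1%}")
```

Under this toy model the removed fraction grows with depth and the deduplicated count can never exceed `n_positions`, which is one way to see why identifying duplicates with mate reads and varied insert lengths (both of which enlarge the set of distinguishable positions) reduces overcorrection.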

    Original language: English (US)
    Pages (from-to): 1073-1080
    Number of pages: 8
    Journal: Bioinformatics
    Volume: 30
    Issue number: 8
    DOI: https://doi.org/10.1093/bioinformatics/btt771
    State: Published - Jan 1 2014

    ASJC Scopus subject areas

    • Statistics and Probability
    • Biochemistry
    • Molecular Biology
    • Computer Science Applications
    • Computational Theory and Mathematics
    • Computational Mathematics


    Cite this

    Zhou, W., Chen, T., Zhao, H., Eterovic, A. K., Meric-Bernstam, F., Mills, G. B., & Chen, K. (2014). Bias from removing read duplication in ultra-deep sequencing experiments. Bioinformatics, 30(8), 1073-1080. https://doi.org/10.1093/bioinformatics/btt771