Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions

Joshua N. Burton; Andrew Adey; Rupali P. Patwardhan; Ruolan Qiu; Jacob O. Kitzman; Jay Shendure

doi:10.1038/nbt.2727

Research output: Contribution to journal › Article › peer-review

886 Scopus citations

Abstract

Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving, for the human genome, 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.

Original language: English (US)
Pages (from-to): 1119-1125
Number of pages: 7
Journal: Nature Biotechnology
Volume: 31
Issue number: 12
DOI: https://doi.org/10.1038/nbt.2727
State: Published - Dec 2013
Externally published: Yes

ASJC Scopus subject areas

Biotechnology
Bioengineering
Applied Microbiology and Biotechnology
Molecular Medicine
Biomedical Engineering

Cite this

@article{98be45cedbe242e1a57cc25956488148,

title = "Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions",

abstract = "Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving - for the human genome - 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.",

author = "Burton, {Joshua N.} and Andrew Adey and Patwardhan, {Rupali P.} and Ruolan Qiu and Kitzman, {Jacob O.} and Jay Shendure",

note = "Funding Information: We thank F. Ay, E. Eichler, J. Felsenstein, P. Green, L. Hillier, M. van Min, W. Noble, R. Waterston and members of the Shendure lab for helpful discussions. Some of the sequencing data used in this research were derived from a HeLa cell line. Henrietta Lacks, and the HeLa cell line that was established from her tumor cells without her knowledge or consent in 1951, have made significant contributions to scientific progress and advances in human health. We are grateful to Henrietta Lacks, now deceased, and to her surviving family members for their contributions to biomedical research. Our work was supported by grant HG006283 from the National Human Genome Research Institute (NHGRI; to J.S.); a graduate research fellowship DGE0718124 from the National Science Foundation (to A.A. and J.O.K.); and grant T32HG000035 from the NHGRI (to J.N.B.).",

year = "2013",

month = dec,

doi = "10.1038/nbt.2727",

language = "English (US)",

volume = "31",

pages = "1119--1125",

journal = "Nature Biotechnology",

issn = "1087-0156",

publisher = "Nature Publishing Group",

number = "12",

}

TY - JOUR

T1 - Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions

AU - Burton, Joshua N.

AU - Adey, Andrew

AU - Patwardhan, Rupali P.

AU - Qiu, Ruolan

AU - Kitzman, Jacob O.

AU - Shendure, Jay

N1 - Funding Information: We thank F. Ay, E. Eichler, J. Felsenstein, P. Green, L. Hillier, M. van Min, W. Noble, R. Waterston and members of the Shendure lab for helpful discussions. Some of the sequencing data used in this research were derived from a HeLa cell line. Henrietta Lacks, and the HeLa cell line that was established from her tumor cells without her knowledge or consent in 1951, have made significant contributions to scientific progress and advances in human health. We are grateful to Henrietta Lacks, now deceased, and to her surviving family members for their contributions to biomedical research. Our work was supported by grant HG006283 from the National Human Genome Research Institute (NHGRI; to J.S.); a graduate research fellowship DGE0718124 from the National Science Foundation (to A.A. and J.O.K.); and grant T32HG000035 from the NHGRI (to J.N.B.).

PY - 2013/12

Y1 - 2013/12

N2 - Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving - for the human genome - 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.

UR - http://www.scopus.com/inward/record.url?scp=84890034912&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890034912&partnerID=8YFLogxK

U2 - 10.1038/nbt.2727

DO - 10.1038/nbt.2727

M3 - Article

C2 - 24185095

AN - SCOPUS:84890034912

SN - 1087-0156

VL - 31

SP - 1119

EP - 1125

JO - Nature Biotechnology

JF - Nature Biotechnology

IS - 12

ER -
