A new rhesus macaque assembly and annotation for next-generation sequencing analyses

Aleksey V. Zimin, Adam S. Cornish, Mnirnal D. Maudhoo, Robert M. Gibbs, Xiongfei Zhang, Sanjit Pandey, Daniel T. Meehan, Kristin Wipfler, Steven E. Bosinger, Zachary P. Johnson, Gregory K. Tharp, Guillaume Marçais, Michael Roberts, Betsy Ferguson, Howard S. Fox, Todd Treangen, Steven L. Salzberg, James A. Yorke, Robert B. Norgren

Research output: Contribution to journal › Article

70 Citations (Scopus)

Abstract

BACKGROUND: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2, with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High-quality assembly and annotation files are required for a wide range of studies, including expression, genetic and evolutionary analyses.

RESULTS: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice that of the rheMac2 assembly and almost five times that of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts, which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy compared with the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for aligning reads produced by RNA sequencing (RNA-seq) expression studies.
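The contig N50 quoted above is the standard contiguity statistic for assemblies. The following is a minimal Python sketch of how N50 is conventionally computed: sort the contig lengths in descending order and accumulate them until at least half of the total assembled length is reached. The function name `n50` and the toy lengths are hypothetical and for illustration only. This is not the authors' code and does not use MacaM, rheMac2 or CR_1.0 data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    (Hypothetical helper for illustration.)
    """
    values = list(lengths)
    if not values:
        raise ValueError("need at least one contig length")
    if any(v < 0 for v in values):
        raise ValueError("contig lengths must be non-negative")

    total = sum(values)
    running = 0
    # Walk from the longest contig downward, accumulating bases.
    for length in sorted(values, reverse=True):
        running += length
        # Stop at the first contig whose inclusion covers >= 50% of the total.
        if 2 * running >= total:
            return length
    # Not reached: after the last contig, running == total, so the test passes.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Toy contig lengths in kilobases (hypothetical, not from the paper).
    toy = [100, 80, 50, 30, 20, 10]
    print(n50(toy))
```

With the toy lengths (total 290), the two longest contigs already cover 180, which is at least half of 290, so the sketch reports an N50 of 80. Because the statistic is weighted by contig length, merging fragmented contigs raises N50 even when the total assembly size is unchanged. This is why N50 is used to compare the contiguity of assemblies such as MacaM, rheMac2 and CR_1.0.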

CONCLUSIONS: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.

REVIEWERS: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.

Original language: English (US)
Pages (from-to): 20
Number of pages: 1
Journal: Biology Direct
Volume: 9
Issue number: 1
DOIs: https://doi.org/10.1186/1745-6150-9-20
State: Published - 2014
Externally published: Yes

Fingerprint

Macaca mulatta
Sequencing
Annotation
genome
Genes
Genome
Macaca
genome assembly
Scaffold
RNA
nucleotide sequences
Scaffolds
Alignment
biomedical research
messenger RNA
torrent
Gene
gene
Resources
resource

ASJC Scopus subject areas

  • Immunology
  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • Applied Mathematics

Cite this

Zimin, A. V., Cornish, A. S., Maudhoo, M. D., Gibbs, R. M., Zhang, X., Pandey, S., ... Norgren, R. B. (2014). A new rhesus macaque assembly and annotation for next-generation sequencing analyses. Biology Direct, 9(1), 20. https://doi.org/10.1186/1745-6150-9-20

A new rhesus macaque assembly and annotation for next-generation sequencing analyses. / Zimin, Aleksey V.; Cornish, Adam S.; Maudhoo, Mnirnal D.; Gibbs, Robert M.; Zhang, Xiongfei; Pandey, Sanjit; Meehan, Daniel T.; Wipfler, Kristin; Bosinger, Steven E.; Johnson, Zachary P.; Tharp, Gregory K.; Marçais, Guillaume; Roberts, Michael; Ferguson, Betsy; Fox, Howard S.; Treangen, Todd; Salzberg, Steven L.; Yorke, James A.; Norgren, Robert B.

In: Biology Direct, Vol. 9, No. 1, 2014, p. 20.


Zimin, AV, Cornish, AS, Maudhoo, MD, Gibbs, RM, Zhang, X, Pandey, S, Meehan, DT, Wipfler, K, Bosinger, SE, Johnson, ZP, Tharp, GK, Marçais, G, Roberts, M, Ferguson, B, Fox, HS, Treangen, T, Salzberg, SL, Yorke, JA & Norgren, RB 2014, 'A new rhesus macaque assembly and annotation for next-generation sequencing analyses', Biology Direct, vol. 9, no. 1, pp. 20. https://doi.org/10.1186/1745-6150-9-20
Zimin, Aleksey V. ; Cornish, Adam S. ; Maudhoo, Mnirnal D. ; Gibbs, Robert M. ; Zhang, Xiongfei ; Pandey, Sanjit ; Meehan, Daniel T. ; Wipfler, Kristin ; Bosinger, Steven E. ; Johnson, Zachary P. ; Tharp, Gregory K. ; Marçais, Guillaume ; Roberts, Michael ; Ferguson, Betsy ; Fox, Howard S. ; Treangen, Todd ; Salzberg, Steven L. ; Yorke, James A. ; Norgren, Robert B. / A new rhesus macaque assembly and annotation for next-generation sequencing analyses. In: Biology Direct. 2014 ; Vol. 9, No. 1. pp. 20.
@article{158f642d413345a7ac4085de37aea8b5,
title = "A new rhesus macaque assembly and annotation for next-generation sequencing analyses",
abstract = "BACKGROUND: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses. RESULTS: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2.
Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies. CONCLUSIONS: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates. REVIEWERS: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.",
author = "Zimin, {Aleksey V.} and Cornish, {Adam S.} and Maudhoo, {Mnirnal D.} and Gibbs, {Robert M.} and Xiongfei Zhang and Sanjit Pandey and Meehan, {Daniel T.} and Kristin Wipfler and Bosinger, {Steven E.} and Johnson, {Zachary P.} and Tharp, {Gregory K.} and Guillaume Mar{\c{c}}ais and Michael Roberts and Betsy Ferguson and Fox, {Howard S.} and Todd Treangen and Salzberg, {Steven L.} and Yorke, {James A.} and Norgren, {Robert B.}",
year = "2014",
doi = "10.1186/1745-6150-9-20",
language = "English (US)",
volume = "9",
pages = "20",
journal = "Biology Direct",
issn = "1745-6150",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - A new rhesus macaque assembly and annotation for next-generation sequencing analyses

AU - Zimin, Aleksey V.

AU - Cornish, Adam S.

AU - Maudhoo, Mnirnal D.

AU - Gibbs, Robert M.

AU - Zhang, Xiongfei

AU - Pandey, Sanjit

AU - Meehan, Daniel T.

AU - Wipfler, Kristin

AU - Bosinger, Steven E.

AU - Johnson, Zachary P.

AU - Tharp, Gregory K.

AU - Marçais, Guillaume

AU - Roberts, Michael

AU - Ferguson, Betsy

AU - Fox, Howard S.

AU - Treangen, Todd

AU - Salzberg, Steven L.

AU - Yorke, James A.

AU - Norgren, Robert B.

PY - 2014

Y1 - 2014

N2 - BACKGROUND: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses. RESULTS: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies. CONCLUSIONS: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates. REVIEWERS: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.

AB - BACKGROUND: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses. RESULTS: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies. CONCLUSIONS: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates. REVIEWERS: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.

UR - http://www.scopus.com/inward/record.url?scp=85003348909&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85003348909&partnerID=8YFLogxK

U2 - 10.1186/1745-6150-9-20

DO - 10.1186/1745-6150-9-20

M3 - Article

VL - 9

SP - 20

JO - Biology Direct

JF - Biology Direct

SN - 1745-6150

IS - 1

ER -