CG dinucleotide clustering is a species-specific property of the genome

Jacob L. Glass, Reid Thompson, Batbayar Khulan, Maria E. Figueroa, Emmanuel N. Olivier, Erin J. Oakley, Gary Van Zant, Eric E. Bouhassira, Ari Melnick, Aaron Golden, Melissa J. Fazzari, John M. Greally

Research output: Contribution to journal › Article

63 Citations (Scopus)

Abstract

Cytosines at cytosine-guanine (CG) dinucleotides are the near-exclusive target of DNA methyltransferases in mammalian genomes. Spontaneous deamination of methylcytosine to thymine makes methylated cytosines unusually susceptible to mutation and consequent depletion. The loci where CG dinucleotides remain relatively enriched, presumably due to their unmethylated status during the germ cell cycle, have been referred to as CpG islands. Currently, CpG islands are solely defined by base compositional criteria, allowing annotation of any sequenced genome. Using a novel bioinformatic approach, we show that CG clusters can be identified as an inherent property of genomic sequence without imposing a base compositional a priori assumption. We also show that the CG clusters co-localize in the human genome with hypomethylated loci and annotated transcription start sites to a greater extent than annotations produced by prior CpG island definitions. Moreover, this new approach allows CG clusters to be identified in a species-specific manner, revealing a degree of orthologous conservation that is not revealed by current base compositional approaches. Finally, our approach is able to identify methylating genomes (such as Takifugu rubripes) that lack CG clustering entirely, in which it is inappropriate to annotate CpG islands or CG clusters.
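
For readers unfamiliar with the "base compositional criteria" the abstract contrasts itself with, the following is a minimal sketch of that conventional style of test, not the method introduced in this article. It computes the G+C fraction and the observed/expected CG ratio of a sequence and compares them with commonly cited thresholds (length of at least 200 bp, G+C of at least 50%, observed/expected of at least 0.6). The thresholds, the function names, and the example sequences are assumptions made for illustration; none of them appear in the abstract.

```python
def cg_composition(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    cg = s.count("CG")  # "CG" cannot overlap itself, so count() finds every occurrence
    gc_fraction = (c + g) / n
    # Expected CG count if C and G were placed independently: (#C * #G) / length
    expected = c * g / n
    ratio = cg / expected if expected > 0 else 0.0
    return gc_fraction, ratio


def passes_composition_criteria(
    seq: str,
    min_len: int = 200,      # assumed threshold, adjustable
    min_gc: float = 0.5,     # assumed threshold, adjustable
    min_ratio: float = 0.6,  # assumed threshold, adjustable
) -> bool:
    """Apply fixed base-compositional thresholds to a single candidate region."""
    gc_fraction, ratio = cg_composition(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and ratio >= min_ratio


if __name__ == "__main__":
    cg_rich = "CG" * 100            # hypothetical CG-dense region
    cg_depleted = "CCCCGGGG" * 25   # same G+C content, far fewer CG dinucleotides
    print(passes_composition_criteria(cg_rich))
    print(passes_composition_criteria(cg_depleted))
```

Because the thresholds are fixed in advance, a test of this kind gives the same answer regardless of which genome the sequence comes from. That is the a priori assumption the abstract says its approach avoids.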

Original language: English (US)
Pages (from-to): 6798-6807
Number of pages: 10
Journal: Nucleic acids research
Volume: 35
Issue number: 20
DOIs: https://doi.org/10.1093/nar/gkm489
State: Published - Nov 1 2007
Externally published: Yes

Fingerprint

Cytosine
Cluster Analysis
Guanine
Genome
CpG Islands
Takifugu
Deamination
Thymine
Transcription Initiation Site
Methyltransferases
Human Genome
cytidylyl-3'-5'-guanosine
Computational Biology
Germ Cells
Cell Cycle
Mutation
DNA

ASJC Scopus subject areas

  • Genetics

Cite this

Glass, J. L., Thompson, R., Khulan, B., Figueroa, M. E., Olivier, E. N., Oakley, E. J., ... Greally, J. M. (2007). CG dinucleotide clustering is a species-specific property of the genome. Nucleic acids research, 35(20), 6798-6807. https://doi.org/10.1093/nar/gkm489

CG dinucleotide clustering is a species-specific property of the genome. / Glass, Jacob L.; Thompson, Reid; Khulan, Batbayar; Figueroa, Maria E.; Olivier, Emmanuel N.; Oakley, Erin J.; Van Zant, Gary; Bouhassira, Eric E.; Melnick, Ari; Golden, Aaron; Fazzari, Melissa J.; Greally, John M.

In: Nucleic acids research, Vol. 35, No. 20, 01.11.2007, p. 6798-6807.

Glass, JL, Thompson, R, Khulan, B, Figueroa, ME, Olivier, EN, Oakley, EJ, Van Zant, G, Bouhassira, EE, Melnick, A, Golden, A, Fazzari, MJ & Greally, JM 2007, 'CG dinucleotide clustering is a species-specific property of the genome', Nucleic acids research, vol. 35, no. 20, pp. 6798-6807. https://doi.org/10.1093/nar/gkm489
Glass JL, Thompson R, Khulan B, Figueroa ME, Olivier EN, Oakley EJ et al. CG dinucleotide clustering is a species-specific property of the genome. Nucleic acids research. 2007 Nov 1;35(20):6798-6807. https://doi.org/10.1093/nar/gkm489
Glass, Jacob L. ; Thompson, Reid ; Khulan, Batbayar ; Figueroa, Maria E. ; Olivier, Emmanuel N. ; Oakley, Erin J. ; Van Zant, Gary ; Bouhassira, Eric E. ; Melnick, Ari ; Golden, Aaron ; Fazzari, Melissa J. ; Greally, John M. / CG dinucleotide clustering is a species-specific property of the genome. In: Nucleic acids research. 2007 ; Vol. 35, No. 20. pp. 6798-6807.
@article{cbe70265fa404b619468c247375b7a15,
title = "CG dinucleotide clustering is a species-specific property of the genome",
abstract = "Cytosines at cytosine-guanine (CG) dinucleotides are the near-exclusive target of DNA methyltransferases in mammalian genomes. Spontaneous deamination of methylcytosine to thymine makes methylated cytosines unusually susceptible to mutation and consequent depletion. The loci where CG dinucleotides remain relatively enriched, presumably due to their unmethylated status during the germ cell cycle, have been referred to as CpG islands. Currently, CpG islands are solely defined by base compositional criteria, allowing annotation of any sequenced genome. Using a novel bioinformatic approach, we show that CG clusters can be identified as an inherent property of genomic sequence without imposing a base compositional a priori assumption. We also show that the CG clusters co-localize in the human genome with hypomethylated loci and annotated transcription start sites to a greater extent than annotations produced by prior CpG island definitions. Moreover, this new approach allows CG clusters to be identified in a species-specific manner, revealing a degree of orthologous conservation that is not revealed by current base compositional approaches. Finally, our approach is able to identify methylating genomes (such as Takifugu rubripes) that lack CG clustering entirely, in which it is inappropriate to annotate CpG islands or CG clusters.",
author = "Glass, {Jacob L.} and Reid Thompson and Batbayar Khulan and Figueroa, {Maria E.} and Olivier, {Emmanuel N.} and Oakley, {Erin J.} and {Van Zant}, Gary and Bouhassira, {Eric E.} and Ari Melnick and Aaron Golden and Fazzari, {Melissa J.} and Greally, {John M.}",
year = "2007",
month = "11",
day = "1",
doi = "10.1093/nar/gkm489",
language = "English (US)",
volume = "35",
pages = "6798--6807",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "20",

}

TY - JOUR

T1 - CG dinucleotide clustering is a species-specific property of the genome

AU - Glass, Jacob L.

AU - Thompson, Reid

AU - Khulan, Batbayar

AU - Figueroa, Maria E.

AU - Olivier, Emmanuel N.

AU - Oakley, Erin J.

AU - Van Zant, Gary

AU - Bouhassira, Eric E.

AU - Melnick, Ari

AU - Golden, Aaron

AU - Fazzari, Melissa J.

AU - Greally, John M.

PY - 2007/11/1

Y1 - 2007/11/1

N2 - Cytosines at cytosine-guanine (CG) dinucleotides are the near-exclusive target of DNA methyltransferases in mammalian genomes. Spontaneous deamination of methylcytosine to thymine makes methylated cytosines unusually susceptible to mutation and consequent depletion. The loci where CG dinucleotides remain relatively enriched, presumably due to their unmethylated status during the germ cell cycle, have been referred to as CpG islands. Currently, CpG islands are solely defined by base compositional criteria, allowing annotation of any sequenced genome. Using a novel bioinformatic approach, we show that CG clusters can be identified as an inherent property of genomic sequence without imposing a base compositional a priori assumption. We also show that the CG clusters co-localize in the human genome with hypomethylated loci and annotated transcription start sites to a greater extent than annotations produced by prior CpG island definitions. Moreover, this new approach allows CG clusters to be identified in a species-specific manner, revealing a degree of orthologous conservation that is not revealed by current base compositional approaches. Finally, our approach is able to identify methylating genomes (such as Takifugu rubripes) that lack CG clustering entirely, in which it is inappropriate to annotate CpG islands or CG clusters.

UR - http://www.scopus.com/inward/record.url?scp=36749017978&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36749017978&partnerID=8YFLogxK

U2 - 10.1093/nar/gkm489

DO - 10.1093/nar/gkm489

M3 - Article

VL - 35

SP - 6798

EP - 6807

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 20

ER -