Identifying transcription factor binding sites through Markov chain optimization

Kyle Ellrott; Chuhu Yang; Frances M. Sladek; Tao Jiang

doi:10.1093/bioinformatics/18.suppl_2.S100

Research output: Contribution to journal › Article › peer-review

75 Scopus citations

Abstract

Even though every cell in an organism contains the same genetic material, each cell does not express the same cohort of genes. Therefore, one of the major problems facing genomic research today is to determine not only which genes are differentially expressed and under what conditions, but also how the expression of those genes is regulated. The first step in determining differential gene expression is the binding of sequence-specific DNA binding proteins (i.e. transcription factors) to regulatory regions of the genes (i.e. promoters and enhancers). An important aspect of understanding how a given transcription factor functions is to know the entire gamut of binding sites and subsequently potential target genes that the factor may bind/regulate. In this study, we have developed a computer algorithm to scan genomic databases for transcription factor binding sites, based on a novel Markov chain optimization method, and used it to scan the human genome for sites that bind to hepatocyte nuclear factor 4 α (HNF4α). A list of 71 known HNF4α binding sites from the literature was used to train our Markov chain model. By looking at the window of 600 nucleotides around the transcription start site of each confirmed gene on the human genome, we identified 849 sites with varying binding potential and experimentally tested 109 of those sites for binding to HNF4α. Our results show that the program was very successful in identifying 77 new HNF4α binding sites with varying binding affinities (i.e. a 71% success rate). Therefore, this computational method for searching genomic databases for potential transcription factor binding sites is a powerful tool for investigating mechanisms of differential gene regulation.
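The abstract does not spell out the optimization step, so the following is only a minimal sketch of the general idea of training a Markov chain on known sites and scanning a sequence with it. It assumes a position-specific first-order chain with pseudocounts and a uniform background, which may differ from the authors' actual model. The training sites, the scanned sequence, the threshold, and all function names are hypothetical toy values chosen for illustration, not data from the paper.

```python
import math

ALPHABET = "ACGT"


def train_markov_chain(sites, pseudocount=1.0):
    """Estimate start and position-specific transition probabilities
    from aligned, equal-length binding sites (illustrative model only)."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all training sites must have the same length")

    # Probability of each base at the first position.
    start_counts = {b: pseudocount for b in ALPHABET}
    for site in sites:
        start_counts[site[0]] += 1
    total = sum(start_counts.values())
    start = {b: c / total for b, c in start_counts.items()}

    # One transition table per adjacent pair of positions.
    transitions = []
    for i in range(1, width):
        counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
        for site in sites:
            counts[site[i - 1]][site[i]] += 1
        table = {
            a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET
        }
        transitions.append(table)
    return start, transitions


def log_odds(window, model, background=0.25):
    """Log-likelihood ratio of the window under the chain vs. a uniform background."""
    start, transitions = model
    score = math.log(start[window[0]] / background)
    for i, table in enumerate(transitions, start=1):
        score += math.log(table[window[i - 1]][window[i]] / background)
    return score


def scan(sequence, model, threshold):
    """Return (position, window, score) for every window scoring at or above threshold."""
    width = len(model[1]) + 1
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if set(window) <= set(ALPHABET):  # skip windows with ambiguous bases
            score = log_odds(window, model)
            if score >= threshold:
                hits.append((pos, window, score))
    return hits


# Hypothetical toy data, NOT the 71 HNF4α sites used in the study.
toy_sites = ["AGGTCA", "AGGTCA", "AGTTCA", "GGGTCA"]
model = train_markov_chain(toy_sites)
hits = scan("TTTTAGGTCATTTT", model, threshold=2.0)

# The success rate quoted in the abstract: 77 confirmed sites out of 109 tested.
success_rate_percent = round(77 / 109 * 100)
```

In a setting like the one described in the abstract, the scanned sequence would be the 600-nucleotide window around a transcription start site, and the threshold would control how many candidate sites are reported for experimental testing.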

Original language	English (US)
Pages (from-to)	S100-S109
Journal	Bioinformatics
Volume	18
Issue number	SUPPL. 2
DOIs	https://doi.org/10.1093/bioinformatics/18.suppl_2.S100
State	Published - Oct 1 2002
Externally published	Yes

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{85ca7af5653b48c0ab71b7ac10153978,
  title = "Identifying transcription factor binding sites through Markov chain optimization",
  abstract = "Even though every cell in an organism contains the same genetic material, each cell does not express the same cohort of genes. Therefore, one of the major problems facing genomic research today is to determine not only which genes are differentially expressed and under what conditions, but also how the expression of those genes is regulated. The first step in determining differential gene expression is the binding of sequence-specific DNA binding proteins (i.e. transcription factors) to regulatory regions of the genes (i.e. promoters and enhancers). An important aspect of understanding how a given transcription factor functions is to know the entire gamut of binding sites and subsequently potential target genes that the factor may bind/regulate. In this study, we have developed a computer algorithm to scan genomic databases for transcription factor binding sites, based on a novel Markov chain optimization method, and used it to scan the human genome for sites that bind to hepatocyte nuclear factor 4 α (HNF4α). A list of 71 known HNF4α binding sites from the literature was used to train our Markov chain model. By looking at the window of 600 nucleotides around the transcription start site of each confirmed gene on the human genome, we identified 849 sites with varying binding potential and experimentally tested 109 of those sites for binding to HNF4α. Our results show that the program was very successful in identifying 77 new HNF4α binding sites with varying binding affinities (i.e. a 71% success rate). Therefore, this computational method for searching genomic databases for potential transcription factor binding sites is a powerful tool for investigating mechanisms of differential gene regulation.",
  author = "Kyle Ellrott and Chuhu Yang and Sladek, {Frances M.} and Tao Jiang",
  year = "2002",
  month = oct,
  day = "1",
  doi = "10.1093/bioinformatics/18.suppl_2.S100",
  language = "English (US)",
  volume = "18",
  pages = "S100--S109",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "SUPPL. 2",
}

TY  - JOUR
T1  - Identifying transcription factor binding sites through Markov chain optimization
AU  - Ellrott, Kyle
AU  - Yang, Chuhu
AU  - Sladek, Frances M.
AU  - Jiang, Tao
PY  - 2002/10/1
Y1  - 2002/10/1
N2  - Even though every cell in an organism contains the same genetic material, each cell does not express the same cohort of genes. Therefore, one of the major problems facing genomic research today is to determine not only which genes are differentially expressed and under what conditions, but also how the expression of those genes is regulated. The first step in determining differential gene expression is the binding of sequence-specific DNA binding proteins (i.e. transcription factors) to regulatory regions of the genes (i.e. promoters and enhancers). An important aspect of understanding how a given transcription factor functions is to know the entire gamut of binding sites and subsequently potential target genes that the factor may bind/regulate. In this study, we have developed a computer algorithm to scan genomic databases for transcription factor binding sites, based on a novel Markov chain optimization method, and used it to scan the human genome for sites that bind to hepatocyte nuclear factor 4 α (HNF4α). A list of 71 known HNF4α binding sites from the literature was used to train our Markov chain model. By looking at the window of 600 nucleotides around the transcription start site of each confirmed gene on the human genome, we identified 849 sites with varying binding potential and experimentally tested 109 of those sites for binding to HNF4α. Our results show that the program was very successful in identifying 77 new HNF4α binding sites with varying binding affinities (i.e. a 71% success rate). Therefore, this computational method for searching genomic databases for potential transcription factor binding sites is a powerful tool for investigating mechanisms of differential gene regulation.
UR  - http://www.scopus.com/inward/record.url?scp=0242424698&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0242424698&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/18.suppl_2.S100
DO  - 10.1093/bioinformatics/18.suppl_2.S100
M3  - Article
C2  - 12385991
AN  - SCOPUS:0242424698
SN  - 1367-4803
VL  - 18
SP  - S100-S109
JO  - Bioinformatics
JF  - Bioinformatics
IS  - SUPPL. 2
ER  - 
