Gene set analyses for interpreting microarray experiments on prokaryotic organisms

Nathan L. Tintle; Aaron A. Best; Matthew DeJongh; Dirk Van Bruggen; Fred Heffron; Steffen Porwollik; Ronald C. Taylor

doi:10.1186/1471-2105-9-469

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Background: Despite the widespread use of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed that use biologically defined sets of genes in interpretation, instead of examining results gene by gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, as is typically the case for prokaryotes.

Results: We extend five methods of gene set analysis from experiments with multiple replicates to experiments with few replicates. We then use simulated and real data to compare these methods with each other and with the Fisher's exact test (FET) method. In the simulation, we find that a method named MAXMEAN-NR maintains the nominal rate of false positive findings (type I error rate) while offering good statistical power and robustness to a variety of gene set distributions for set sizes of at least 10. Other methods (ABSSUM-NR or SUM-NR) are shown to be powerful for set sizes less than 10. Analysis of three sets of experimental data shows similar results. Furthermore, the MAXMEAN-NR method is shown to detect biologically relevant sets as significant when other methods (including FET) cannot. We also find that the popular GSEA-NR method performs poorly when compared to MAXMEAN-NR.

Conclusion: MAXMEAN-NR is a method of gene set analysis for experiments with few replicates, as is common for prokaryotes. Results of simulation and real data analysis suggest that the MAXMEAN-NR method offers increased robustness and biological relevance of findings as compared to FET and other methods, while maintaining the nominal type I error rate.

Original language	English (US)
Article number	469
Journal	BMC Bioinformatics
Volume	9
DOIs	https://doi.org/10.1186/1471-2105-9-469
State	Published - Nov 5 2008

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Access to Document

10.1186/1471-2105-9-469

Cite this

@article{c91519e019c5488aa882a87051a1fc45,

title = "Gene set analyses for interpreting microarray experiments on prokaryotic organisms",

abstract = "Background: Despite the widespread use of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed that use biologically defined sets of genes in interpretation, instead of examining results gene by gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, as is typically the case for prokaryotes. Results: We extend five methods of gene set analysis from experiments with multiple replicates to experiments with few replicates. We then use simulated and real data to compare these methods with each other and with the Fisher's exact test (FET) method. In the simulation, we find that a method named MAXMEAN-NR maintains the nominal rate of false positive findings (type I error rate) while offering good statistical power and robustness to a variety of gene set distributions for set sizes of at least 10. Other methods (ABSSUM-NR or SUM-NR) are shown to be powerful for set sizes less than 10. Analysis of three sets of experimental data shows similar results. Furthermore, the MAXMEAN-NR method is shown to detect biologically relevant sets as significant when other methods (including FET) cannot. We also find that the popular GSEA-NR method performs poorly when compared to MAXMEAN-NR. Conclusion: MAXMEAN-NR is a method of gene set analysis for experiments with few replicates, as is common for prokaryotes. Results of simulation and real data analysis suggest that the MAXMEAN-NR method offers increased robustness and biological relevance of findings as compared to FET and other methods, while maintaining the nominal type I error rate.",

author = "Tintle, {Nathan L.} and Best, {Aaron A.} and DeJongh, Matthew and {Van Bruggen}, Dirk and Heffron, Fred and Porwollik, Steffen and Taylor, {Ronald C.}",

note = "Funding Information: We acknowledge the helpful feedback from two anonymous reviewers. We thank Paul Van Allsburg for his assistance in running simulations on Hope College's Computational Science and Modelling parallel computing cluster through funding from the Howard Hughes Medical Institute. This project was funded in part by the National Human Genome Research Institute, grant number R15HG004543 to Tintle. The content is solely the responsibility of the authors and does not necessarily represent the official view of the National Human Genome Research Institute or the National Institutes of Health. Further, this research was supported in part by a grant to Hope College from the Howard Hughes Medical Institute through the Undergraduate Science Education Program. Dirk Van Bruggen received partial support from a computational science and modelling scholar award from the Hope College Howard Hughes Medical Institute program, a fellowship from the Michigan Space Grant Consortium and support from the Tanis Fund for Statistics Research. Salmonella microarray experiments were run using funding from grant NIH-R01AI022933 to Fred Heffron. Data on E. coli was generously provided by Tyrrell Conway and Joseph Grissom. We also acknowledge the support of Ross Overbeek and Rick Stevens for providing access to the SEED.",

year = "2008",

month = nov,

day = "5",

doi = "10.1186/1471-2105-9-469",

language = "English (US)",

volume = "9",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Gene set analyses for interpreting microarray experiments on prokaryotic organisms

AU - Tintle, Nathan L.

AU - Best, Aaron A.

AU - DeJongh, Matthew

AU - Van Bruggen, Dirk

AU - Heffron, Fred

AU - Porwollik, Steffen

AU - Taylor, Ronald C.

N1 - Funding Information: We acknowledge the helpful feedback from two anonymous reviewers. We thank Paul Van Allsburg for his assistance in running simulations on Hope College's Computational Science and Modelling parallel computing cluster through funding from the Howard Hughes Medical Institute. This project was funded in part by the National Human Genome Research Institute, grant number R15HG004543 to Tintle. The content is solely the responsibility of the authors and does not necessarily represent the official view of the National Human Genome Research Institute or the National Institutes of Health. Further, this research was supported in part by a grant to Hope College from the Howard Hughes Medical Institute through the Undergraduate Science Education Program. Dirk Van Bruggen received partial support from a computational science and modelling scholar award from the Hope College Howard Hughes Medical Institute program, a fellowship from the Michigan Space Grant Consortium and support from the Tanis Fund for Statistics Research. Salmonella microarray experiments were run using funding from grant NIH-R01AI022933 to Fred Heffron. Data on E. coli was generously provided by Tyrrell Conway and Joseph Grissom. We also acknowledge the support of Ross Overbeek and Rick Stevens for providing access to the SEED.

PY - 2008/11/5

Y1 - 2008/11/5

AB - Background: Despite the widespread use of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed that use biologically defined sets of genes in interpretation, instead of examining results gene by gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, as is typically the case for prokaryotes. Results: We extend five methods of gene set analysis from experiments with multiple replicates to experiments with few replicates. We then use simulated and real data to compare these methods with each other and with the Fisher's exact test (FET) method. In the simulation, we find that a method named MAXMEAN-NR maintains the nominal rate of false positive findings (type I error rate) while offering good statistical power and robustness to a variety of gene set distributions for set sizes of at least 10. Other methods (ABSSUM-NR or SUM-NR) are shown to be powerful for set sizes less than 10. Analysis of three sets of experimental data shows similar results. Furthermore, the MAXMEAN-NR method is shown to detect biologically relevant sets as significant when other methods (including FET) cannot. We also find that the popular GSEA-NR method performs poorly when compared to MAXMEAN-NR. Conclusion: MAXMEAN-NR is a method of gene set analysis for experiments with few replicates, as is common for prokaryotes. Results of simulation and real data analysis suggest that the MAXMEAN-NR method offers increased robustness and biological relevance of findings as compared to FET and other methods, while maintaining the nominal type I error rate.

UR - http://www.scopus.com/inward/record.url?scp=57049101812&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57049101812&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-469

DO - 10.1186/1471-2105-9-469

M3 - Article

C2 - 18986519

AN - SCOPUS:57049101812

SN - 1471-2105

VL - 9

JO - BMC Bioinformatics

JF - BMC Bioinformatics

M1 - 469

ER -
