Theory and limitations of genetic network inference from microarray data

Adam Margolin, Andrea Califano

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

68 Citations (Scopus)

Abstract

Since the advent of gene expression microarray technology more than 10 years ago, many computational approaches have been developed aimed at using statistical associations between mRNA abundance profiles to predict transcriptional regulatory interactions. The ultimate goal is to develop causal network models describing the transcriptional influences that genes exert on each other (via their protein products), which can be used to predict network disruptions (e.g., mutations) leading to a disease phenotype, as well as the appropriate therapeutic intervention. However, microarray data measure only a small component of the interacting variables in a genetic regulatory network, as cells are known to regulate gene expression via many diverse mechanisms. Although many researchers have acknowledged the questionable interpretation of statistical dependencies between mRNA profiles, very little work has been done on theoretically characterizing the nature of inferred dependencies using models that account for unobserved interacting variables. In this work, we review the theory behind reverse engineering algorithms derived from three separate disciplines - system control theory, graphical models, and information theory - and highlight several mathematical relationships between the various methods. We then apply recent theoretical work on constructing graphical models with latent variables to the context of reverse engineering genetic networks. We demonstrate that even the addition of simple latent variables induces statistical dependencies between non-directly interacting (e.g., co-regulated) genes that cannot be eliminated by conditioning on any observed variables.
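
The closing claim of the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it assumes a hypothetical linear-Gaussian toy model in which one unobserved regulator `z` drives three observed genes `x`, `y`, and `w` with made-up coefficients, and it uses linear partial correlation as a stand-in for conditional dependence. Under those assumptions, conditioning on the observed gene `w` leaves a clear residual association between the co-regulated genes `x` and `y`, whereas conditioning on the latent `z` (which a microarray would not measure) removes it.

```python
import numpy as np


def partial_corr(x, y, given):
    """Correlation of x and y after linearly regressing `given` out of both."""
    design = np.column_stack([np.ones_like(given), given])
    # Residuals of x and y after an ordinary least-squares fit on `given`.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def simulate(n_samples=20000, seed=0):
    """Toy model (all coefficients are illustrative assumptions).

    z is a latent regulator; x, y, w are observed genes that each depend
    on z plus independent noise. None of x, y, w regulates another.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n_samples)              # unobserved variable
    x = z + 0.5 * rng.normal(size=n_samples)    # co-regulated gene 1
    y = z + 0.5 * rng.normal(size=n_samples)    # co-regulated gene 2
    w = z + 0.5 * rng.normal(size=n_samples)    # another observed gene
    return z, x, y, w


if __name__ == "__main__":
    z, x, y, w = simulate()
    print("corr(x, y)            :", round(float(np.corrcoef(x, y)[0, 1]), 3))
    print("partial corr given w  :", round(partial_corr(x, y, w), 3))
    print("partial corr given z  :", round(partial_corr(x, y, z), 3))
```

With these particular coefficients the population partial correlation of `x` and `y` given `w` is 0.2 / 0.45, roughly 0.44, so `w` acts only as a noisy proxy for `z` and cannot fully explain away the dependency.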

Original language: English (US)
Title of host publication: Annals of the New York Academy of Sciences
Pages: 51-72
Number of pages: 22
Volume: 1115
DOI: https://doi.org/10.1196/annals.1407.019
State: Published - Dec 2007
Externally published: Yes

Publication series

Name: Annals of the New York Academy of Sciences
Volume: 1115
ISSN (Print): 0077-8923
ISSN (Electronic): 1749-6632

Keywords

  • Gene expression
  • Latent variables
  • Reverse engineering

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)

Cite this

Margolin, A., & Califano, A. (2007). Theory and limitations of genetic network inference from microarray data. In Annals of the New York Academy of Sciences (Vol. 1115, pp. 51-72). (Annals of the New York Academy of Sciences; Vol. 1115). https://doi.org/10.1196/annals.1407.019

@inproceedings{3d4d07db0d384a8d8b26912c3ea27372,
title = "Theory and limitations of genetic network inference from microarray data",
keywords = "Gene expression, Latent variables, Reverse engineering",
author = "Adam Margolin and Andrea Califano",
year = "2007",
month = "12",
doi = "10.1196/annals.1407.019",
language = "English (US)",
isbn = "9781573316897",
volume = "1115",
series = "Annals of the New York Academy of Sciences",
pages = "51--72",
booktitle = "Annals of the New York Academy of Sciences",

}

TY - GEN

T1 - Theory and limitations of genetic network inference from microarray data

AU - Margolin, Adam

AU - Califano, Andrea

PY - 2007/12

Y1 - 2007/12

KW - Gene expression

KW - Latent variables

KW - Reverse engineering

UR - http://www.scopus.com/inward/record.url?scp=36248989626&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36248989626&partnerID=8YFLogxK

U2 - 10.1196/annals.1407.019

DO - 10.1196/annals.1407.019

M3 - Conference contribution

SN - 9781573316897

VL - 1115

T3 - Annals of the New York Academy of Sciences

SP - 51

EP - 72

BT - Annals of the New York Academy of Sciences

ER -