TY - JOUR
T1 - Rival penalized competitive learning (RPCL)
T2 - A topology-determining algorithm for analyzing gene expression data
AU - Nair, T. Murlidharan
AU - Zheng, Christina L.
AU - Fink, J. Lynn
AU - Stuart, Robert O.
AU - Gribskov, Michael
N1 - Funding Information:
This work is partly supported by the National Cancer Institute funding grant CA 88351 and NSF grants DBI-9975808 and DBI-0077378. ROS is supported by K08-DK02392. TMN thanks Drs. George Cunningham, Virginia de Sa, N.V. Joshi and Sheila Podell for their comments and suggestions. This work is also part of the initial work for a proposed machine-learning project (LEGEND).
PY - 2003/12
Y1 - 2003/12
N2 - DNA arrays have become the immediate choice in the analysis of large-scale expression measurements. Understanding the expression pattern of genes provides functional information on newly identified genes by computational approaches. Gene expression pattern is an indicator of the state of the cell, and abnormal cellular states can be inferred by comparing expression profiles. Since co-regulated genes, and genes involved in a particular pathway, tend to show similar expression patterns, clustering expression patterns has become the natural method of choice to differentiate groups. However, most methods based on cluster analysis suffer from two usual problems: (i) dead units, and (ii) determining the correct number of clusters (k) needed to classify the data. Selecting k has been an open problem of pattern recognition and statistics for decades. Since clustering reveals similar patterns present in the data, fixing this number strongly influences the quality of the result. While there is no theoretical solution to this problem, the number of clusters can be decided by a heuristic clustering algorithm called rival penalized competitive learning (RPCL). We present a novel implementation of RPCL that transforms the problem of finding the correct number of clusters into the tractable problem of clustering based on the degree of similarity. This is biologically significant since our implementation clusters functionally co-regulated genes and genes that present similar patterns of expression. This new approach reveals potential genes that are co-involved in a biological process. This implementation of the RPCL algorithm is useful in differentiating groups involved in concerted functional regulation and helps to progressively home in on patterns that are closely similar.
AB - DNA arrays have become the immediate choice in the analysis of large-scale expression measurements. Understanding the expression pattern of genes provides functional information on newly identified genes by computational approaches. Gene expression pattern is an indicator of the state of the cell, and abnormal cellular states can be inferred by comparing expression profiles. Since co-regulated genes, and genes involved in a particular pathway, tend to show similar expression patterns, clustering expression patterns has become the natural method of choice to differentiate groups. However, most methods based on cluster analysis suffer from two usual problems: (i) dead units, and (ii) determining the correct number of clusters (k) needed to classify the data. Selecting k has been an open problem of pattern recognition and statistics for decades. Since clustering reveals similar patterns present in the data, fixing this number strongly influences the quality of the result. While there is no theoretical solution to this problem, the number of clusters can be decided by a heuristic clustering algorithm called rival penalized competitive learning (RPCL). We present a novel implementation of RPCL that transforms the problem of finding the correct number of clusters into the tractable problem of clustering based on the degree of similarity. This is biologically significant since our implementation clusters functionally co-regulated genes and genes that present similar patterns of expression. This new approach reveals potential genes that are co-involved in a biological process. This implementation of the RPCL algorithm is useful in differentiating groups involved in concerted functional regulation and helps to progressively home in on patterns that are closely similar.
KW - Clustering
KW - Gene expression
KW - RPCL
UR - http://www.scopus.com/inward/record.url?scp=0344081939&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0344081939&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2003.09.006
DO - 10.1016/j.compbiolchem.2003.09.006
M3 - Comment/debate
AN - SCOPUS:0344081939
SN - 1476-9271
VL - 27
SP - 565
EP - 574
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
IS - 6
ER -