Penalized probabilistic clustering

Zhengdong Lu; Todd K. Leen

doi:10.1162/neco.2007.19.6.1528

Biomedical Engineering

Research output: Contribution to journal › Article › peer-review

43 Scopus citations

Abstract

While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model over other semisupervised clustering methods in handling noisy pairwise relations.
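The idea of a prior that penalizes assignments violating pairwise preferences can be illustrated with a toy sketch. This is not the authors' implementation: the exponential penalty form, the function names (`gaussian_pdf`, `penalized_posterior`), the `pair_weights` dictionary, and the toy data are all assumptions made for illustration. The mixture parameters are held fixed and the posterior over assignments is computed by brute-force enumeration, which is only feasible for a handful of points.

```python
import itertools
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def penalized_posterior(xs, means, variances, mix, pair_weights):
    """Per-point posterior cluster marginals under a pairwise-penalized prior.

    pair_weights maps (i, j) -> w (hypothetical encoding): w > 0 rewards
    putting points i and j in the same cluster, w < 0 penalizes it.
    Every joint assignment is enumerated, so this is exact but exponential.
    """
    n, k = len(xs), len(means)
    marginals = [[0.0] * k for _ in range(n)]
    total = 0.0
    for z in itertools.product(range(k), repeat=n):
        # Standard GMM term: mixing weight times Gaussian likelihood per point.
        score = 1.0
        for i, zi in enumerate(z):
            score *= mix[zi] * gaussian_pdf(xs[i], means[zi], variances[zi])
        # Assumed penalty term: multiply by exp(w) when a pair shares a cluster.
        for (i, j), w in pair_weights.items():
            if z[i] == z[j]:
                score *= math.exp(w)
        total += score
        for i, zi in enumerate(z):
            marginals[i][zi] += score
    return [[m / total for m in row] for row in marginals]


# Toy data: the middle point sits exactly between the two cluster means.
xs = [-1.0, 0.0, 1.0]
means, variances, mix = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]
cases = [
    ("no preference", {}),
    ("must-link 0-1", {(0, 1): 2.0}),
    ("cannot-link 0-1", {(0, 1): -2.0}),
]
for label, pairs in cases:
    post = penalized_posterior(xs, means, variances, mix, pairs)
    print(label, [round(p, 3) for p in post[1]])
```

With no preference the middle point is split evenly between the two clusters; a positive weight on the pair (0, 1) pulls it toward the cluster favored by point 0, and a negative weight pushes it away. In a full EM fit, marginals of this kind would stand in for the usual E-step responsibilities before the means, variances, and mixing weights are re-estimated; that step is omitted here.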

Original language	English (US)
Pages (from-to)	1528-1567
Number of pages	40
Journal	Neural Computation
Volume	19
Issue number	6
DOIs	https://doi.org/10.1162/neco.2007.19.6.1528
State	Published - Jun 2007

ASJC Scopus subject areas

Arts and Humanities (miscellaneous)
Cognitive Neuroscience

Cite this

@article{f21259268a224e56932405cc58aa4eff,

title = "Penalized probabilistic clustering",

abstract = "While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as its special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model to other semisupervised clustering methods on handling noisy pairwise relations.",

author = "Lu, Zhengdong and Leen, {Todd K.}",

note = "Funding Information: We thank Ashok Srivastava of NASA Ames Research Center for providing satellite image data and the reviewers for comments leading to a stronger letter. This work was funded by NASA Collaborative Agreement NCC 2-1264 and NSF grant OCI-0121475.",

year = "2007",

month = jun,

doi = "10.1162/neco.2007.19.6.1528",

language = "English (US)",

volume = "19",

pages = "1528--1567",

journal = "Neural Computation",

issn = "0899-7667",

publisher = "MIT Press Journals",

number = "6",

}

TY - JOUR

T1 - Penalized probabilistic clustering

AU - Lu, Zhengdong

AU - Leen, Todd K.

N1 - Funding Information: We thank Ashok Srivastava of NASA Ames Research Center for providing satellite image data and the reviewers for comments leading to a stronger letter. This work was funded by NASA Collaborative Agreement NCC 2-1264 and NSF grant OCI-0121475.

PY - 2007/6

Y1 - 2007/6

AB - While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as its special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model to other semisupervised clustering methods on handling noisy pairwise relations.

UR - http://www.scopus.com/inward/record.url?scp=34249725008&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34249725008&partnerID=8YFLogxK

U2 - 10.1162/neco.2007.19.6.1528

DO - 10.1162/neco.2007.19.6.1528

M3 - Article

C2 - 17444759

AN - SCOPUS:34249725008

SN - 0899-7667

VL - 19

SP - 1528

EP - 1567

JO - Neural Computation

JF - Neural Computation

IS - 6

ER -
