Penalized probabilistic clustering

Zhengdong Lu, Todd K. Leen

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model over other semisupervised clustering methods in handling noisy pairwise relations.
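The idea of a prior that penalizes assignments violating pairwise preferences can be illustrated with a small sketch. The code below is not taken from the paper: it assumes one simple form of penalty, in which each preferred pair (i, j) carries a hypothetical weight w_ij that is added to the log-probability of an assignment whenever i and j share a cluster (positive for "same cluster", negative for "different clusters"). It also assumes fixed one-dimensional Gaussian components and enumerates all joint assignments by brute force, which is feasible only for toy data; no EM fitting is performed. All function and parameter names are illustrative.

```python
import itertools
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def assignment_posterior(xs, means, variances, mix_weights, prefs):
    """Posterior over joint cluster assignments, by exhaustive enumeration.

    prefs maps a pair (i, j) to an assumed weight w_ij: w_ij > 0 rewards
    putting i and j in the same cluster, w_ij < 0 penalizes it.
    """
    n_clusters = len(means)
    log_scores = {}
    for z in itertools.product(range(n_clusters), repeat=len(xs)):
        # Standard mixture terms: mixing weight times component likelihood.
        log_p = sum(
            math.log(mix_weights[c]) + gaussian_log_pdf(x, means[c], variances[c])
            for x, c in zip(xs, z)
        )
        # Pairwise preference terms (the assumed penalty form).
        for (i, j), w in prefs.items():
            if z[i] == z[j]:
                log_p += w
        log_scores[z] = log_p
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {z: math.exp(s - top) / total for z, s in log_scores.items()}


def same_cluster_prob(posterior, i, j):
    """Posterior probability that points i and j share a cluster."""
    return sum(p for z, p in posterior.items() if z[i] == z[j])


if __name__ == "__main__":
    xs = [-1.0, 0.1, 1.0]
    means, variances, mix_weights = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]
    for label, prefs in [("none", {}), ("must-link", {(0, 1): 2.0}),
                         ("cannot-link", {(0, 1): -2.0})]:
        post = assignment_posterior(xs, means, variances, mix_weights, prefs)
        print(label, round(same_cluster_prob(post, 0, 1), 3))
```

With an empty `prefs` dictionary the sketch reduces to an ordinary mixture posterior with independent assignments; the magnitude of each weight plays the role of the "degree of certainty" attached to a preference.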

Original language: English (US)
Pages (from-to): 1528-1567
Number of pages: 40
Journal: Neural Computation
Volume: 19
Issue number: 6
State: Published - Jun 2007

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Cognitive Neuroscience
