Semi-supervised Learning with penalized probabilistic clustering

Zhengdong Lu, Todd K. Leen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

63 Scopus citations


While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. We fit the model parameters with the expectation-maximization (EM) algorithm. Experiments on a variety of data sets show that the resulting method, penalized probabilistic clustering (PPC), can consistently improve clustering results.
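
The idea in the abstract (a GMM whose prior over assignments rewards or penalizes pairwise agreements, fitted with EM) can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function name `penalized_gmm_em`, the weight matrix `W` (positive entries for "same cluster" preferences, negative for "different cluster"), the prior form P(Z) ∝ ∏ π_{z_i} ∏_{i<j} exp(W_ij δ(z_i, z_j)), spherical covariances, and a mean-field approximation to the E-step.

```python
import numpy as np

def penalized_gmm_em(X, W, k, n_iter=50, n_sweeps=5, seed=0):
    """Illustrative sketch, not the authors' code.

    X: (n, d) data matrix.
    W: (n, n) symmetric preference weights with zero diagonal (assumed form);
       W[i, j] > 0 favors putting i and j in the same cluster,
       W[i, j] < 0 favors separating them.
    Returns (q, means): approximate responsibilities (n, k) and means (k, d).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    var = np.full(k, X.var() + 1e-6)      # one spherical variance per component
    log_pi = np.full(k, -np.log(k))       # uniform mixing weights to start
    q = np.full((n, k), 1.0 / k)

    for _ in range(n_iter):
        # Log-likelihood of every point under every spherical Gaussian.
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_lik = -0.5 * sq / var - 0.5 * d * np.log(2 * np.pi * var)

        # Approximate E-step: sequential mean-field updates. The term
        # W[i] @ q is the expected pairwise reward for each cluster choice.
        for _ in range(n_sweeps):
            for i in range(n):
                logits = log_pi + log_lik[i] + W[i] @ q
                logits -= logits.max()    # numerical stability
                qi = np.exp(logits)
                q[i] = qi / qi.sum()

        # M-step: weighted GMM updates. Simplification: mixing weights are
        # updated as in a plain GMM, ignoring their effect on the prior's
        # normalizing constant.
        nk = q.sum(axis=0) + 1e-12
        means = (q.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (q * sq).sum(axis=0) / (d * nk) + 1e-6
        log_pi = np.log(nk / n)

    return q, means
```

With `W` set to all zeros the sketch reduces to ordinary EM for a spherical GMM, so the pairwise weights are the only place where prior preferences enter.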

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 17 - Proceedings of the 2004 Conference, NIPS 2004
Publisher: Neural information processing systems foundation
ISBN (Print): 0262195348, 9780262195348
State: Published - Jan 1 2005
Event: 18th Annual Conference on Neural Information Processing Systems, NIPS 2004 - Vancouver, BC, Canada
Duration: Dec 13 2004 to Dec 16 2004

Publication series

Name: Advances in Neural Information Processing Systems
ISSN (Print): 1049-5258


Other: 18th Annual Conference on Neural Information Processing Systems, NIPS 2004
City: Vancouver, BC

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

