Semi-supervised Learning with penalized probabilistic clustering

Zhengdong Lu, Todd K. Leen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

55 Citations (Scopus)

Abstract

While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that penalized probabilistic clustering (PPC) can consistently improve clustering results.
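
The abstract does not state the functional form of the penalizing prior, so the following Python sketch is only an illustration under stated assumptions. It assumes a prior in which each pair (i, j) contributes a factor exp(w_ij) whenever the two points share a cluster, with the sign of w_ij encoding "same cluster" versus "different cluster" and its magnitude encoding the degree of certainty. The function names, the weight dictionary, and the toy data are hypothetical. The mixture parameters are held fixed rather than fitted with EM, and the posterior is computed by enumerating every joint assignment, which is feasible only for a handful of points.

```python
import itertools
import math


def log_gaussian(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (x - mean) ** 2 / var - 0.5 * math.log(2 * math.pi * var)


def penalized_posterior(xs, means, var, mix, weights):
    """Posterior over joint assignments z = (z_1, ..., z_n), by enumeration.

    ASSUMED prior form (not stated in the abstract):
        p(z) proportional to prod_i mix[z_i] * prod_{i<j} exp(w_ij * [z_i == z_j])
    weights maps a pair (i, j) to w_ij; w_ij > 0 favours a shared cluster,
    w_ij < 0 penalizes one, and |w_ij| encodes how certain the preference is.
    """
    n, k = len(xs), len(means)
    assignments = list(itertools.product(range(k), repeat=n))
    log_scores = []
    for z in assignments:
        # Standard GMM term: mixing weight times Gaussian likelihood.
        s = sum(math.log(mix[z[i]]) + log_gaussian(xs[i], means[z[i]], var)
                for i in range(n))
        # Pairwise term: add w_ij whenever the pair shares a cluster.
        s += sum(w for (i, j), w in weights.items() if z[i] == z[j])
        log_scores.append(s)
    top = max(log_scores)  # subtract the max for numerical stability
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return assignments, [u / total for u in unnorm]


def same_cluster_prob(assignments, probs, i, j):
    """Posterior probability that points i and j share a cluster."""
    return sum(p for z, p in zip(assignments, probs) if z[i] == z[j])


# Toy data: three 1-D points (A, B, C), two clusters, fixed parameters.
xs, means, var, mix = [-0.6, 0.1, 0.5], [-1.0, 1.0], 1.0, [0.5, 0.5]

base = penalized_posterior(xs, means, var, mix, {})
linked = penalized_posterior(xs, means, var, mix, {(0, 1): 2.0})   # A with B
split = penalized_posterior(xs, means, var, mix, {(0, 2): -2.0})   # A apart from C

print("P(A, B together), no preference:", same_cluster_prob(*base, 0, 1))
print("P(A, B together), preferred:    ", same_cluster_prob(*linked, 0, 1))
print("P(A, C together), no preference:", same_cluster_prob(*base, 0, 2))
print("P(A, C together), penalized:    ", same_cluster_prob(*split, 0, 2))
```

With an empty weight dictionary the pairwise term vanishes and the posterior reduces to that of an ordinary mixture model, which is why the unpenalized run serves as the baseline here.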

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems
Publisher: Neural information processing systems foundation
ISBN (Print): 0262195348, 9780262195348
State: Published - 2005
Event: 18th Annual Conference on Neural Information Processing Systems, NIPS 2004 - Vancouver, BC, Canada
Duration: Dec 13 2004 to Dec 16 2004

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Lu, Z., & Leen, T. K. (2005). Semi-supervised Learning with penalized probabilistic clustering. In Advances in Neural Information Processing Systems. Neural information processing systems foundation.

@inproceedings{d55d375e9ac4494dba52e21d885c045d,
title = "Semi-supervised Learning with penalized probabilistic clustering",
author = "Lu, Zhengdong and Leen, {Todd K.}",
year = "2005",
language = "English (US)",
isbn = "0262195348",
booktitle = "Advances in Neural Information Processing Systems",
publisher = "Neural information processing systems foundation",

}

TY - GEN

T1 - Semi-supervised Learning with penalized probabilistic clustering

AU - Lu, Zhengdong

AU - Leen, Todd K.

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=84898984833&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898984833&partnerID=8YFLogxK

M3 - Conference contribution

SN - 0262195348

SN - 9780262195348

BT - Advances in Neural Information Processing Systems

PB - Neural information processing systems foundation

ER -