Clustering with varying risks of false assignments in discrete latent variable model

Donghwan Lee, Dongseok Choi, Youngjo Lee

Research output: Contribution to journal › Article › peer-review

Abstract

In clustering problems, latent variable models are frequently used to model the intrinsic structure of unlabeled data. These model-based clustering methods often provide a clustering rule that minimizes the total false assignment error. However, in many clustering applications, it is desirable to treat false assignment errors for a certain cluster differently. In this paper, we introduce the false assignment rate for clustering and estimate it using the extended likelihood approach. We propose VRclust, a novel clustering rule that controls the various errors differently across clusters. Real data examples illustrate how the estimated false assignment rate can be used, and a simulation study shows that error control is consistent as the sample size increases.

Original language: English (US)
Pages (from-to): 2932-2944
Number of pages: 13
Journal: Statistical Methods in Medical Research
Volume: 29
Issue number: 10
State: Published - Oct 1 2020

Keywords

  • Clustering
  • extended likelihood
  • false assignment rate

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management
