Recovery guarantees for exemplar-based clustering

Abhinav Nellore, Rachel Ward

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

For a certain class of distributions, we prove that the linear programming relaxation of k-medoids clustering - a variant of k-means clustering where means are replaced by exemplars from within the dataset - distinguishes points drawn from nonoverlapping balls with high probability once the number of points drawn and the separation distance between any two balls are sufficiently large. Our results hold in the nontrivial regime where the separation distance is small enough that points drawn from different balls may be closer to each other than points drawn from the same ball; in this case, clustering by thresholding pairwise distances between points can fail. We also exhibit numerical evidence of high-probability recovery in a substantially more permissive regime.
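The regime described in the abstract can be illustrated with a small sketch. It does not implement the paper's linear programming relaxation: it swaps in an exhaustive search over exemplar pairs, which is only feasible for tiny inputs, and the ten 1-D points are hypothetical, hand-picked values rather than draws from a distribution. Two unit-radius balls (intervals) are centered at 0.0 and 2.5, so some points in different balls are closer together than some points in the same ball.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): two unit-radius 1-D balls
# centered at 0.0 and 2.5.
ball_a = [-1.0, -0.25, 0.0, 0.25, 1.0]
ball_b = [1.5, 2.25, 2.5, 2.75, 3.5]
points = ball_a + ball_b


def dist(p, q):
    """Distance between two points on the line."""
    return abs(p - q)


# Largest distance inside a ball versus smallest distance across balls.
max_within = max(
    dist(p, q) for ball in (ball_a, ball_b) for p, q in combinations(ball, 2)
)
min_between = min(dist(p, q) for p in ball_a for q in ball_b)


def kmedoids_exhaustive(pts, k):
    """Try every set of k exemplars and keep the cheapest one.

    The cost of an exemplar set is the sum, over all points, of the
    distance to the nearest exemplar. This is brute force, not an LP.
    """
    best = None
    for medoids in combinations(pts, k):
        cost = sum(min(dist(p, m) for m in medoids) for p in pts)
        if best is None or cost < best[0]:
            best = (cost, medoids)
    cost, medoids = best
    # Assign each point to its nearest exemplar.
    clusters = {m: [] for m in medoids}
    for p in pts:
        nearest = min(medoids, key=lambda m: dist(p, m))
        clusters[nearest].append(p)
    return cost, medoids, clusters


cost, medoids, clusters = kmedoids_exhaustive(points, 2)
print("max within-ball distance:", max_within)
print("min between-ball distance:", min_between)
print("exemplars:", medoids, "cost:", cost)
print("clusters:", clusters)
```

In this toy case a rule of the form "same cluster if and only if the distance is at most t" would need t to be at least the largest within-ball distance and smaller than the smallest between-ball distance, which is impossible here, while the exemplar-based objective still separates the two balls.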

Original language: English (US)
Pages (from-to): 165-180
Number of pages: 16
Journal: Information and Computation
Volume: 245
DOI: 10.1016/j.ic.2015.09.002
State: Published - Dec 1 2015
Externally published: Yes

Fingerprint

Recovery
Clustering
Ball
Linear programming
Linear Programming Relaxation
K-means Clustering
Thresholding
Pairwise

Keywords

  • Exact recovery
  • k-Medoids
  • Linear programming
  • Separated balls

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Recovery guarantees for exemplar-based clustering. / Nellore, Abhinav; Ward, Rachel.

In: Information and Computation, Vol. 245, 01.12.2015, p. 165-180.

@article{0f26044034a5405788e2a4c283805991,
title = "Recovery guarantees for exemplar-based clustering",
abstract = "For a certain class of distributions, we prove that the linear programming relaxation of k-medoids clustering - a variant of k-means clustering where means are replaced by exemplars from within the dataset - distinguishes points drawn from nonoverlapping balls with high probability once the number of points drawn and the separation distance between any two balls are sufficiently large. Our results hold in the nontrivial regime where the separation distance is small enough that points drawn from different balls may be closer to each other than points drawn from the same ball; in this case, clustering by thresholding pairwise distances between points can fail. We also exhibit numerical evidence of high-probability recovery in a substantially more permissive regime.",
keywords = "Exact recovery, k-Medoids, Linear programming, Separated balls",
author = "Abhinav Nellore and Rachel Ward",
year = "2015",
month = "12",
day = "1",
doi = "10.1016/j.ic.2015.09.002",
language = "English (US)",
volume = "245",
pages = "165--180",
journal = "Information and Computation",
issn = "0890-5401",
publisher = "Elsevier Inc.",
}

TY - JOUR
T1 - Recovery guarantees for exemplar-based clustering
AU - Nellore, Abhinav
AU - Ward, Rachel
PY - 2015/12/1
Y1 - 2015/12/1
AB - For a certain class of distributions, we prove that the linear programming relaxation of k-medoids clustering - a variant of k-means clustering where means are replaced by exemplars from within the dataset - distinguishes points drawn from nonoverlapping balls with high probability once the number of points drawn and the separation distance between any two balls are sufficiently large. Our results hold in the nontrivial regime where the separation distance is small enough that points drawn from different balls may be closer to each other than points drawn from the same ball; in this case, clustering by thresholding pairwise distances between points can fail. We also exhibit numerical evidence of high-probability recovery in a substantially more permissive regime.

KW - Exact recovery
KW - k-Medoids
KW - Linear programming
KW - Separated balls
UR - http://www.scopus.com/inward/record.url?scp=84948689972&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84948689972&partnerID=8YFLogxK
U2 - 10.1016/j.ic.2015.09.002
DO - 10.1016/j.ic.2015.09.002
M3 - Article
AN - SCOPUS:84948689972
VL - 245
SP - 165
EP - 180
JO - Information and Computation
JF - Information and Computation
SN - 0890-5401
ER -