Accelerated experimental design for pairwise comparisons

Yuan Guo, Jennifer Dy, Deniz Erdogmus, Jayashree Kalpathy-Cramer, Susan Ostmo, John Campbell, Michael Chiang, Stratis Ioannidis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N²d²K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm’s complexity can be reduced to O(N²(K + d) + N(dK + d²) + d²K). We also apply the same acceleration to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10⁸ comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.
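
To make the naïve baseline concrete, here is a minimal Python sketch of greedy selection under a D-optimality objective. It is not the authors' accelerated algorithm. Several details are assumptions not stated in the abstract: each comparison (i, j) is represented by the difference vector X[i] - X[j], the objective is log det(lam·I + Σ x xᵀ) with a hypothetical regulariser `lam`, and pairs are selected without replacement. The function name `greedy_d_optimal` is illustrative only.

```python
import numpy as np
from itertools import combinations


def greedy_d_optimal(X, K, lam=1.0):
    """Naive greedy choice of K pairwise comparisons (illustrative sketch).

    Assumed objective: log det(lam * I + sum of x x^T) over the selected
    difference vectors x = X[i] - X[j]. Returns the chosen pairs and the
    total increase in the objective over log det(lam * I).
    """
    N, d = X.shape
    pairs = list(combinations(range(N), 2))  # N(N-1)/2 candidate comparisons
    if K > len(pairs):
        raise ValueError("K exceeds the number of available pairs")
    A_inv = np.eye(d) / lam                  # inverse of the current information matrix
    used = set()
    chosen, total_gain = [], 0.0
    for _ in range(K):
        best_gain, best_idx = -np.inf, -1
        # Scan every remaining candidate; each quadratic form costs O(d^2).
        for idx, (i, j) in enumerate(pairs):
            if idx in used:
                continue
            x = X[i] - X[j]
            # Matrix determinant lemma: adding x x^T raises log det by log(1 + x^T A^{-1} x).
            gain = np.log1p(x @ A_inv @ x)
            if gain > best_gain:
                best_gain, best_idx = gain, idx
        i, j = pairs[best_idx]
        x = X[i] - X[j]
        Ax = A_inv @ x
        # Sherman-Morrison rank-one update keeps A_inv current in O(d^2).
        A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        used.add(best_idx)
        chosen.append((i, j))
        total_gain += best_gain
    return chosen, total_gain
```

Calling `greedy_d_optimal(X, K)` on an N-by-d feature matrix returns K index pairs. Each of the K iterations scans on the order of N² candidates at O(d²) cost apiece, which is where the O(N²d²K) figure for the naïve implementation comes from.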

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining, SDM 2019
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 432-440
Number of pages: 9
ISBN (Electronic): 9781611975673
State: Published - Jan 1 2019
Event: 19th SIAM International Conference on Data Mining, SDM 2019 - Calgary, Canada
Duration: May 2 2019 to May 4 2019

Publication series

Name: SIAM International Conference on Data Mining, SDM 2019

Conference

Conference: 19th SIAM International Conference on Data Mining, SDM 2019
Country: Canada
City: Calgary
Period: 5/2/19 to 5/4/19

Fingerprint

Design of experiments
Labels
Geometry

ASJC Scopus subject areas

  • Software

Cite this

Guo, Y., Dy, J., Erdogmus, D., Kalpathy-Cramer, J., Ostmo, S., Campbell, J., ... Ioannidis, S. (2019). Accelerated experimental design for pairwise comparisons. In SIAM International Conference on Data Mining, SDM 2019 (pp. 432-440). (SIAM International Conference on Data Mining, SDM 2019). Society for Industrial and Applied Mathematics Publications.

Accelerated experimental design for pairwise comparisons. / Guo, Yuan; Dy, Jennifer; Erdogmus, Deniz; Kalpathy-Cramer, Jayashree; Ostmo, Susan; Campbell, John; Chiang, Michael; Ioannidis, Stratis.

SIAM International Conference on Data Mining, SDM 2019. Society for Industrial and Applied Mathematics Publications, 2019. p. 432-440 (SIAM International Conference on Data Mining, SDM 2019).

Guo, Y, Dy, J, Erdogmus, D, Kalpathy-Cramer, J, Ostmo, S, Campbell, J, Chiang, M & Ioannidis, S 2019, Accelerated experimental design for pairwise comparisons. in SIAM International Conference on Data Mining, SDM 2019. SIAM International Conference on Data Mining, SDM 2019, Society for Industrial and Applied Mathematics Publications, pp. 432-440, 19th SIAM International Conference on Data Mining, SDM 2019, Calgary, Canada, 5/2/19.
Guo Y, Dy J, Erdogmus D, Kalpathy-Cramer J, Ostmo S, Campbell J et al. Accelerated experimental design for pairwise comparisons. In SIAM International Conference on Data Mining, SDM 2019. Society for Industrial and Applied Mathematics Publications. 2019. p. 432-440. (SIAM International Conference on Data Mining, SDM 2019).
Guo, Yuan ; Dy, Jennifer ; Erdogmus, Deniz ; Kalpathy-Cramer, Jayashree ; Ostmo, Susan ; Campbell, John ; Chiang, Michael ; Ioannidis, Stratis. / Accelerated experimental design for pairwise comparisons. SIAM International Conference on Data Mining, SDM 2019. Society for Industrial and Applied Mathematics Publications, 2019. pp. 432-440 (SIAM International Conference on Data Mining, SDM 2019).
@inproceedings{daf30961c81f4a759e99d1e29e2a092e,
title = "Accelerated experimental design for pairwise comparisons",
abstract = "Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A na{\"i}ve greedy implementation has $O(N^2 d^2 K)$ complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm's complexity can be reduced to $O(N^2(K + d) + N(dK + d^2) + d^2 K)$. We also apply the same acceleration to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with $10^8$ comparisons; the na{\"i}ve greedy algorithm on the same dataset would require more than 10 days to terminate.",
author = "Yuan Guo and Jennifer Dy and Deniz Erdogmus and Jayashree Kalpathy-Cramer and Susan Ostmo and John Campbell and Michael Chiang and Stratis Ioannidis",
year = "2019",
month = "1",
day = "1",
language = "English (US)",
series = "SIAM International Conference on Data Mining, SDM 2019",
publisher = "Society for Industrial and Applied Mathematics Publications",
pages = "432--440",
booktitle = "SIAM International Conference on Data Mining, SDM 2019",
address = "United States",

}

TY - GEN

T1 - Accelerated experimental design for pairwise comparisons

AU - Guo, Yuan

AU - Dy, Jennifer

AU - Erdogmus, Deniz

AU - Kalpathy-Cramer, Jayashree

AU - Ostmo, Susan

AU - Campbell, John

AU - Chiang, Michael

AU - Ioannidis, Stratis

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N²d²K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm’s complexity can be reduced to O(N²(K + d) + N(dK + d²) + d²K). We also apply the same acceleration to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10⁸ comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.

AB - Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N²d²K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm’s complexity can be reduced to O(N²(K + d) + N(dK + d²) + d²K). We also apply the same acceleration to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10⁸ comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.

UR - http://www.scopus.com/inward/record.url?scp=85066084095&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066084095&partnerID=8YFLogxK

M3 - Conference contribution

T3 - SIAM International Conference on Data Mining, SDM 2019

SP - 432

EP - 440

BT - SIAM International Conference on Data Mining, SDM 2019

PB - Society for Industrial and Applied Mathematics Publications

ER -