Accelerated experimental design for pairwise comparisons

Yuan Guo; Jennifer Dy; Deniz Erdogmus; Jayashree Kalpathy-Cramer; Susan Ostmo; J. Peter Campbell; Michael F. Chiang; Stratis Ioannidis

doi:10.1137/1.9781611975673.49

Ophthalmology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N²d²K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset–namely, that it consists of pairwise comparisons–the greedy algorithm’s complexity can be reduced to O(N²(K + d) + N(dK + d²) + d²K). We apply the same acceleration also to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10⁸ comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.
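As a rough illustration of the baseline the abstract describes, the following Python sketch implements a naive greedy selection for a D-optimality objective. It assumes the objective takes the common form log det(λI + Σ (x_i − x_j)(x_i − x_j)ᵀ), summed over the selected pairs. That form, the regularizer `lam`, and the function name `greedy_d_optimal` are assumptions made for illustration and are not taken from the paper. This is not the accelerated method: each round scores all O(N²) candidate pairs at O(d²) cost, which is consistent with the O(N²d²K) bound quoted for the naive implementation.

```python
import math
from itertools import combinations


def greedy_d_optimal(X, K, lam=1.0):
    """Naive greedy selection of K pairs (i, j) for an assumed objective
    log det(lam * I + sum of z z^T), where z = X[i] - X[j].

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    d = len(X[0])
    # A_inv tracks the inverse of A = lam * I + sum of selected z z^T.
    A_inv = [[(1.0 / lam if r == c else 0.0) for c in range(d)] for r in range(d)]
    candidates = list(combinations(range(len(X)), 2))
    selected = []
    logdet = d * math.log(lam)  # log det of the initial matrix lam * I

    for _ in range(min(K, len(candidates))):
        best = None
        for i, j in candidates:
            z = [a - b for a, b in zip(X[i], X[j])]
            Az = [sum(A_inv[r][c] * z[c] for c in range(d)) for r in range(d)]
            # Matrix determinant lemma: the marginal gain is log(1 + z^T A^{-1} z),
            # so maximizing the quadratic form maximizes the gain.
            quad = sum(zr * ar for zr, ar in zip(z, Az))
            if best is None or quad > best[0]:
                best = (quad, (i, j), Az)

        quad, pair, Az = best
        # Sherman-Morrison rank-one update of the inverse (A_inv is symmetric).
        denom = 1.0 + quad
        for r in range(d):
            for c in range(d):
                A_inv[r][c] -= Az[r] * Az[c] / denom
        logdet += math.log1p(quad)
        selected.append(pair)
        candidates.remove(pair)  # each comparison is selected at most once

    return selected, logdet
```

Calling `greedy_d_optimal([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], K=2)` on three toy points first selects the pair `(1, 2)`, whose difference vector has the largest norm, and returns the accumulated log-determinant alongside the selected pairs.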

Original language	English (US)
Title of host publication	SIAM International Conference on Data Mining, SDM 2019
Publisher	Society for Industrial and Applied Mathematics Publications
Pages	432-440
Number of pages	9
ISBN (Electronic)	9781611975673
DOIs	https://doi.org/10.1137/1.9781611975673.49
State	Published - 2019
Event	19th SIAM International Conference on Data Mining, SDM 2019 - Calgary, Canada Duration: May 2 2019 → May 4 2019

Publication series

Name	SIAM International Conference on Data Mining, SDM 2019

Conference

Conference	19th SIAM International Conference on Data Mining, SDM 2019
Country/Territory	Canada
City	Calgary
Period	5/2/19 → 5/4/19

ASJC Scopus subject areas

Software

Access to Document

10.1137/1.9781611975673.49

Cite this

Guo, Y., Dy, J., Erdogmus, D., Kalpathy-Cramer, J., Ostmo, S., Campbell, J. P., Chiang, M. F., & Ioannidis, S. (2019). Accelerated experimental design for pairwise comparisons. In SIAM International Conference on Data Mining, SDM 2019 (pp. 432-440). (SIAM International Conference on Data Mining, SDM 2019). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611975673.49

Accelerated experimental design for pairwise comparisons. / Guo, Yuan; Dy, Jennifer; Erdogmus, Deniz et al.
SIAM International Conference on Data Mining, SDM 2019. Society for Industrial and Applied Mathematics Publications, 2019. p. 432-440 (SIAM International Conference on Data Mining, SDM 2019).

Guo, Y, Dy, J, Erdogmus, D, Kalpathy-Cramer, J, Ostmo, S, Campbell, JP, Chiang, MF & Ioannidis, S 2019, Accelerated experimental design for pairwise comparisons. in SIAM International Conference on Data Mining, SDM 2019. SIAM International Conference on Data Mining, SDM 2019, Society for Industrial and Applied Mathematics Publications, pp. 432-440, 19th SIAM International Conference on Data Mining, SDM 2019, Calgary, Canada, 5/2/19. https://doi.org/10.1137/1.9781611975673.49

@inproceedings{daf30961c81f4a759e99d1e29e2a092e,

title = "Accelerated experimental design for pairwise comparisons",

abstract = "Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A na{\"i}ve greedy implementation has $O(N^2 d^2 K)$ complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset–namely, that it consists of pairwise comparisons–the greedy algorithm{\textquoteright}s complexity can be reduced to $O(N^2(K + d) + N(dK + d^2) + d^2 K)$. We apply the same acceleration also to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with $10^8$ comparisons; the na{\"i}ve greedy algorithm on the same dataset would require more than 10 days to terminate.",

author = "Yuan Guo and Jennifer Dy and Deniz Erdogmus and Jayashree Kalpathy-Cramer and Susan Ostmo and Campbell, {J. Peter} and Chiang, {Michael F.} and Stratis Ioannidis",

note = "Publisher Copyright: Copyright {\textcopyright} 2019 by SIAM.; 19th SIAM International Conference on Data Mining, SDM 2019 ; Conference date: 02-05-2019 Through 04-05-2019",

year = "2019",

doi = "10.1137/1.9781611975673.49",

language = "English (US)",

series = "SIAM International Conference on Data Mining, SDM 2019",

publisher = "Society for Industrial and Applied Mathematics Publications",

pages = "432--440",

booktitle = "SIAM International Conference on Data Mining, SDM 2019",

address = "United States",

}

TY - GEN

T1 - Accelerated experimental design for pairwise comparisons

AU - Guo, Yuan

AU - Dy, Jennifer

AU - Erdogmus, Deniz

AU - Kalpathy-Cramer, Jayashree

AU - Ostmo, Susan

AU - Campbell, J. Peter

AU - Chiang, Michael F.

AU - Ioannidis, Stratis

PY - 2019

Y1 - 2019

N2 - Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N²d²K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset–namely, that it consists of pairwise comparisons–the greedy algorithm’s complexity can be reduced to O(N²(K + d) + N(dK + d²) + d²K). We apply the same acceleration also to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10⁸ comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.

AB - Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N²d²K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset–namely, that it consists of pairwise comparisons–the greedy algorithm’s complexity can be reduced to O(N²(K + d) + N(dK + d²) + d²K). We apply the same acceleration also to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10⁸ comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.

UR - http://www.scopus.com/inward/record.url?scp=85066084095&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066084095&partnerID=8YFLogxK

U2 - 10.1137/1.9781611975673.49

DO - 10.1137/1.9781611975673.49

M3 - Conference contribution

AN - SCOPUS:85066084095

T3 - SIAM International Conference on Data Mining, SDM 2019

SP - 432

EP - 440

BT - SIAM International Conference on Data Mining, SDM 2019

PB - Society for Industrial and Applied Mathematics Publications

T2 - 19th SIAM International Conference on Data Mining, SDM 2019

Y2 - 2 May 2019 through 4 May 2019

ER -
