TY - GEN

T1 - Accelerated experimental design for pairwise comparisons

AU - Guo, Yuan

AU - Dy, Jennifer

AU - Erdogmus, Deniz

AU - Kalpathy-Cramer, Jayashree

AU - Ostmo, Susan

AU - Campbell, J. Peter

AU - Chiang, Michael F.

AU - Ioannidis, Stratis

N1 - Funding Information:
Our work is supported by NIH (R01EY019474, P30EY10572), NSF (SCH-1622542 at MGH; SCH-1622536 and CCF-1750539 at Northeastern; SCH-1622679 at OHSU), and by unrestricted departmental funding from Research to Prevent Blindness (OHSU).
Publisher Copyright:
Copyright © 2019 by SIAM.

PY - 2019

Y1 - 2019

N2 - Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N^2 d^2 K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm’s complexity can be reduced to O(N^2 (K + d) + N(dK + d^2) + d^2 K). We apply the same acceleration also to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10^8 comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.

AB - Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N^2 d^2 K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm’s complexity can be reduced to O(N^2 (K + d) + N(dK + d^2) + d^2 K). We apply the same acceleration also to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10^8 comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.

UR - http://www.scopus.com/inward/record.url?scp=85066084095&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066084095&partnerID=8YFLogxK

U2 - 10.1137/1.9781611975673.49

DO - 10.1137/1.9781611975673.49

M3 - Conference contribution

AN - SCOPUS:85066084095

T3 - SIAM International Conference on Data Mining, SDM 2019

SP - 432

EP - 440

BT - SIAM International Conference on Data Mining, SDM 2019

PB - Society for Industrial and Applied Mathematics Publications

T2 - 19th SIAM International Conference on Data Mining, SDM 2019

Y2 - 2 May 2019 through 4 May 2019

ER -