TY - GEN
T1 - Accelerated experimental design for pairwise comparisons
AU - Guo, Yuan
AU - Dy, Jennifer
AU - Erdogmus, Deniz
AU - Kalpathy-Cramer, Jayashree
AU - Ostmo, Susan
AU - Campbell, J. Peter
AU - Chiang, Michael
AU - Ioannidis, Stratis
N1 - Funding Information:
Our work is supported by NIH (R01EY019474, P30EY10572), NSF (SCH-1622542 at MGH; SCH-1622536 and CCF-1750539 at Northeastern; SCH-1622679 at OHSU), and by unrestricted departmental funding from Research to Prevent Blindness (OHSU).
Publisher Copyright:
Copyright © 2019 by SIAM.
PY - 2019
Y1 - 2019
AB - Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N^2 d^2 K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm's complexity can be reduced to O(N^2 (K + d) + N(dK + d^2) + d^2 K). We also apply the same acceleration to the so-called lazy greedy algorithm. When combined, these improvements lead to an execution time of less than 1 hour for a dataset with 10^8 comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.
UR - http://www.scopus.com/inward/record.url?scp=85066084095&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066084095&partnerID=8YFLogxK
U2 - 10.1137/1.9781611975673.49
DO - 10.1137/1.9781611975673.49
M3 - Conference contribution
AN - SCOPUS:85066084095
T3 - SIAM International Conference on Data Mining, SDM 2019
SP - 432
EP - 440
BT - SIAM International Conference on Data Mining, SDM 2019
PB - Society for Industrial and Applied Mathematics Publications
T2 - 19th SIAM International Conference on Data Mining, SDM 2019
Y2 - 2 May 2019 through 4 May 2019
ER -