### Abstract

Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has O(N^{2}d^{2}K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm's complexity can be reduced to O(N^{2}(K + d) + N(dK + d^{2}) + d^{2}K). We also apply the same acceleration to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with 10^{8} comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.
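The greedy selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a common form of the D-optimality objective, log det(λI + Σ (x_i − x_j)(x_i − x_j)^T) over the selected pairs, where the regularizer `lam` and the function names `greedy_d_optimal_pairs` and `log_det_objective` are hypothetical choices made for illustration. The sketch shows one way the pairwise structure can help: every marginal gain depends only on entries of the N × N matrix U = X A^{-1} X^T, which a rank-one (Sherman-Morrison) update refreshes in O(N^{2}) time per selection, without forming any d × d inverse per candidate pair.

```python
import numpy as np


def greedy_d_optimal_pairs(X, K, lam=1.0):
    """Greedily pick K distinct pairs (i, j), i < j, to increase
    log det(lam * I + sum over chosen pairs of (x_i - x_j)(x_i - x_j)^T).

    Assumed objective form; `lam` is a hypothetical regularizer.
    """
    N, _ = X.shape
    if K > N * (N - 1) // 2:
        raise ValueError("K exceeds the number of available pairs")
    # U = X A^{-1} X^T with A = lam * I initially; costs O(N^2 d) once.
    U = (X @ X.T) / lam
    available = np.triu(np.ones((N, N), dtype=bool), k=1)
    chosen = []
    for _ in range(K):
        diag = np.diag(U)
        # Q[i, j] = (x_i - x_j)^T A^{-1} (x_i - x_j), read off U alone.
        Q = diag[:, None] + diag[None, :] - 2.0 * U
        # Matrix determinant lemma: marginal gain is log(1 + Q[i, j]).
        gains = np.where(available, np.log1p(Q), -np.inf)
        i, j = np.unravel_index(np.argmax(gains), gains.shape)
        chosen.append((int(i), int(j)))
        available[i, j] = False
        # Sherman-Morrison applied to A^{-1}, written directly on U: O(N^2).
        w = U[:, i] - U[:, j]  # equals X A^{-1} (x_i - x_j)
        U = U - np.outer(w, w) / (1.0 + Q[i, j])
    return chosen


def log_det_objective(X, pairs, lam=1.0):
    """Evaluate the assumed objective directly, for checking small cases."""
    d = X.shape[1]
    A = lam * np.eye(d)
    for i, j in pairs:
        z = X[i] - X[j]
        A += np.outer(z, z)
    return np.linalg.slogdet(A)[1]
```

Under these assumptions the sketch costs roughly O(N^{2}d) for the initial product plus O(N^{2}) per selected pair, and it stores an N × N matrix, so it suits moderate N only. It does not include the lazy greedy variant and is not claimed to reproduce the paper's exact bound.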

Original language | English (US) |
---|---|
Title of host publication | SIAM International Conference on Data Mining, SDM 2019 |
Publisher | Society for Industrial and Applied Mathematics Publications |
Pages | 432-440 |
Number of pages | 9 |
ISBN (Electronic) | 9781611975673 |
State | Published - Jan 1 2019 |
Event | 19th SIAM International Conference on Data Mining, SDM 2019 - Calgary, Canada; Duration: May 2 2019 → May 4 2019 |

### Publication series

Name | SIAM International Conference on Data Mining, SDM 2019 |
---|---|

### Conference

Conference | 19th SIAM International Conference on Data Mining, SDM 2019 |
---|---|
Country | Canada |
City | Calgary |
Period | 5/2/19 → 5/4/19 |

### ASJC Scopus subject areas

- Software

### Cite this

**Accelerated experimental design for pairwise comparisons.** / Guo, Yuan; Dy, Jennifer; Erdogmus, Deniz; Kalpathy-Cramer, Jayashree; Ostmo, Susan; Campbell, John; Chiang, Michael; Ioannidis, Stratis.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Guo, Y., Dy, J., Erdogmus, D., Kalpathy-Cramer, J., Ostmo, S., Campbell, J., Chiang, M., & Ioannidis, S. (2019). Accelerated experimental design for pairwise comparisons. In *SIAM International Conference on Data Mining, SDM 2019* (pp. 432-440). (SIAM International Conference on Data Mining, SDM 2019). Society for Industrial and Applied Mathematics Publications. Presented at the 19th SIAM International Conference on Data Mining, SDM 2019, Calgary, Canada, 5/2/19.

TY - GEN

T1 - Accelerated experimental design for pairwise comparisons

AU - Guo, Yuan

AU - Dy, Jennifer

AU - Erdogmus, Deniz

AU - Kalpathy-Cramer, Jayashree

AU - Ostmo, Susan

AU - Campbell, John

AU - Chiang, Michael

AU - Ioannidis, Stratis

PY - 2019/1/1

Y1 - 2019/1/1

UR - http://www.scopus.com/inward/record.url?scp=85066084095&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066084095&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85066084095

T3 - SIAM International Conference on Data Mining, SDM 2019

SP - 432

EP - 440

BT - SIAM International Conference on Data Mining, SDM 2019

PB - Society for Industrial and Applied Mathematics Publications

ER -