### Abstract

Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which K pairwise comparisons to generate. This objective is known to perform well in practice and is submodular, so the selection can be approximated by the greedy algorithm. A naïve greedy implementation has O(N^{2}d^{2}K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset (namely, that it consists of pairwise comparisons), the greedy algorithm’s complexity can be reduced to O(N^{2}(K + d) + N(dK + d^{2}) + d^{2}K). We also apply the same acceleration to the so-called lazy greedy algorithm. When combined, these improvements lead to an execution time of less than 1 hour for a dataset with 10^{8} comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.
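
To make the complexity claim concrete, the following is a minimal sketch of one way the pairwise structure could be exploited. It is not the authors' implementation. It assumes an unweighted, regularized D-optimal objective, log det(lam·I + Σ z zᵀ) with z = x_i − x_j, and the function name `greedy_d_optimal_pairs` and the parameter `lam` are hypothetical. The idea is that the quadratic form zᵀA⁻¹z for every pair can be read off an N×N matrix G = X A⁻¹ Xᵀ, and that G can be refreshed with a rank-one update after each selection instead of being recomputed.

```python
import numpy as np

def greedy_d_optimal_pairs(X, K, lam=1.0):
    """Greedily pick K distinct pairs (i, j), i < j.

    Assumed objective (illustrative, unweighted):
        log det(lam * I + sum over chosen pairs of z z^T),  z = x_i - x_j.
    Requires K <= N * (N - 1) / 2.
    """
    N, d = X.shape
    A_inv = np.eye(d) / lam              # inverse of the design matrix A
    G = X @ A_inv @ X.T                  # G[i, j] = x_i^T A^{-1} x_j, O(N^2 d) once
    iu, ju = np.triu_indices(N, k=1)     # all candidate pairs with i < j
    available = np.ones(iu.shape[0], dtype=bool)
    chosen = []
    for _ in range(K):
        diag = np.diag(G)
        # z^T A^{-1} z for every pair, O(N^2) per iteration.
        # By the matrix determinant lemma the marginal gain is log(1 + q),
        # so maximizing q maximizes the gain.
        q = diag[iu] + diag[ju] - 2.0 * G[iu, ju]
        q[~available] = -np.inf
        best = int(np.argmax(q))
        i, j = int(iu[best]), int(ju[best])
        chosen.append((i, j))
        available[best] = False
        z = X[i] - X[j]
        u = A_inv @ z                    # O(d^2)
        denom = 1.0 + z @ u
        A_inv -= np.outer(u, u) / denom  # Sherman-Morrison update of A^{-1}
        w = X @ u                        # O(N d)
        G -= np.outer(w, w) / denom      # matching rank-one update of G, O(N^2)
    return chosen
```

Under these assumptions the per-iteration cost is O(N^{2} + Nd + d^{2}), plus O(N^{2}d) for the initial G, which is in line with the reduced bound quoted in the abstract. A naïve version would instead evaluate a d-dimensional quadratic form for each of the O(N^{2}) pairs in every iteration. The lazy greedy variant is not shown here.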

Original language | English (US) |
---|---|
Title of host publication | SIAM International Conference on Data Mining, SDM 2019 |
Publisher | Society for Industrial and Applied Mathematics Publications |
Pages | 432-440 |
Number of pages | 9 |
ISBN (Electronic) | 9781611975673 |
DOIs | https://doi.org/10.1137/1.9781611975673.49 |
State | Published - Jan 1 2019 |
Event | 19th SIAM International Conference on Data Mining, SDM 2019 - Calgary, Canada. Duration: May 2 2019 → May 4 2019 |

### Publication series

Name | SIAM International Conference on Data Mining, SDM 2019 |
---|---|

### Conference

Conference | 19th SIAM International Conference on Data Mining, SDM 2019 |
---|---|
Country | Canada |
City | Calgary |
Period | 5/2/19 → 5/4/19 |

### ASJC Scopus subject areas

- Software

## Fingerprint

Dive into the research topics of 'Accelerated experimental design for pairwise comparisons'. Together they form a unique fingerprint.

## Cite this

Accelerated experimental design for pairwise comparisons. In *SIAM International Conference on Data Mining, SDM 2019* (pp. 432-440). (SIAM International Conference on Data Mining, SDM 2019). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611975673.49