UIC/OHSU CLEF 2018 Task 2 diagnostic test accuracy ranking using publication type cluster similarity measures

Aaron Cohen, Neil R. Smalheiser

Research output: Contribution to journal, Conference article, peer-review

3 Scopus citations

Abstract

The goal of CLEF 2018 Task 2 was to identify and rank retrieved articles relevant to conducting a systematic diagnostic test accuracy review on a given topic. The UIC/OHSU team did not attempt to rank retrieved articles by relevance directly, but rather explored the baseline value of ranking retrieved articles according to the probability that they are concerned with diagnostic test accuracy. First, a set of six publication type clusters, including a cluster of diagnostic test accuracy papers (DTAs), was built by searching PubMed for articles published from 1987 to 2015. We created several types of cluster similarity measures for each publication type. Similarity types included implicit-term similarity, most important word similarity, journal similarity, and author count similarity. These similarity features were then used with weighted and unweighted linear SVM machine learning algorithms, which were trained on a data set retrieved from PubMed searches consisting of 3,481 PMIDs likely to be DTAs and 71,684 PMIDs, most of which are not likely to be DTAs. The trained models produce scores predicting the probability that an individual article is a DTA. The CLEF 2018 Task 2 test PMIDs for each topic were scored and ranked, and the cutoff probability for each of the two models was determined by visual inspection of the score distribution on the test data. The cutoff probabilities chosen were 0.20 for the unweighted SVM model and 0.40 for the weighted SVM model.
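The final scoring step described in the abstract, ranking each topic's PMIDs by predicted DTA probability and applying a per-model cutoff, can be illustrated with a minimal sketch. This is not the authors' code: the function name `rank_and_cut`, the example identifiers and scores, and the choice to keep scores at or above the cutoff (rather than strictly above) are all assumptions made for illustration. Only the two cutoff values come from the abstract.

```python
from typing import Dict, List, Tuple

# Cutoff probabilities reported in the abstract for the two models.
CUTOFFS = {"unweighted_svm": 0.20, "weighted_svm": 0.40}


def rank_and_cut(scores: Dict[str, float], cutoff: float) -> Tuple[List[str], List[str]]:
    """Rank article IDs by descending score and list those at or above the cutoff.

    Hypothetical helper: ties are broken by ID so the ordering is deterministic,
    and the >= comparison is an assumption, not something the abstract specifies.
    """
    ranked = sorted(scores, key=lambda pmid: (-scores[pmid], pmid))
    kept = [pmid for pmid in ranked if scores[pmid] >= cutoff]
    return ranked, kept


# Made-up scores for four placeholder articles in one topic (not real data).
example_scores = {"pmid_a": 0.91, "pmid_b": 0.35, "pmid_c": 0.12, "pmid_d": 0.55}

ranked, kept_unweighted = rank_and_cut(example_scores, CUTOFFS["unweighted_svm"])
_, kept_weighted = rank_and_cut(example_scores, CUTOFFS["weighted_svm"])

print(ranked)
print(kept_unweighted)
print(kept_weighted)
```

With the same made-up scores, the higher cutoff of the weighted model keeps fewer articles than the lower cutoff of the unweighted model; in practice each model would produce its own scores, so the two result lists would not be directly comparable in this way.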

Original language: English (US)
Journal: CEUR Workshop Proceedings
Volume: 2125
State: Published - 2018
Event: 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 - Avignon, France
Duration: Sep 10 2018 to Sep 14 2018

Keywords

  • Diagnostic Test Accuracy
  • Machine Learning
  • Publication Types
  • Support Vector Machine

ASJC Scopus subject areas

  • General Computer Science
