Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial

Weixiang Shao, Clive E. Adams, Aaron Cohen, John M. Davis, Marian McDonagh, Sujata Thakurta, Philip S. Yu, Neil R. Smalheiser

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

OBJECTIVE: It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence.

METHODS: We created positive and negative training sets (comprised of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression.

RESULTS: Article pairs from the same trial were identified with high accuracy (F1 score=0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic, and returns article clusters predicted to arise from the same clinical trial.

DISCUSSION: Although painstaking examination of full-text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.
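The pipeline described in the abstract (pairwise similarity features, a logistic regression model, F1 evaluation, and clustering of linked articles) can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (author overlap and title-word overlap), the record fields `authors` and `title`, the gradient-descent settings, and the use of connected components for clustering are all assumptions made for illustration. The abstract does not say which metadata features or which clustering method Aggregator uses.

```python
import math


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(x, y):
    """Hypothetical similarity features for one article pair (assumed fields)."""
    title_x = set(x["title"].lower().split())
    title_y = set(y["title"].lower().split())
    return [jaccard(x["authors"], y["authors"]), jaccard(title_x, title_y)]


def predict_proba(w, b, x):
    """Logistic model: probability that a pair comes from the same trial."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the log loss (assumed settings)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cluster(ids, linked_pairs):
    """Group articles into connected components of predicted same-trial links."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in linked_pairs:
        parent[find(a)] = find(b)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

In this sketch, each candidate pair is turned into a feature vector with `pair_features`, the model is fitted on labeled pairs with `train_logistic`, pairs whose predicted probability exceeds a chosen threshold (0.5 is an arbitrary choice) are treated as links, and `cluster` merges linked articles into groups.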

Original language: English (US)
Pages (from-to): 65-70
Number of pages: 6
Journal: Methods (San Diego, Calif.)
Volume: 74
DOI: 10.1016/j.ymeth.2014.11.006
State: Published - Mar 1 2015

Fingerprint

Metadata
MEDLINE
Learning systems
Clinical Trials
PubMed
Logistics
Cluster Analysis
Registries
Publications
Logistic Models
Machine Learning

Keywords

  • Bias
  • Clinical trials
  • Evidence-based medicine
  • Informatics
  • Information retrieval
  • Systematic reviews

ASJC Scopus subject areas

  • Molecular Biology
  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial. / Shao, Weixiang; Adams, Clive E.; Cohen, Aaron; Davis, John M.; McDonagh, Marian; Thakurta, Sujata; Yu, Philip S.; Smalheiser, Neil R.

In: Methods (San Diego, Calif.), Vol. 74, 01.03.2015, p. 65-70.

Shao, Weixiang; Adams, Clive E.; Cohen, Aaron; Davis, John M.; McDonagh, Marian; Thakurta, Sujata; Yu, Philip S.; Smalheiser, Neil R. / Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial. In: Methods (San Diego, Calif.). 2015; Vol. 74. pp. 65-70.
@article{3e60fa001fb94284bed462362c428b1d,
title = "Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial",
abstract = "OBJECTIVE: It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. METHODS: We created positive and negative training sets (comprised of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression. RESULTS: Article pairs from the same trial were identified with high accuracy (F1 score=0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic, and returns article clusters predicted to arise from the same clinical trial. DISCUSSION: Although painstaking examination of full-text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.",
keywords = "Bias, Clinical trials, Evidence-based medicine, Informatics, Information retrieval, Systematic reviews",
author = "Shao, Weixiang and Adams, {Clive E.} and Cohen, Aaron and Davis, {John M.} and McDonagh, Marian and Thakurta, Sujata and Yu, {Philip S.} and Smalheiser, {Neil R.}",
year = "2015",
month = "3",
day = "1",
doi = "10.1016/j.ymeth.2014.11.006",
language = "English (US)",
volume = "74",
pages = "65--70",
journal = "Methods (San Diego, Calif.)",
issn = "1046-2023",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Aggregator

T2 - a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial

AU - Shao, Weixiang

AU - Adams, Clive E.

AU - Cohen, Aaron

AU - Davis, John M.

AU - McDonagh, Marian

AU - Thakurta, Sujata

AU - Yu, Philip S.

AU - Smalheiser, Neil R.

PY - 2015/3/1

Y1 - 2015/3/1

N2 - OBJECTIVE: It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. METHODS: We created positive and negative training sets (comprised of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression. RESULTS: Article pairs from the same trial were identified with high accuracy (F1 score=0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic, and returns article clusters predicted to arise from the same clinical trial. DISCUSSION: Although painstaking examination of full-text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.

AB - OBJECTIVE: It is important to identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. METHODS: We created positive and negative training sets (comprised of pairs of articles reporting on the same condition and intervention) that were, or were not, linked to the same clinicaltrials.gov trial registry number. Features were extracted from MEDLINE and PubMed metadata; pairwise similarity scores were modeled using logistic regression. RESULTS: Article pairs from the same trial were identified with high accuracy (F1 score=0.843). We also created a clustering tool, Aggregator, that takes as input a PubMed user query for RCTs on a given topic, and returns article clusters predicted to arise from the same clinical trial. DISCUSSION: Although painstaking examination of full-text may be needed to be conclusive, metadata are surprisingly accurate in predicting when two articles derive from the same underlying clinical trial.

KW - Bias

KW - Clinical trials

KW - Evidence-based medicine

KW - Informatics

KW - Information retrieval

KW - Systematic reviews

UR - http://www.scopus.com/inward/record.url?scp=85003045341&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85003045341&partnerID=8YFLogxK

U2 - 10.1016/j.ymeth.2014.11.006

DO - 10.1016/j.ymeth.2014.11.006

M3 - Article

VL - 74

SP - 65

EP - 70

JO - Methods (San Diego, Calif.)

JF - Methods (San Diego, Calif.)

SN - 1046-2023

ER -