Using citation data to improve retrieval from MEDLINE

Elmer V. Bernstam; Jorge R. Herskovic; Yindalon Aphinyanaphongs; Constantin F. Aliferis; Madurai G. Sriram; William R. Hersh

doi:10.1197/jamia.M1909

Medical Informatics and Clinical Epidemiology

Research output: Contribution to journal › Article › peer-review

48 Scopus citations

Abstract

Objective: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. Design and Measurements: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. Results: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. Conclusion: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
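The two most effective strategies named in the abstract, simple citation count and PageRank, can be sketched on a toy citation graph. The following is a minimal illustration only, not the authors' implementation: the graph, the article labels, the damping factor of 0.85, the iteration count, and the helper name `important_in_top_k` are all assumptions made for the example. The last helper mirrors the abstract's measure of how many important articles appear in the first k results.

```python
from collections import defaultdict


def citation_counts(citations):
    """Count incoming citations for every article in the graph."""
    counts = defaultdict(int)
    for citing, cited_list in citations.items():
        counts[citing] += 0  # make sure uncited articles still appear
        for cited in cited_list:
            counts[cited] += 1
    return dict(counts)


def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank; damping=0.85 is an assumed, conventional value."""
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Articles that cite nothing spread their score evenly over all articles.
        dangling = sum(rank[node] for node in nodes if not citations.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for citing in nodes:
            cited_list = citations.get(citing, [])
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        rank = new_rank
    return rank


def important_in_top_k(scores, important, k):
    """Number of important articles among the k highest-scoring articles."""
    ranked = sorted(scores, key=lambda article: (-scores[article], article))
    return sum(1 for article in ranked[:k] if article in important)


# Hypothetical graph: each key cites the articles in its list.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["C", "D"]}
important = {"C", "D"}  # stand-in for a curated bibliography

counts = citation_counts(citations)
ranks = pagerank(citations)
print(important_in_top_k(counts, important, 2))
print(important_in_top_k(ranks, important, 2))
```

Both scores depend only on the citation graph, not on the query text, which is why they can be used to reorder a set of results that has already been judged relevant.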

Original language	English (US)
Pages (from-to)	96-105
Number of pages	10
Journal	Journal of the American Medical Informatics Association
Volume	13
Issue number	1
DOIs	https://doi.org/10.1197/jamia.M1909
State	Published - Jan 2006

ASJC Scopus subject areas

Health Informatics

Cite this

@article{cc28581733be4a71b35e3c29b062410a,
  title = "Using citation data to improve retrieval from MEDLINE",
  abstract = "Objective: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. Design and Measurements: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. Results: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. Conclusion: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.",
  author = "Bernstam, {Elmer V.} and Herskovic, {Jorge R.} and Aphinyanaphongs, Yindalon and Aliferis, {Constantin F.} and Sriram, {Madurai G.} and Hersh, {William R.}",
  note = "Funding Information: Supported in part by NLM grant 5 K22 LM008306 and a training fellowship from the W. M. Keck Foundation to the Gulf Coast Consortia through the Keck Center for Computational and Structural Biology.",
  year = "2006",
  month = jan,
  doi = "10.1197/jamia.M1909",
  language = "English (US)",
  volume = "13",
  pages = "96--105",
  journal = "Journal of the American Medical Informatics Association",
  issn = "1067-5027",
  publisher = "Oxford University Press",
  number = "1",
}


TY - JOUR
T1 - Using citation data to improve retrieval from MEDLINE
AU - Bernstam, Elmer V.
AU - Herskovic, Jorge R.
AU - Aphinyanaphongs, Yindalon
AU - Aliferis, Constantin F.
AU - Sriram, Madurai G.
AU - Hersh, William R.
N1 - Funding Information: Supported in part by NLM grant 5 K22 LM008306 and a training fellowship from the W. M. Keck Foundation to the Gulf Coast Consortia through the Keck Center for Computational and Structural Biology.
PY - 2006/1
Y1 - 2006/1
N2 - Objective: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. Design and Measurements: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. Results: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. Conclusion: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.

UR - http://www.scopus.com/inward/record.url?scp=29244472467&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=29244472467&partnerID=8YFLogxK
U2 - 10.1197/jamia.M1909
DO - 10.1197/jamia.M1909
M3 - Article
C2 - 16221938
AN - SCOPUS:29244472467
SN - 1067-5027
VL - 13
SP - 96
EP - 105
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 1
ER -
