A Day in the Life of PubMed

Analysis of a Typical Day's Query Log

Jorge R. Herskovic, Len Y. Tanaka, William (Bill) Hersh, Elmer V. Bernstam

Research output: Contribution to journal › Article

93 Citations (Scopus)

Abstract

Objective: To characterize PubMed usage over a typical day and compare it to previous studies of user behavior on Web search engines.

Design: We performed a lexical and semantic analysis of 2,689,166 queries issued on PubMed over 24 consecutive hours on a typical day.

Measurements: We measured the number of queries, number of distinct users, queries per user, terms per query, common terms, Boolean operator use, common phrases, result set size, and MeSH categories. We also used semantic measurements to group queries into sessions and studied the addition and removal of terms from consecutive queries to gauge search strategies.

Results: The size of the result sets from a sample of queries showed a bimodal distribution, with peaks at approximately 3 and 100 results, suggesting that one large group of queries was tightly focused and another was broad. Like Web search engine sessions, most PubMed sessions consisted of a single query. However, PubMed queries contained more terms than Web search engine queries.

Conclusion: PubMed's usage profile should be considered when educating users, building user interfaces, and developing future biomedical information retrieval systems.
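
Several of the lexical measures listed under Measurements (terms per query, Boolean operator use, and the addition and removal of terms between consecutive queries) can be illustrated with a small sketch. This is not the authors' code. It assumes queries are plain strings, terms are whitespace-separated tokens, and the Boolean operators are the uppercase tokens AND, OR, and NOT; it ignores quoted phrases, field tags such as [mh], and the semantic session grouping. The function names and sample queries are hypothetical.

```python
"""Hypothetical sketch of simple lexical query-log measures.

Assumptions (not taken from the paper): each query is a plain string,
terms are whitespace-separated tokens, and Boolean operators are the
uppercase tokens AND, OR, NOT. Phrases and field tags are not parsed.
"""
from collections import Counter

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def terms(query: str) -> list[str]:
    """Return the non-operator tokens of a query, in order."""
    return [tok for tok in query.split() if tok not in BOOLEAN_OPERATORS]


def mean_terms_per_query(queries: list[str]) -> float:
    """Average number of non-operator terms per query (0.0 for an empty log)."""
    if not queries:
        return 0.0
    return sum(len(terms(q)) for q in queries) / len(queries)


def operator_counts(queries: list[str]) -> Counter:
    """Count occurrences of each Boolean operator across all queries."""
    counts: Counter = Counter()
    for q in queries:
        counts.update(tok for tok in q.split() if tok in BOOLEAN_OPERATORS)
    return counts


def term_changes(previous: str, current: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) terms between two consecutive queries.

    Comparison is case-insensitive and ignores term order.
    """
    before = {t.lower() for t in terms(previous)}
    after = {t.lower() for t in terms(current)}
    return after - before, before - after


if __name__ == "__main__":
    # Hypothetical three-query log from one user.
    log = ["breast cancer", "breast cancer AND tamoxifen", "tamoxifen resistance"]
    print("mean terms per query:", mean_terms_per_query(log))
    print("operator use:", operator_counts(log))
    for prev, cur in zip(log, log[1:]):
        added, removed = term_changes(prev, cur)
        print(f"{prev!r} -> {cur!r}: added={sorted(added)} removed={sorted(removed)}")
```

Comparing consecutive queries by set difference is one simple way to separate refinement (terms only added) from reformulation (terms removed or replaced); a full analysis would also need real tokenization, phrase handling, and a rule for where sessions begin and end.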

Original language: English (US)
Pages (from-to): 212-220
Number of pages: 9
Journal: Journal of the American Medical Informatics Association
Volume: 14
Issue number: 2
DOI: 10.1197/jamia.M2191
State: Published - Mar 2007

Fingerprint

PubMed
Search Engine
Semantics
Information Systems

ASJC Scopus subject areas

  • Medicine(all)

Cite this

A Day in the Life of PubMed: Analysis of a Typical Day's Query Log. / Herskovic, Jorge R.; Tanaka, Len Y.; Hersh, William (Bill); Bernstam, Elmer V.

In: Journal of the American Medical Informatics Association, Vol. 14, No. 2, 03.2007, p. 212-220.

Herskovic, Jorge R.; Tanaka, Len Y.; Hersh, William (Bill); Bernstam, Elmer V. / A Day in the Life of PubMed: Analysis of a Typical Day's Query Log. In: Journal of the American Medical Informatics Association. 2007; Vol. 14, No. 2. pp. 212-220.
@article{5a9676525c28444c94d04e50fecb7054,
title = "A Day in the Life of PubMed: Analysis of a Typical Day's Query Log",
author = "Herskovic, {Jorge R.} and Tanaka, {Len Y.} and Hersh, {William (Bill)} and Bernstam, {Elmer V.}",
year = "2007",
month = "3",
doi = "10.1197/jamia.M2191",
language = "English (US)",
volume = "14",
pages = "212--220",
journal = "Journal of the American Medical Informatics Association",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - A Day in the Life of PubMed

T2 - Analysis of a Typical Day's Query Log

AU - Herskovic, Jorge R.

AU - Tanaka, Len Y.

AU - Hersh, William (Bill)

AU - Bernstam, Elmer V.

PY - 2007/3

Y1 - 2007/3

UR - http://www.scopus.com/inward/record.url?scp=33847046721&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847046721&partnerID=8YFLogxK

U2 - 10.1197/jamia.M2191

DO - 10.1197/jamia.M2191

M3 - Article

VL - 14

SP - 212

EP - 220

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

SN - 1067-5027

IS - 2

ER -