Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection

William (Bill) Hersh, Henning Müller, Jeffery R. Jensen, Jianji Yang, Paul Gorman, Patrick Ruch

Research output: Contribution to journal › Article

59 Citations (Scopus)

Abstract

Objective: To develop and analyze results from an image retrieval test collection. Methods: After participating research groups obtained and assessed results from their systems in the image retrieval task of the Cross-Language Evaluation Forum (CLEF), we analyzed the results for common themes and trends. In addition to overall performance, results were analyzed by topic category (topics most amenable to visual, textual, or mixed approaches) and run category (runs employing queries entered by automated or manual means, as well as those using visual, textual, or mixed indexing and retrieval methods). We also assessed results on the individual topics and compared the impact of duplicate relevance judgments. Results: A total of 13 research groups participated. Analysis was limited to the best run submitted by each group in each run category. The best results were obtained by systems that combined visual and textual methods. There was substantial variation in performance across topics. Systems employing textual methods were more resilient to visually oriented topics than systems using visual methods were to textually oriented topics. The primary performance measure, mean average precision (MAP), was not necessarily associated with other measures, including those possibly more pertinent to real users, such as precision at 10 or 30 images. Conclusions: We developed a test collection amenable to assessing visual and textual methods for image retrieval. Future work must focus on how varying topic and run types affect retrieval performance. User studies are also necessary to determine the best measures for evaluating the efficacy of image retrieval systems.
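The abstract contrasts the primary measure, mean average precision (MAP), with fixed-cutoff measures such as precision at 10 or 30 images. As a rough illustration only (this sketch is not taken from the paper; the topic identifiers, rankings, and relevance judgments below are invented placeholders), these measures are conventionally computed along the following lines:

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved images that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for image in top_k if image in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    """Precision at each rank where a relevant image appears,
    averaged over the total number of relevant images for the topic."""
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, image in enumerate(ranked_ids, start=1):
        if image in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)

def mean_average_precision(runs, judgments):
    """MAP over all topics: runs maps topic -> ranked image ids,
    judgments maps topic -> set of relevant image ids."""
    return sum(average_precision(runs[t], judgments[t]) for t in runs) / len(runs)

# Hypothetical example with two toy topics.
runs = {"t1": ["a", "b", "c", "d"], "t2": ["x", "y", "z"]}
judgments = {"t1": {"a", "c"}, "t2": {"y"}}
print(mean_average_precision(runs, judgments))             # ~0.67
print(precision_at_k(runs["t1"], judgments["t1"], k=10))   # 0.2

Because MAP rewards relevant images found anywhere in the ranking while precision at 10 or 30 looks only at the top of the list, the two measures can order systems differently, which is consistent with the abstract's observation that MAP was not necessarily associated with measures more pertinent to real users.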

Original language: English (US)
Pages (from-to): 488-496
Number of pages: 9
Journal: Journal of the American Medical Informatics Association
Volume: 13
Issue number: 5
DOIs: 10.1197/jamia.M2082
State: Published - Sep 2006

ASJC Scopus subject areas

  • Medicine (all)

Cite this

Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection. / Hersh, William (Bill); Müller, Henning; Jensen, Jeffery R.; Yang, Jianji; Gorman, Paul; Ruch, Patrick.

In: Journal of the American Medical Informatics Association, Vol. 13, No. 5, 09.2006, pp. 488-496.

Research output: Contribution to journal › Article

Hersh, William (Bill) ; Müller, Henning ; Jensen, Jeffery R. ; Yang, Jianji ; Gorman, Paul ; Ruch, Patrick. / Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection. In: Journal of the American Medical Informatics Association. 2006 ; Vol. 13, No. 5. pp. 488-496.
@article{c003701526f74a728384bdbbc3347fdc,
title = "Advancing Biomedical Image Retrieval: Development and Analysis of a Test Collection",
abstract = "Objective: Develop and analyze results from an image retrieval test collection. Methods: After participating research groups obtained and assessed results from their systems in the image retrieval task of Cross-Language Evaluation Forum, we assessed the results for common themes and trends. In addition to overall performance, results were analyzed on the basis of topic categories (those most amenable to visual, textual, or mixed approaches) and run categories (those employing queries entered by automated or manual means as well as those using visual, textual, or mixed indexing and retrieval methods). We also assessed results on the different topics and compared the impact of duplicate relevance judgments. Results: A total of 13 research groups participated. Analysis was limited to the best run submitted by each group in each run category. The best results were obtained by systems that combined visual and textual methods. There was substantial variation in performance across topics. Systems employing textual methods were more resilient to visually oriented topics than those using visual methods were to textually oriented topics. The primary performance measure of mean average precision (MAP) was not necessarily associated with other measures, including those possibly more pertinent to real users, such as precision at 10 or 30 images. Conclusions: We developed a test collection amenable to assessing visual and textual methods for image retrieval. Future work must focus on how varying topic and run types affect retrieval performance. Users' studies also are necessary to determine the best measures for evaluating the efficacy of image retrieval systems.",
author = "Hersh, {William (Bill)} and Henning M{\"u}ller and Jensen, {Jeffery R.} and Jianji Yang and Paul Gorman and Patrick Ruch",
year = "2006",
month = "9",
doi = "10.1197/jamia.M2082",
language = "English (US)",
volume = "13",
pages = "488--496",
journal = "Journal of the American Medical Informatics Association",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - Advancing Biomedical Image Retrieval

T2 - Development and Analysis of a Test Collection

AU - Hersh, William (Bill)

AU - Müller, Henning

AU - Jensen, Jeffery R.

AU - Yang, Jianji

AU - Gorman, Paul

AU - Ruch, Patrick

PY - 2006/9

Y1 - 2006/9

AB - Objective: Develop and analyze results from an image retrieval test collection. Methods: After participating research groups obtained and assessed results from their systems in the image retrieval task of Cross-Language Evaluation Forum, we assessed the results for common themes and trends. In addition to overall performance, results were analyzed on the basis of topic categories (those most amenable to visual, textual, or mixed approaches) and run categories (those employing queries entered by automated or manual means as well as those using visual, textual, or mixed indexing and retrieval methods). We also assessed results on the different topics and compared the impact of duplicate relevance judgments. Results: A total of 13 research groups participated. Analysis was limited to the best run submitted by each group in each run category. The best results were obtained by systems that combined visual and textual methods. There was substantial variation in performance across topics. Systems employing textual methods were more resilient to visually oriented topics than those using visual methods were to textually oriented topics. The primary performance measure of mean average precision (MAP) was not necessarily associated with other measures, including those possibly more pertinent to real users, such as precision at 10 or 30 images. Conclusions: We developed a test collection amenable to assessing visual and textual methods for image retrieval. Future work must focus on how varying topic and run types affect retrieval performance. Users' studies also are necessary to determine the best measures for evaluating the efficacy of image retrieval systems.

UR - http://www.scopus.com/inward/record.url?scp=33747871929&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33747871929&partnerID=8YFLogxK

U2 - 10.1197/jamia.M2082

DO - 10.1197/jamia.M2082

M3 - Article

C2 - 16799124

AN - SCOPUS:33747871929

VL - 13

SP - 488

EP - 496

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

SN - 1067-5027

IS - 5

ER -