Overview of the CLEF 2009 medical image retrieval track

Henning Müller, Jayashree Kalpathy-Cramer, Ivan Eggel, Steven Bedrick, Saïd Radhouani, Brian Bakke, Charles E. Kahn, William (Bill) Hersh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

2009 was the sixth year of the ImageCLEF medical retrieval task. Participation was strong again, with 38 registered research groups; 17 groups submitted runs and thus participated actively in the tasks. The database in 2009 was similar to the one used in 2008, containing scientific articles from two radiology journals, Radiology and Radiographics. The size of the database was increased to a total of 74,902 images. For each image, captions and access to the full-text article through the Medline PMID (PubMed Identifier) were provided. An article's PMID could be used to obtain the officially assigned MeSH (Medical Subject Headings) terms. The collection was entirely in English; however, the topics were, as in previous years, supplied in German, French, and English. Twenty-five image-based topics were provided: ten visual, ten mixed, and five textual. In addition, for the first time, five case-based topics were provided as an exploratory task, in which the unit of retrieval was intended to be the article rather than the image. Case-based topics are designed to be a step closer to the clinical workflow: clinicians often seek information about patient cases with incomplete information consisting of symptoms, findings, and a set of images. Supplying a clinician with cases from the scientific literature that are similar to the case he or she is treating can be an important application of image retrieval in the future. As in previous years, most groups concentrated on fully automatic retrieval, although four groups submitted a total of seven manual or interactive runs. The interactive runs submitted this year performed quite well compared to previous years but did not show a substantial increase in performance over the automatic approaches. In previous years, multimodal combinations were the most frequent submissions; however, this year, as in 2008, only about half as many mixed runs as purely textual runs were submitted.
Very few fully visual runs were submitted, and again the submitted ones performed poorly. The best mean average precision (MAP) scores were obtained using automatic textual methods, though some mixed runs with feedback also achieved high MAP. The best early precision was likewise obtained using automatic textual methods, with a few mixed automatic runs also doing well. We had the opportunity to obtain multiple judgments on some topics. The kappa statistics used as the metric for inter-rater agreement were mostly quite high (>0.7); however, one of our judges consistently had low kappas, as he was significantly more lenient than his colleagues. We evaluated the overall performance of the groups using both strict and lenient judges and found a high correlation, even though the absolute values of the metrics differed. We also introduced a lung nodule detection task in 2009. This task used CT slices from the Lung Image Database Consortium (LIDC), which include ground truth in the form of manual annotations. The goal of the task was to create algorithms that automatically detect lung nodules. Although there seemed to be significant interest in the task, as evidenced by the substantial number of registrations, only two groups submitted results, with proprietary software from an industry participant achieving impressive results.
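The two evaluation measures the abstract relies on, mean average precision (MAP) over ranked runs and Cohen's kappa for inter-rater agreement, can be sketched as follows. This is an illustrative implementation only, not the track's official scoring code; the function names and data layout (topic id → ranked list / relevant set) are our own assumptions.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic: mean of precision@k over ranks
    where a relevant document appears."""
    hits, score = 0, 0.0
    for k, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            score += hits / k
    return score / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs, qrels):
    """MAP over all topics; runs maps topic id -> ranked document list,
    qrels maps topic id -> set of relevant document ids."""
    return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two judges' relevance labels on the same documents:
    observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)
```

A kappa above 0.7, the level most judge pairs reached, indicates substantial agreement; a consistently more lenient judge lowers observed agreement with stricter colleagues and hence the kappa.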

Original language: English (US)
Title of host publication: CLEF 2009 - Working Notes for CLEF 2009 Workshop, co-located with the 13th European Conference on Digital Libraries, ECDL 2009
Publisher: CEUR-WS
Volume: 1175
State: Published - 2009
Event: 2009 Cross Language Evaluation Forum Workshop, CLEF 2009, co-located with the 13th European Conference on Digital Libraries, ECDL 2009 - Corfu, Greece
Duration: Sep 30 2009 - Oct 2 2009



Keywords

  • Image retrieval
  • Medical image retrieval
  • Multimodal retrieval

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Müller, H., Kalpathy-Cramer, J., Eggel, I., Bedrick, S., Radhouani, S., Bakke, B., ... Hersh, W. B. (2009). Overview of the CLEF 2009 medical image retrieval track. In CLEF 2009 - Working Notes for CLEF 2009 Workshop, co-located with the 13th European Conference on Digital Libraries, ECDL 2009 (Vol. 1175). CEUR-WS.

