Improving retrieval using external annotations: OHSU at imageCLEF 2010

Steven Bedrick; Jayashree Kalpathy-Cramer

Research output: Contribution to journal › Conference article › peer-review

Abstract

Over the past several years, our team has focused its efforts on improving retrieval precision performance by mixing visual and textual information. This year, we chose to explore ways in which we could use external data to enrich our retrieval system's data set; specifically, we annotated each image in the test collection with a set of MeSH headings from two different sources: human-assigned MEDLINE index terms, and automatically-assigned MeSH headings (via the National Library of Medicine's MetaMap software). In addition to exploring these different data enrichment techniques, we also revamped the architecture of our retrieval system itself. In past years, we have used a two-tiered approach wherein the data is stored in a relational database (RDBMS), but the indexing and searching are done using a Lucene-like system. This year, we took advantage of our RDBMS's full-text search capabilities and performed both storage and searching in the RDBMS. This turned out to have both positive and negative effects at a practical level. On the one hand, using the database's built-in text retrieval subsystem resulted in improved retrieval speed and easier query analysis; however, these gains came at the cost of reduced flexibility and increased code complexity. Our experiments investigated the effects of using various combinations of human- and automatically-assigned MeSH terms, along with several of the techniques that have proved useful in previous years. We found that including automatically-assigned MeSH terms sometimes provided a small amount of improvement (in terms of bpref, MAP, and early precision) and sometimes hurt performance, whereas including the human-assigned MEDLINE index headings consistently yielded a sizable improvement in those same metrics.
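The abstract reports results in terms of MAP and early precision. As a rough illustration only, the sketch below computes precision at k and (mean) average precision using their standard textbook definitions. The function names, image identifiers, and relevance judgments are hypothetical and do not come from the paper or from the authors' evaluation code; bpref is omitted for brevity.

```python
from typing import Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Early precision: fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant item appears,
    divided by the total number of relevant items for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: mean of the per-topic average precision over all topics."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical ranked lists and relevance judgments for two topics.
topic_1 = (["img3", "img7", "img1", "img9"], {"img3", "img1", "img5"})
topic_2 = (["a", "b"], {"b"})

p_at_2 = precision_at_k(topic_1[0], topic_1[1], k=2)
ap_1 = average_precision(*topic_1)
map_score = mean_average_precision([topic_1, topic_2])
```

Comparing such scores between a baseline run and a run enriched with MeSH annotations is one simple way to read the kind of improvement the abstract describes.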

Original language	English (US)
Journal	CEUR Workshop Proceedings
Volume	1176
State	Published - 2010
Event	2010 Cross Language Evaluation Forum Conference, CLEF 2010 - Padua, Italy; Duration: Sep 22 2010 → Sep 23 2010

ASJC Scopus subject areas

General Computer Science

Cite this

@article{c812662ee0ed4836a6be825dcfc6f22c,

title = "Improving retrieval using external annotations: OHSU at imageCLEF 2010",

abstract = "Over the past several years, our team has focused its efforts on improving retrieval precision performance by mixing visual and textual information. This year, we chose to explore ways in which we could use external data to enrich our retrieval system's data set; specifically, we annotated each image in the test collection with a set of MeSH headings from two different sources: human-assigned MEDLINE index terms, and automatically-assigned MeSH headings (via the National Library of Medicine's MetaMap software). In addition to exploring these different data enrichment techniques, we also revamped the architecture of our retrieval system itself. In past years, we have used a two-tiered approach wherein the data is stored in a relational database (RDBMS), but the indexing and searching are done using a Lucene-like system. This year, we took advantage of our RDBMS's full-text search capabilities and performed both storage and searching in the RDBMS. This turned out to have both positive and negative effects at a practical level. On the one hand, using the database's built-in text retrieval subsystem resulted in improved retrieval speed and easier query analysis; however, these gains came at the cost of reduced flexibility and increased code complexity. Our experiments investigated the effects of using various combinations of human- and automatically-assigned MeSH terms, along with several of the techniques that have proved useful in previous years. We found that including automatically-assigned MeSH terms sometimes provided a small amount of improvement (in terms of bpref, MAP, and early precision) and sometimes hurt performance, whereas including the human-assigned MEDLINE index headings consistently yielded a sizable improvement in those same metrics.",

author = "Steven Bedrick and Jayashree Kalpathy-Cramer",

year = "2010",

language = "English (US)",

volume = "1176",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

note = "2010 Cross Language Evaluation Forum Conference, CLEF 2010 ; Conference date: 22-09-2010 Through 23-09-2010",

}

TY - JOUR

T1 - Improving retrieval using external annotations

T2 - 2010 Cross Language Evaluation Forum Conference, CLEF 2010

AU - Bedrick, Steven

AU - Kalpathy-Cramer, Jayashree

PY - 2010

Y1 - 2010

N2 - Over the past several years, our team has focused its efforts on improving retrieval precision performance by mixing visual and textual information. This year, we chose to explore ways in which we could use external data to enrich our retrieval system's data set; specifically, we annotated each image in the test collection with a set of MeSH headings from two different sources: human-assigned MEDLINE index terms, and automatically-assigned MeSH headings (via the National Library of Medicine's MetaMap software). In addition to exploring these different data enrichment techniques, we also revamped the architecture of our retrieval system itself. In past years, we have used a two-tiered approach wherein the data is stored in a relational database (RDBMS), but the indexing and searching are done using a Lucene-like system. This year, we took advantage of our RDBMS's full-text search capabilities and performed both storage and searching in the RDBMS. This turned out to have both positive and negative effects at a practical level. On the one hand, using the database's built-in text retrieval subsystem resulted in improved retrieval speed and easier query analysis; however, these gains came at the cost of reduced flexibility and increased code complexity. Our experiments investigated the effects of using various combinations of human- and automatically-assigned MeSH terms, along with several of the techniques that have proved useful in previous years. We found that including automatically-assigned MeSH terms sometimes provided a small amount of improvement (in terms of bpref, MAP, and early precision) and sometimes hurt performance, whereas including the human-assigned MEDLINE index headings consistently yielded a sizable improvement in those same metrics.

AB - Over the past several years, our team has focused its efforts on improving retrieval precision performance by mixing visual and textual information. This year, we chose to explore ways in which we could use external data to enrich our retrieval system's data set; specifically, we annotated each image in the test collection with a set of MeSH headings from two different sources: human-assigned MEDLINE index terms, and automatically-assigned MeSH headings (via the National Library of Medicine's MetaMap software). In addition to exploring these different data enrichment techniques, we also revamped the architecture of our retrieval system itself. In past years, we have used a two-tiered approach wherein the data is stored in a relational database (RDBMS), but the indexing and searching are done using a Lucene-like system. This year, we took advantage of our RDBMS's full-text search capabilities and performed both storage and searching in the RDBMS. This turned out to have both positive and negative effects at a practical level. On the one hand, using the database's built-in text retrieval subsystem resulted in improved retrieval speed and easier query analysis; however, these gains came at the cost of reduced flexibility and increased code complexity. Our experiments investigated the effects of using various combinations of human- and automatically-assigned MeSH terms, along with several of the techniques that have proved useful in previous years. We found that including automatically-assigned MeSH terms sometimes provided a small amount of improvement (in terms of bpref, MAP, and early precision) and sometimes hurt performance, whereas including the human-assigned MEDLINE index headings consistently yielded a sizable improvement in those same metrics.

UR - http://www.scopus.com/inward/record.url?scp=84922051412&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922051412&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84922051412

SN - 1613-0073

VL - 1176

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 22 September 2010 through 23 September 2010

ER -
