Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013

Steven Bedrick, Golnar Sheikshabbafghi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Oregon Health & Science University team's participation in task #3 ("addressing patients' medical questions") of this year's eHealth CLEF campaign included submissions from two different retrieval systems. The first was a traditional, Lucene-based system modified from one used in previous years' TREC-med campaigns; the second was a novel system that used statistical language modeling techniques to perform text retrieval. Since 2013 was the first year of our participation in this campaign, our focus was on familiarizing ourselves with working on a corpus of web text, as well as putting together a proof-of-concept implementation of a language-model retrieval system. We submitted three runs in total: one from the novel system and two from our Lucene-based system, one of which made use of the National Library of Medicine's MetaMap tool to perform query expansion. In general, our runs did not perform particularly well, although there were several topics for which our language model-based retrieval system produced the best P@10. Future work will focus on pre-indexing text normalization as well as a more sophisticated approach to query parsing.

Original language: English (US)
Title of host publication: CLEF 2013 - Working Notes for CLEF 2013 Conference
Publisher: CEUR-WS
Volume: 1179
State: Published - 2013
Event: 2013 Cross Language Evaluation Forum Conference, CLEF 2013 - Valencia, Spain
Duration: Sep 23 2013 to Sep 26 2013

Other

Other: 2013 Cross Language Evaluation Forum Conference, CLEF 2013
Country: Spain
City: Valencia
Period: 9/23/13 to 9/26/13

Fingerprint

Medicine
Health

Keywords

  • Language model
  • Lucene
  • MetaMap
  • Skip-grams

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Bedrick, S., & Sheikshabbafghi, G. (2013). Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013. In CLEF 2013 - Working Notes for CLEF 2013 Conference (Vol. 1179). CEUR-WS.

Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013. / Bedrick, Steven; Sheikshabbafghi, Golnar.

CLEF 2013 - Working Notes for CLEF 2013 Conference. Vol. 1179 CEUR-WS, 2013.

Bedrick, S & Sheikshabbafghi, G 2013, Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013. in CLEF 2013 - Working Notes for CLEF 2013 Conference. vol. 1179, CEUR-WS, 2013 Cross Language Evaluation Forum Conference, CLEF 2013, Valencia, Spain, 9/23/13.
Bedrick S, Sheikshabbafghi G. Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013. In CLEF 2013 - Working Notes for CLEF 2013 Conference. Vol. 1179. CEUR-WS. 2013
Bedrick, Steven; Sheikshabbafghi, Golnar. / Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013. CLEF 2013 - Working Notes for CLEF 2013 Conference. Vol. 1179 CEUR-WS, 2013.
@inproceedings{a72a2d7ee5904268b4dcd88912437cb2,
title = "Lucene, MetaMap, and language modeling: OHSU at CLEF eHealth 2013",
abstract = "The Oregon Health & Science University team's participation in task #3 ({"}addressing patients' medical questions{"}) of this year's eHealth CLEF campaign included submissions from two different retrieval systems. The first was a traditional, Lucene-based system modified from one used in previous years' TREC-med campaigns; the second was a novel system that used statistical language modeling techniques to perform text retrieval. Since 2013 was the first year of our participation in this campaign, our focus was on familiarizing ourselves with working on a corpus of web text, as well as putting together a proof-of-concept implementation of a language-model retrieval system. We submitted three runs in total: one from the novel system and two from our Lucene-based system, one of which made use of the National Library of Medicine's MetaMap tool to perform query expansion. In general, our runs did not perform particularly well, although there were several topics for which our language model-based retrieval system produced the best P@10. Future work will focus on pre-indexing text normalization as well as a more sophisticated approach to query parsing.",
keywords = "Language model, Lucene, MetaMap, Skip-grams",
author = "Steven Bedrick and Golnar Sheikshabbafghi",
year = "2013",
language = "English (US)",
volume = "1179",
booktitle = "CLEF 2013 - Working Notes for CLEF 2013 Conference",
publisher = "CEUR-WS",

}

TY - GEN

T1 - Lucene, MetaMap, and language modeling

T2 - OHSU at CLEF eHealth 2013

AU - Bedrick, Steven

AU - Sheikshabbafghi, Golnar

PY - 2013

Y1 - 2013

AB - The Oregon Health & Science University team's participation in task #3 ("addressing patients' medical questions") of this year's eHealth CLEF campaign included submissions from two different retrieval systems. The first was a traditional, Lucene-based system modified from one used in previous years' TREC-med campaigns; the second was a novel system that used statistical language modeling techniques to perform text retrieval. Since 2013 was the first year of our participation in this campaign, our focus was on familiarizing ourselves with working on a corpus of web text, as well as putting together a proof-of-concept implementation of a language-model retrieval system. We submitted three runs in total: one from the novel system and two from our Lucene-based system, one of which made use of the National Library of Medicine's MetaMap tool to perform query expansion. In general, our runs did not perform particularly well, although there were several topics for which our language model-based retrieval system produced the best P@10. Future work will focus on pre-indexing text normalization as well as a more sophisticated approach to query parsing.

KW - Language model

KW - Lucene

KW - MetaMap

KW - Skip-grams

UR - http://www.scopus.com/inward/record.url?scp=84922021504&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922021504&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84922021504

VL - 1179

BT - CLEF 2013 - Working Notes for CLEF 2013 Conference

PB - CEUR-WS

ER -