A comparative analysis of retrieval features used in the TREC 2006 Genomics Track passage retrieval task.

Hari Krishna Rekapalli, Aaron Cohen, William (Bill) Hersh

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

OBJECTIVE: Identify the set of features that best explained the variation in Mean Average Passage Precision (MAPP), the performance measure of the TREC 2006 Genomics information extraction task. METHODS: A multivariate regression model was built using a backward-elimination approach as a function of certain generalized features that were common to all the algorithms used by TREC 2006 Genomics Track participants. RESULTS: Our regression analysis found that the following four factors were collectively associated with variation in MAPP: (1) normalization of keywords in the query, (2) use of the Entrez Gene thesaurus for synonym look-up, (3) the unit of text retrieved by the respective IR algorithms, and (4) the way a passage was defined. CONCLUSION: These reasonably likely hypotheses, generated by an exploratory data analysis, are informative in understanding the results of the TREC 2006 Genomics passage extraction task. This approach has general value for analyzing the results of similar common challenge tasks.
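
The backward-elimination step named in METHODS can be sketched as follows. This is a minimal illustration and not the authors' analysis: the feature names, the synthetic MAPP values, and the adjusted R² stopping rule are all assumptions, since the abstract does not state which elimination criterion was used.

```python
import itertools
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def adjusted_r2(rows, y, features):
    """Fit ordinary least squares (with intercept) on `features`; return adjusted R^2."""
    X = [[1.0] + [row[f] for f in features] for row in rows]
    k = len(X[0])  # number of coefficients, including the intercept
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    n = len(y)
    return 1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1))


def backward_eliminate(rows, y, features):
    """Start from all features; repeatedly drop the one whose removal
    raises adjusted R^2 the most, stopping when no removal improves it."""
    current = list(features)
    best = adjusted_r2(rows, y, current)
    while current:
        scores = {
            f: adjusted_r2(rows, y, [g for g in current if g != f])
            for f in current
        }
        drop = max(scores, key=scores.get)
        if scores[drop] <= best:
            break
        current.remove(drop)
        best = scores[drop]
    return current, best


# Hypothetical binary system features; the names are illustrative only.
features = ["normalized_query", "gene_thesaurus", "stemming", "stopword_removal"]

# Synthetic "runs": a full factorial design repeated four times. Only the
# first two features influence the made-up MAPP score; the rest is noise.
rng = random.Random(0)
rows, y = [], []
for _ in range(4):
    for combo in itertools.product([0.0, 1.0], repeat=len(features)):
        row = dict(zip(features, combo))
        rows.append(row)
        y.append(
            0.02
            + 0.03 * row["normalized_query"]
            + 0.02 * row["gene_thesaurus"]
            + rng.gauss(0.0, 0.005)
        )

selected, score = backward_eliminate(rows, y, features)
print("retained features:", selected)
print("adjusted R^2:", round(score, 3))
```

Dropping on adjusted R² is equivalent to removing a term whose |t| statistic is below 1; a p-value threshold is a common, stricter alternative.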

Original language: English (US)
Pages (from-to): 620-624
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2007

Fingerprint

Genomics
Controlled Vocabulary
Information Storage and Retrieval
Regression Analysis
Genes

ASJC Scopus subject areas

  • Medicine (all)

Cite this

@article{eb5f10585e094ff0b4643ed8f35238dd,
title = "A comparative analysis of retrieval features used in the {TREC} 2006 {Genomics Track} passage retrieval task",
abstract = "OBJECTIVE: Identify the set of features that best explained the variation in Mean Average Passage Precision (MAPP), the performance measure of the TREC 2006 Genomics information extraction task. METHODS: A multivariate regression model was built using a backward-elimination approach as a function of certain generalized features that were common to all the algorithms used by TREC 2006 Genomics Track participants. RESULTS: Our regression analysis found that the following four factors were collectively associated with variation in MAPP: (1) normalization of keywords in the query, (2) use of the Entrez Gene thesaurus for synonym look-up, (3) the unit of text retrieved by the respective IR algorithms, and (4) the way a passage was defined. CONCLUSION: These reasonably likely hypotheses, generated by an exploratory data analysis, are informative in understanding the results of the TREC 2006 Genomics passage extraction task. This approach has general value for analyzing the results of similar common challenge tasks.",
author = "Rekapalli, {Hari Krishna} and Cohen, Aaron and Hersh, {William (Bill)}",
year = "2007",
language = "English (US)",
pages = "620--624",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association"
}

TY - JOUR

T1 - A comparative analysis of retrieval features used in the TREC 2006 Genomics Track passage retrieval task.

AU - Rekapalli, Hari Krishna

AU - Cohen, Aaron

AU - Hersh, William (Bill)

PY - 2007

Y1 - 2007

AB - OBJECTIVE: Identify the set of features that best explained the variation in Mean Average Passage Precision (MAPP), the performance measure of the TREC 2006 Genomics information extraction task. METHODS: A multivariate regression model was built using a backward-elimination approach as a function of certain generalized features that were common to all the algorithms used by TREC 2006 Genomics Track participants. RESULTS: Our regression analysis found that the following four factors were collectively associated with variation in MAPP: (1) normalization of keywords in the query, (2) use of the Entrez Gene thesaurus for synonym look-up, (3) the unit of text retrieved by the respective IR algorithms, and (4) the way a passage was defined. CONCLUSION: These reasonably likely hypotheses, generated by an exploratory data analysis, are informative in understanding the results of the TREC 2006 Genomics passage extraction task. This approach has general value for analyzing the results of similar common challenge tasks.

UR - http://www.scopus.com/inward/record.url?scp=56149108665&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=56149108665&partnerID=8YFLogxK

M3 - Article

C2 - 18693910

AN - SCOPUS:56149108665

SP - 620

EP - 624

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

SN - 1559-4076

ER -