The Oregon Health & Science University submission to the TREC 2006 Genomics Track approached the question-answer extraction task in three phases. In the first phase, the biological questions were parsed into relevant entities and query expressions were generated. The second phase retrieved relevant passages from the corpus, using Lucene as the information retrieval engine. The third phase ranked the retrieved passages and generated the final submitted output. Through these experiments, and through comparison with the approaches of others, we hope to learn the contribution and value of several techniques applicable to question-answer extraction, including lexicon-based query term expansion, query back-off techniques for questions with few applicable passages, and passage clustering for identifying distinct aspects of question answers. Our experiments showed no improvement after cluster-based ranking. Maximal-span-based passage indexing proved to be too coarse, resulting in an overall average-performing passage mean average precision (MAP) of 4%.
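As an illustration of how lexicon-based query term expansion and query back-off can fit together, the following is a minimal Python sketch under stated assumptions, not the submitted system. The lexicon contents, the back-off order (drop trailing entities, then fall back to a disjunction), the `min_hits` threshold, and the `search` callable are all hypothetical; the abstract does not specify any of them. The generated strings use generic Boolean syntax of the kind a Lucene-style query parser accepts.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical synonym lexicon; the lexicon used in the actual
# submission is not described in the abstract.
LEXICON: Dict[str, List[str]] = {
    "prnp": ["prion protein", "prp"],
    "mad cow disease": ["bse", "bovine spongiform encephalopathy"],
}


def expand(term: str) -> str:
    """OR a term together with its lexicon synonyms, quoting each phrase."""
    variants = [term] + LEXICON.get(term.lower(), [])
    return "(" + " OR ".join(f'"{v}"' for v in variants) + ")"


def build_queries(entities: Sequence[str]) -> List[str]:
    """Return query strings ordered from strictest to loosest.

    Assumed back-off order: all entities ANDed together, then with
    trailing entities dropped one at a time, then a plain OR of all.
    """
    expanded = [expand(e) for e in entities]
    queries = [" AND ".join(expanded[:n]) for n in range(len(expanded), 0, -1)]
    if len(expanded) > 1:
        queries.append(" OR ".join(expanded))
    return queries


def search_with_backoff(
    entities: Sequence[str],
    search: Callable[[str], List[str]],
    min_hits: int = 10,
) -> List[str]:
    """Try each query in turn and stop at the first with enough hits.

    `search` stands in for the retrieval engine. If no query reaches
    `min_hits`, the largest result list seen so far is returned.
    """
    best: List[str] = []
    for query in build_queries(entities):
        hits = search(query)
        if len(hits) >= min_hits:
            return hits
        if len(hits) > len(best):
            best = hits
    return best
```

In this sketch the strict conjunctive query is preferred whenever it returns enough passages, and looser queries are consulted only for questions with few applicable passages, which is the situation the back-off techniques are meant to address.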