Why batch and user evaluations do not give the same results

A. H. Turpin, William (Bill) Hersh

Research output: Contribution to journal › Article

91 Citations (Scopus)

Abstract

Much system-oriented evaluation of information retrieval systems has used the Cranfield approach based upon queries run against test collections in a batch mode. Some researchers have questioned whether this approach can be applied to the real world, but little data exists for or against that assertion. We have studied this question in the context of the TREC Interactive Track. Previous results demonstrated that improved performance as measured by relevance-based metrics in batch studies did not correspond with outcomes based on real user searching tasks. The experiments in this paper analyzed those results to determine why this occurred. Our assessment showed that while the queries entered by real users into the systems that performed better in batch studies gave comparable gains in the ranking of relevant documents for those users, those gains did not translate into better performance on specific tasks. This was most likely because users were able to adequately find and utilize relevant documents ranked further down the output list.
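The batch studies referred to above score systems with relevance-based metrics computed over ranked output lists against fixed relevance judgments, in the Cranfield tradition. As a rough illustration of how such a batch score rewards systems for placing relevant documents near the top of the list, here is a minimal Python sketch; it is not taken from the paper, and the document identifiers, relevance judgments, and the choice of precision at k and average precision are assumptions made purely for demonstration.

# Illustrative sketch (not from the paper): Cranfield-style batch scoring
# of ranked result lists against fixed relevance judgments.
# Document IDs and judgments below are invented for demonstration only.

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Two hypothetical systems return the same relevant documents,
# but System B ranks them slightly lower in its output list.
relevant = {"d1", "d2", "d3"}
system_a = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
system_b = ["d4", "d1", "d5", "d2", "d6", "d3", "d7", "d8"]

for name, ranking in [("A", system_a), ("B", system_b)]:
    print(name,
          "P@5 =", precision_at_k(ranking, relevant, 5),
          "AP =", round(average_precision(ranking, relevant), 3))

In this toy example, System B scores lower on both batch metrics even though it returns every relevant document within its first six results. The discrepancy the paper reports arises in just this situation: users can absorb such ranking differences by simply reading a little further down the output list, so the batch-metric gap need not translate into a difference in task performance.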

Original language: English (US)
Pages (from-to): 225-231
Number of pages: 7
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
State: Published - 2001
Externally published: Yes

Keywords

  • Information retrieval evaluation
  • Interactive retrieval
  • Text Retrieval Conference (TREC)

ASJC Scopus subject areas

  • Management Information Systems
  • Hardware and Architecture

Cite this

@article{fb1076b53bab42c79db368d97cde95c9,
  title = "Why batch and user evaluations do not give the same results",
  abstract = "Much system-oriented evaluation of information retrieval systems has used the Cranfield approach based upon queries run against test collections in a batch mode. Some researchers have questioned whether this approach can be applied to the real world, but little data exists for or against that assertion. We have studied this question in the context of the TREC Interactive Track. Previous results demonstrated that improved performance as measured by relevance-based metrics in batch studies did not correspond with outcomes based on real user searching tasks. The experiments in this paper analyzed those results to determine why this occurred. Our assessment showed that while the queries entered by real users into the systems that performed better in batch studies gave comparable gains in the ranking of relevant documents for those users, those gains did not translate into better performance on specific tasks. This was most likely because users were able to adequately find and utilize relevant documents ranked further down the output list.",
  keywords = "Information retrieval evaluation, Interactive retrieval, Text Retrieval Conference (TREC)",
  author = "Turpin, {A. H.} and Hersh, {William (Bill)}",
  year = "2001",
  language = "English (US)",
  pages = "225--231",
  journal = "SIGIR Forum (ACM Special Interest Group on Information Retrieval)",
  issn = "0163-5840",
  publisher = "Association for Computing Machinery (ACM)",
}
