Evaluation-as-a-service for the computational sciences: Overview and outlook

Frank Hopfgartner; Allan Hanbury; Henning Müller; Ivan Eggel; Krisztian Balog; Torben Brodt; Gordon V. Cormack; Jimmy Lin; Jayashree Kalpathy-Cramer; Noriko Kando; Makoto P. Kato; Anastasia Krithara; Tim Gollub; Martin Potthast; Evelyne Viegas; Simon Mercer

doi:10.1145/3239570

Research output: Contribution to journal › Review article › peer-review

16 Scopus citations

Abstract

Evaluation in empirical computer science is essential to show progress and assess technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way in which industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their challenges, particularly in the field of machine learning. This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants and having them work on the data locally, but instead keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other possibilities to ship executables. The objectives of this article are to summarize and compare the current approaches and consolidate the experiences of these approaches to outline the next steps of EaaS, particularly toward sustainable research infrastructures. The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are summarized, as is the environment in terms of motivations for the various stakeholders, from funding agencies to challenge organizers, researchers, and participants, to industry interested in supplying real-world problems for which they require solutions.
EaaS solves many problems of the current research environment, where data sets are often not accessible to many researchers. Executables of published tools are equally often not available, making the reproducibility of results impossible. EaaS, however, creates reusable/citable data sets as well as available executables. Many challenges remain, but such a framework for research can also foster more collaboration between researchers, potentially increasing the speed of obtaining research results.
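
As a rough illustration of the paradigm described in the abstract, the following minimal Python sketch shows the "ship executables" variant: the organizer keeps documents and ground truth central, a participant submits a system, and only an aggregate score is returned. All names here (EvaluationService, evaluate, term_overlap_system), the toy data, and the choice of precision@k as the metric are assumptions made for illustration; none of them is taken from the article or from any existing EaaS platform.

```python
from typing import Callable, Dict, List, Set

# Type of a submitted system: given a query and the collection, return ranked doc ids.
System = Callable[[str, Dict[str, str]], List[str]]


class EvaluationService:
    """Hypothetical organizer-side service; data and labels stay inside it."""

    def __init__(self, documents: Dict[str, str], relevant: Dict[str, Set[str]]):
        self._documents = documents  # doc id -> text, kept central
        self._relevant = relevant    # query -> ids of relevant documents (ground truth)

    def evaluate(self, system: System, k: int = 2) -> float:
        """Run the submitted system on the central data; return mean precision@k only."""
        scores = []
        for query, relevant_ids in self._relevant.items():
            # The system receives a copy, so it cannot alter the central collection.
            ranking = system(query, dict(self._documents))[:k]
            hits = sum(1 for doc_id in ranking if doc_id in relevant_ids)
            scores.append(hits / k)
        return sum(scores) / len(scores)


def term_overlap_system(query: str, documents: Dict[str, str]) -> List[str]:
    """Toy participant system: rank documents by the number of shared query terms."""
    terms = set(query.lower().split())

    def overlap(doc_id: str) -> int:
        return len(terms & set(documents[doc_id].lower().split()))

    # Sort by descending overlap, breaking ties by doc id for determinism.
    return sorted(documents, key=lambda doc_id: (-overlap(doc_id), doc_id))


service = EvaluationService(
    documents={
        "d1": "medical image retrieval",
        "d2": "news recommendation",
        "d3": "medical records search",
    },
    relevant={"medical search": {"d1", "d3"}, "news": {"d2"}},
)
print(service.evaluate(term_overlap_system, k=2))  # prints 0.75
```

In this sketch the participant never sees the ground truth and receives only a single number, which is the property that makes the approach usable for confidential or very large collections. A real deployment would additionally need sandboxing of the submitted code, which is not shown here.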

Original language	English (US)
Article number	a15
Journal	Journal of Data and Information Quality
Volume	10
Issue number	4
DOIs	https://doi.org/10.1145/3239570
State	Published - Oct 2018
Externally published	Yes

Keywords

Benchmarking
Evaluation-as-a-service
Information access systems

ASJC Scopus subject areas

Information Systems
Information Systems and Management

Access to Document

10.1145/3239570

Cite this

Hopfgartner, F., Hanbury, A., Müller, H., Eggel, I., Balog, K., Brodt, T., Cormack, G. V., Lin, J., Kalpathy-Cramer, J., Kando, N., Kato, M. P., Krithara, A., Gollub, T., Potthast, M., Viegas, E., & Mercer, S. (2018). Evaluation-as-a-service for the computational sciences: Overview and outlook. Journal of Data and Information Quality, 10(4), Article a15. https://doi.org/10.1145/3239570

@article{6ed4461dd74341688f4e3bd696687cff,

title = "Evaluation-as-a-service for the computational sciences: Overview and outlook",

abstract = "Evaluation in empirical computer science is essential to show progress and assess technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way in which industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their challenges, particularly in the field of machine learning. This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants and having them work on the data locally, but instead keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other possibilities to ship executables. The objectives of this article are to summarize and compare the current approaches and consolidate the experiences of these approaches to outline the next steps of EaaS, particularly toward sustainable research infrastructures. The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are summarized, as is the environment in terms of motivations for the various stakeholders, from funding agencies to challenge organizers, researchers, and participants, to industry interested in supplying real-world problems for which they require solutions.
EaaS solves many problems of the current research environment, where data sets are often not accessible to many researchers. Executables of published tools are equally often not available, making the reproducibility of results impossible. EaaS, however, creates reusable/citable data sets as well as available executables. Many challenges remain, but such a framework for research can also foster more collaboration between researchers, potentially increasing the speed of obtaining research results.",

keywords = "Benchmarking, Evaluation-as-a-service, Information access systems",

author = "Frank Hopfgartner and Allan Hanbury and Henning M{\"u}ller and Ivan Eggel and Krisztian Balog and Torben Brodt and Cormack, {Gordon V.} and Jimmy Lin and Jayashree Kalpathy-Cramer and Noriko Kando and Kato, {Makoto P.} and Anastasia Krithara and Tim Gollub and Martin Potthast and Evelyne Viegas and Simon Mercer",

note = "Publisher Copyright: {\textcopyright} 2018 Association for Computing Machinery.",

year = "2018",

month = oct,

doi = "10.1145/3239570",

language = "English (US)",

volume = "10",

journal = "Journal of Data and Information Quality",

issn = "1936-1955",

publisher = "Association for Computing Machinery (ACM)",

number = "4",

}

TY - JOUR

T1 - Evaluation-as-a-service for the computational sciences

T2 - Overview and outlook

AU - Hopfgartner, Frank

AU - Hanbury, Allan

AU - Müller, Henning

AU - Eggel, Ivan

AU - Balog, Krisztian

AU - Brodt, Torben

AU - Cormack, Gordon V.

AU - Lin, Jimmy

AU - Kalpathy-Cramer, Jayashree

AU - Kando, Noriko

AU - Kato, Makoto P.

AU - Krithara, Anastasia

AU - Gollub, Tim

AU - Potthast, Martin

AU - Viegas, Evelyne

AU - Mercer, Simon

PY - 2018/10

Y1 - 2018/10

N2 - Evaluation in empirical computer science is essential to show progress and assess technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way in which industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their challenges, particularly in the field of machine learning. This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants and having them work on the data locally, but instead keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other possibilities to ship executables. The objectives of this article are to summarize and compare the current approaches and consolidate the experiences of these approaches to outline the next steps of EaaS, particularly toward sustainable research infrastructures. The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are summarized, as is the environment in terms of motivations for the various stakeholders, from funding agencies to challenge organizers, researchers, and participants, to industry interested in supplying real-world problems for which they require solutions.
EaaS solves many problems of the current research environment, where data sets are often not accessible to many researchers. Executables of published tools are equally often not available, making the reproducibility of results impossible. EaaS, however, creates reusable/citable data sets as well as available executables. Many challenges remain, but such a framework for research can also foster more collaboration between researchers, potentially increasing the speed of obtaining research results.

KW - Benchmarking

KW - Evaluation-as-a-service

KW - Information access systems

UR - http://www.scopus.com/inward/record.url?scp=85056446482&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056446482&partnerID=8YFLogxK

U2 - 10.1145/3239570

DO - 10.1145/3239570

M3 - Review article

AN - SCOPUS:85056446482

SN - 1936-1955

VL - 10

JO - Journal of Data and Information Quality

JF - Journal of Data and Information Quality

IS - 4

M1 - a15

ER -
