An overview of the BioCreative 2012 Workshop Track III: Interactive text mining task

Cecilia N. Arighi; Ben Carterette; K. Bretonnel Cohen; Martin Krallinger; W. John Wilbur; Petra Fey; Robert Dodson; Laurel Cooper; Ceri E. Van Slyke; Wasila Dahdul; Paula Mabee; Donghui Li; Bethany Harris; Marc Gillespie; Silvia Jimenez; Phoebe Roberts; Lisa Matthews; Kevin Becker; Harold Drabkin; Susan Bello; Luana Licata; Andrew Chatr-aryamontri; Mary L. Schaeffer; Julie Park; Melissa Haendel; Kimberly Van Auken; Yuling Li; Juancarlos Chan; Hans Michael Muller; Hong Cui; James P. Balhoff; Johnny Chi Yang Wu; Zhiyong Lu; Chih Hsuan Wei; Catalina O. Tudor; Kalpana Raja; Suresh Subramani; Jeyakumar Natarajan; Juan Miguel Cejuela; Pratibha Dubey; Cathy Wu

doi:10.1093/database/bas056

OHSU Library

Research output: Contribution to journal › Article › peer-review

54 Scopus citations

Abstract

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.

Original language	English (US)
Article number	bas056
Journal	Database
Volume	2013
DOIs	https://doi.org/10.1093/database/bas056
State	Published - 2013

ASJC Scopus subject areas

Information Systems
General Biochemistry, Genetics and Molecular Biology
General Agricultural and Biological Sciences

Cite this

Arighi, C. N., Carterette, B., Cohen, K. B., Krallinger, M., Wilbur, W. J., Fey, P., Dodson, R., Cooper, L., Van Slyke, C. E., Dahdul, W., Mabee, P., Li, D., Harris, B., Gillespie, M., Jimenez, S., Roberts, P., Matthews, L., Becker, K., Drabkin, H., ... Wu, C. (2013). An overview of the BioCreative 2012 Workshop Track III: Interactive text mining task. Database, 2013, Article bas056. https://doi.org/10.1093/database/bas056

Arighi, CN, Carterette, B, Cohen, KB, Krallinger, M, Wilbur, WJ, Fey, P, Dodson, R, Cooper, L, Van Slyke, CE, Dahdul, W, Mabee, P, Li, D, Harris, B, Gillespie, M, Jimenez, S, Roberts, P, Matthews, L, Becker, K, Drabkin, H, Bello, S, Licata, L, Chatr-aryamontri, A, Schaeffer, ML, Park, J, Haendel, M, Van Auken, K, Li, Y, Chan, J, Muller, HM, Cui, H, Balhoff, JP, Wu, JCY, Lu, Z, Wei, CH, Tudor, CO, Raja, K, Subramani, S, Natarajan, J, Cejuela, JM, Dubey, P & Wu, C 2013, 'An overview of the BioCreative 2012 Workshop Track III: Interactive text mining task', Database, vol. 2013, bas056. https://doi.org/10.1093/database/bas056

@article{8cc7e3db5a1844c2859d45195490b073,

title = "An overview of the BioCreative 2012 Workshop Track III: Interactive text mining task",

abstract = "In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.",

author = "Arighi, {Cecilia N.} and Ben Carterette and Cohen, {K. Bretonnel} and Martin Krallinger and Wilbur, {W. John} and Petra Fey and Robert Dodson and Laurel Cooper and {Van Slyke}, {Ceri E.} and Wasila Dahdul and Paula Mabee and Donghui Li and Bethany Harris and Marc Gillespie and Silvia Jimenez and Phoebe Roberts and Lisa Matthews and Kevin Becker and Harold Drabkin and Susan Bello and Luana Licata and Andrew Chatr-aryamontri and Schaeffer, {Mary L.} and Julie Park and Melissa Haendel and {Van Auken}, Kimberly and Yuling Li and Juancarlos Chan and Muller, {Hans Michael} and Hong Cui and Balhoff, {James P.} and Wu, {Johnny Chi Yang} and Zhiyong Lu and Wei, {Chih Hsuan} and Tudor, {Catalina O.} and Kalpana Raja and Suresh Subramani and Jeyakumar Natarajan and Cejuela, {Juan Miguel} and Pratibha Dubey and Cathy Wu",

year = "2013",

doi = "10.1093/database/bas056",

language = "English (US)",

volume = "2013",

journal = "Database",

issn = "1758-0463",

publisher = "Oxford University Press",

}

TY - JOUR

T1 - An overview of the BioCreative 2012 Workshop Track III

T2 - Interactive text mining task

AU - Arighi, Cecilia N.

AU - Carterette, Ben

AU - Cohen, K. Bretonnel

AU - Krallinger, Martin

AU - Wilbur, W. John

AU - Fey, Petra

AU - Dodson, Robert

AU - Cooper, Laurel

AU - Van Slyke, Ceri E.

AU - Dahdul, Wasila

AU - Mabee, Paula

AU - Li, Donghui

AU - Harris, Bethany

AU - Gillespie, Marc

AU - Jimenez, Silvia

AU - Roberts, Phoebe

AU - Matthews, Lisa

AU - Becker, Kevin

AU - Drabkin, Harold

AU - Bello, Susan

AU - Licata, Luana

AU - Chatr-aryamontri, Andrew

AU - Schaeffer, Mary L.

AU - Park, Julie

AU - Haendel, Melissa

AU - Van Auken, Kimberly

AU - Li, Yuling

AU - Chan, Juancarlos

AU - Muller, Hans Michael

AU - Cui, Hong

AU - Balhoff, James P.

AU - Wu, Johnny Chi Yang

AU - Lu, Zhiyong

AU - Wei, Chih Hsuan

AU - Tudor, Catalina O.

AU - Raja, Kalpana

AU - Subramani, Suresh

AU - Natarajan, Jeyakumar

AU - Cejuela, Juan Miguel

AU - Dubey, Pratibha

AU - Wu, Cathy

PY - 2013

Y1 - 2013

AB - In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.

UR - http://www.scopus.com/inward/record.url?scp=84879330505&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879330505&partnerID=8YFLogxK

U2 - 10.1093/database/bas056

DO - 10.1093/database/bas056

M3 - Article

C2 - 23327936

AN - SCOPUS:84879330505

SN - 1758-0463

VL - 2013

JO - Database

JF - Database

M1 - bas056

ER -
