TY - JOUR
T1 - An overview of the BioCreative 2012 Workshop Track III
T2 - Interactive text mining task
AU - Arighi, Cecilia N.
AU - Carterette, Ben
AU - Cohen, K. Bretonnel
AU - Krallinger, Martin
AU - Wilbur, W. John
AU - Fey, Petra
AU - Dodson, Robert
AU - Cooper, Laurel
AU - Van Slyke, Ceri E.
AU - Dahdul, Wasila
AU - Mabee, Paula
AU - Li, Donghui
AU - Harris, Bethany
AU - Gillespie, Marc
AU - Jimenez, Silvia
AU - Roberts, Phoebe
AU - Matthews, Lisa
AU - Becker, Kevin
AU - Drabkin, Harold
AU - Bello, Susan
AU - Licata, Luana
AU - Chatr-aryamontri, Andrew
AU - Schaeffer, Mary L.
AU - Park, Julie
AU - Haendel, Melissa
AU - Van Auken, Kimberly
AU - Li, Yuling
AU - Chan, Juancarlos
AU - Muller, Hans Michael
AU - Cui, Hong
AU - Balhoff, James P.
AU - Wu, Johnny Chi Yang
AU - Lu, Zhiyong
AU - Wei, Chih Hsuan
AU - Tudor, Catalina O.
AU - Raja, Kalpana
AU - Subramani, Suresh
AU - Natarajan, Jeyakumar
AU - Cejuela, Juan Miguel
AU - Dubey, Pratibha
AU - Wu, Cathy
PY - 2013
Y1 - 2013
AB - In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.
UR - http://www.scopus.com/inward/record.url?scp=84879330505&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84879330505&partnerID=8YFLogxK
U2 - 10.1093/database/bas056
DO - 10.1093/database/bas056
M3 - Article
C2 - 23327936
AN - SCOPUS:84879330505
SN - 1758-0463
VL - 2013
JO - Database
JF - Database
M1 - bas056
ER -