Finna: A paragraph prioritization system for biocuration in the neurosciences

Kyle H. Ambert, Aaron M. Cohen, Gully A.P.C. Burns, Eilis Boudreau, Kemal Sonmez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The emphasis on multilevel modeling techniques in the neurosciences has led to an increased need for large-scale, computationally accessible databases containing neuroscientific data. Despite this, such databases are not being populated at a rate commensurate with their demand among neuroinformaticians. The reasons for this are common to scientific database curation in general, namely, limitation of resources. Much of neuroscience's long tradition of research has been documented in computationally inaccessible formats, such as PDF, making large-scale data extraction laborious and expensive. Here, we present a system for alleviating one bottleneck in the workflow for curating a typical knowledge base of neuroscience-related information. Finna is designed to rank-order the constituent paragraphs of a publication that is predicted to contain information relevant to a knowledge base, by the probability that each paragraph documents relevant data. We were able to achieve excellent performance with our classifier (AUC > 0.90) on our manually curated neuroscience document corpus. Our approach would allow curators to read a median of only 2 paragraphs per document in order to identify information relevant to a neuron-related knowledge base. To our knowledge, this is the first system of its kind, and it will be a useful baseline for developing similar resources for the neurosciences, and for curation in general.
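As an illustration of the evaluation ideas mentioned in the abstract, the following minimal sketch ranks a document's paragraphs by classifier score, counts how many paragraphs a curator would read before reaching the first relevant one, and computes AUC. It is not the authors' implementation: the function names, the toy scores and labels, and the "read until the first relevant paragraph" reading of the median statistic are all assumptions made for this example.

```python
from statistics import median


def rank_paragraphs(scores):
    """Return paragraph indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def paragraphs_read_until_relevant(scores, labels):
    """Count paragraphs read, in ranked order, up to and including the
    first relevant one. Returns None if no paragraph is relevant."""
    for position, idx in enumerate(rank_paragraphs(scores), start=1):
        if labels[idx] == 1:
            return position
    return None


def auc(scores, labels):
    """Probability that a randomly chosen relevant paragraph outscores a
    randomly chosen irrelevant one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both relevant and irrelevant paragraphs")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (classifier scores, relevance labels) per document.
documents = [
    ([0.91, 0.15, 0.40, 0.05], [1, 0, 0, 0]),
    ([0.30, 0.85, 0.70, 0.10], [0, 0, 1, 0]),
    ([0.20, 0.60, 0.95, 0.55], [0, 1, 1, 0]),
]

reads = [paragraphs_read_until_relevant(s, y) for s, y in documents]
median_reads = median(reads)
```

Under these assumptions, a low median of `reads` across a corpus would correspond to curators finding relevant content after reading only the top few ranked paragraphs.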

Original language: English (US)
Title of host publication: Discovery Informatics
Subtitle of host publication: AI Takes a Science-Centered View on Big Data - Papers from the AAAI Fall Symposium, Technical Report
Publisher: AI Access Foundation
Pages: 2-7
Number of pages: 6
ISBN (Print): 9781577356394
State: Published - 2013
Event: 2013 AAAI Fall Symposium - Arlington, VA, United States
Duration: Nov 15 2013 to Nov 17 2013

Publication series

Name: AAAI Fall Symposium - Technical Report
Volume: FS-13-01

Other

Other: 2013 AAAI Fall Symposium
Country/Territory: United States
City: Arlington, VA
Period: 11/15/13 to 11/17/13

ASJC Scopus subject areas

  • General Engineering
