Finna: A paragraph prioritization system for biocuration in the neurosciences

Kyle H. Ambert, Aaron Cohen, Gully A P C Burns, Eilis Boudreau, Mustafa (Kemal) Sonmez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The emphasis of multilevel modeling techniques in the neurosciences has led to an increased need for large-scale, computationally-accessible databases containing neuroscientific data. Despite this, such databases are not being populated at a rate commensurate with their demand amongst Neuroinformaticians. The reasons for this are common to scientific database curation in general, namely, limitation of resources. Much of neuroscience's long tradition of research has been documented in computationally inaccessible formats, such as the pdf, making large-scale data extraction laborious and expensive. Here, we present a system for alleviating one bottleneck in the workflow for curating a typical knowledge base of neuroscience-related information. Finna is designed to rank-order the composite paragraphs of a publication that is predicted to contain information relevant to a knowledge base, in terms of the probability that each documents relevant data. We were able to achieve excellent performance with our classifier (AUC > 0.90) on our manually-curated neuroscience document corpus. Our approach would allow curators to read only a median of 2 paragraphs for each document, in order to identify information relevant to a neuron-related knowledge base. To our knowledge, this is the first system of its kind, and will be a useful baseline for developing similar resources for the neurosciences, and curation in general.

Original language: English (US)
Title of host publication: AAAI Fall Symposium - Technical Report
Publisher: AI Access Foundation
Pages: 2-7
Number of pages: 6
Volume: FS-13-01
ISBN (Print): 9781577356394
State: Published - 2013
Event: 2013 AAAI Fall Symposium - Arlington, VA, United States
Duration: Nov 15 2013 to Nov 17 2013

Other

Other: 2013 AAAI Fall Symposium
Country: United States
City: Arlington, VA
Period: 11/15/13 to 11/17/13

Fingerprint

Neurons
Classifiers
Composite materials

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Ambert, K. H., Cohen, A., Burns, G. A. P. C., Boudreau, E., & Sonmez, M. K. (2013). Finna: A paragraph prioritization system for biocuration in the neurosciences. In AAAI Fall Symposium - Technical Report (Vol. FS-13-01, pp. 2-7). AI Access Foundation.

Finna: A paragraph prioritization system for biocuration in the neurosciences. / Ambert, Kyle H.; Cohen, Aaron; Burns, Gully A P C; Boudreau, Eilis; Sonmez, Mustafa (Kemal).

AAAI Fall Symposium - Technical Report. Vol. FS-13-01 AI Access Foundation, 2013. p. 2-7.

Ambert, KH, Cohen, A, Burns, GAPC, Boudreau, E & Sonmez, MK 2013, Finna: A paragraph prioritization system for biocuration in the neurosciences. in AAAI Fall Symposium - Technical Report. vol. FS-13-01, AI Access Foundation, pp. 2-7, 2013 AAAI Fall Symposium, Arlington, VA, United States, 11/15/13.
Ambert KH, Cohen A, Burns GAPC, Boudreau E, Sonmez MK. Finna: A paragraph prioritization system for biocuration in the neurosciences. In AAAI Fall Symposium - Technical Report. Vol. FS-13-01. AI Access Foundation. 2013. p. 2-7
Ambert, Kyle H.; Cohen, Aaron; Burns, Gully A P C; Boudreau, Eilis; Sonmez, Mustafa (Kemal). / Finna: A paragraph prioritization system for biocuration in the neurosciences. AAAI Fall Symposium - Technical Report. Vol. FS-13-01 AI Access Foundation, 2013. pp. 2-7
@inproceedings{41114c7967d242719226ca2e3d11322f,
title = "Finna: A paragraph prioritization system for biocuration in the neurosciences",
abstract = "The emphasis of multilevel modeling techniques in the neurosciences has led to an increased need for large-scale, computationally-accessible databases containing neuroscientific data. Despite this, such databases are not being populated at a rate commensurate with their demand amongst Neuroinformaticians. The reasons for this are common to scientific database curation in general, namely, limitation of resources. Much of neuroscience's long tradition of research has been documented in computationally inaccessible formats, such as the pdf, making large-scale data extraction laborious and expensive. Here, we present a system for alleviating one bottleneck in the workflow for curating a typical knowledge base of neuroscience-related information. Finna is designed to rank-order the composite paragraphs of a publication that is predicted to contain information relevant to a knowledge base, in terms of the probability that each documents relevant data. We were able to achieve excellent performance with our classifier (AUC > 0.90) on our manually-curated neuroscience document corpus. Our approach would allow curators to read only a median of 2 paragraphs for each document, in order to identify information relevant to a neuron-related knowledge base. To our knowledge, this is the first system of its kind, and will be a useful baseline for developing similar resources for the neurosciences, and curation in general.",
author = "Ambert, {Kyle H.} and Aaron Cohen and Burns, {Gully A P C} and Eilis Boudreau and Sonmez, {Mustafa (Kemal)}",
year = "2013",
language = "English (US)",
isbn = "9781577356394",
volume = "FS-13-01",
pages = "2--7",
booktitle = "AAAI Fall Symposium - Technical Report",
publisher = "AI Access Foundation",

}

TY - GEN

T1 - Finna

T2 - A paragraph prioritization system for biocuration in the neurosciences

AU - Ambert, Kyle H.

AU - Cohen, Aaron

AU - Burns, Gully A P C

AU - Boudreau, Eilis

AU - Sonmez, Mustafa (Kemal)

PY - 2013

Y1 - 2013

N2 - The emphasis of multilevel modeling techniques in the neurosciences has led to an increased need for large-scale, computationally-accessible databases containing neuroscientific data. Despite this, such databases are not being populated at a rate commensurate with their demand amongst Neuroinformaticians. The reasons for this are common to scientific database curation in general, namely, limitation of resources. Much of neuroscience's long tradition of research has been documented in computationally inaccessible formats, such as the pdf, making large-scale data extraction laborious and expensive. Here, we present a system for alleviating one bottleneck in the workflow for curating a typical knowledge base of neuroscience-related information. Finna is designed to rank-order the composite paragraphs of a publication that is predicted to contain information relevant to a knowledge base, in terms of the probability that each documents relevant data. We were able to achieve excellent performance with our classifier (AUC > 0.90) on our manually-curated neuroscience document corpus. Our approach would allow curators to read only a median of 2 paragraphs for each document, in order to identify information relevant to a neuron-related knowledge base. To our knowledge, this is the first system of its kind, and will be a useful baseline for developing similar resources for the neurosciences, and curation in general.

UR - http://www.scopus.com/inward/record.url?scp=84898867244&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898867244&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84898867244

SN - 9781577356394

VL - FS-13-01

SP - 2

EP - 7

BT - AAAI Fall Symposium - Technical Report

PB - AI Access Foundation

ER -