Unsupervised gene/protein named entity normalization using automatically extracted dictionaries

Aaron M. Cohen

Medical Informatics and Clinical Epidemiology

Research output: Contribution to conference › Paper › peer-review

48 Scopus citations

Abstract

Gene and protein named-entity recognition (NER) and normalization are often treated as a two-step process. While the first step, NER, has received considerable attention over the last few years, normalization has received much less attention. We have built a dictionary-based gene and protein NER and normalization system that requires no supervised training and no human intervention to build the dictionaries from online genomics resources. We have tested our system on the Genia corpus and the BioCreative Task 1B mouse and yeast corpora and achieved a level of performance comparable to state-of-the-art systems that require supervised learning and manual dictionary creation. Our technique should also work for organisms following naming conventions similar to those of mouse, such as human. Further evaluation and improvement of gene/protein NER and normalization systems are somewhat hampered by the lack of larger test collections and collections for additional organisms, such as human.

Original language	English (US)
Pages	17-24
Number of pages	8
State	Published - 2005
Event	2005 ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, ACL-ISMB 2005 - Detroit, United States
Duration	Jun 24 2005 → …

Conference

Conference	2005 ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, ACL-ISMB 2005
Country/Territory	United States
City	Detroit
Period	6/24/05 → …

ASJC Scopus subject areas

General Biochemistry, Genetics and Molecular Biology
Artificial Intelligence
Information Systems

Cite this

@conference{89ce7ad6a64c4739ba278ddd737974f0,

title = "Unsupervised gene/protein named entity normalization using automatically extracted dictionaries",

abstract = "Gene and protein named-entity recognition (NER) and normalization is often treated as a two-step process. While the first step, NER, has received considerable attention over the last few years, normalization has received much less attention. We have built a dictionary based gene and protein NER and normalization system that requires no supervised training and no human intervention to build the dictionaries from online genomics resources. We have tested our system on the Genia corpus and the BioCreative Task 1B mouse and yeast corpora and achieved a level of performance comparable to state-of-the-art systems that require supervised learning and manual dictionary creation. Our technique should also work for organisms following similar naming conventions as mouse, such as human. Further evaluation and improvement of gene/protein NER and normalization systems is somewhat hampered by the lack of larger test collections and collections for additional organisms, such as human.",

author = "Cohen, {Aaron M.}",

note = "Publisher Copyright: {\textcopyright} 2005 Association for Computational Linguistics; 2005 ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, ACL-ISMB 2005 ; Conference date: 24-06-2005",

year = "2005",

language = "English (US)",

pages = "17--24",

}

TY - CONF

T1 - Unsupervised gene/protein named entity normalization using automatically extracted dictionaries

AU - Cohen, Aaron M.

PY - 2005

Y1 - 2005

N2 - Gene and protein named-entity recognition (NER) and normalization is often treated as a two-step process. While the first step, NER, has received considerable attention over the last few years, normalization has received much less attention. We have built a dictionary based gene and protein NER and normalization system that requires no supervised training and no human intervention to build the dictionaries from online genomics resources. We have tested our system on the Genia corpus and the BioCreative Task 1B mouse and yeast corpora and achieved a level of performance comparable to state-of-the-art systems that require supervised learning and manual dictionary creation. Our technique should also work for organisms following similar naming conventions as mouse, such as human. Further evaluation and improvement of gene/protein NER and normalization systems is somewhat hampered by the lack of larger test collections and collections for additional organisms, such as human.

UR - http://www.scopus.com/inward/record.url?scp=38949199184&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38949199184&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:38949199184

SP - 17

EP - 24

T2 - 2005 ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, ACL-ISMB 2005

Y2 - 24 June 2005

ER -
