Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts

A. M. Cohen, W. R. Hersh, C. Dubay, K. Spackman

Research output: Contribution to journal (Article)

44 Scopus citations


Background: Text mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction.

Results: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs.

Conclusion: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.
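To make the idea of a symbol co-occurrence network concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not specify. It assumes whitespace tokenization, treats every token as a symbol, and scores a candidate pair by the Jaccard overlap of their neighbor sets; the function names `build_cooccurrence_network` and `neighbor_overlap` and the toy abstracts are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(abstracts):
    """Map each symbol to the set of symbols it shares an abstract with.

    Assumption: symbols are lowercase whitespace-separated tokens.
    """
    neighbors = defaultdict(set)
    for text in abstracts:
        symbols = set(text.lower().split())
        # Link every pair of distinct symbols found in the same abstract.
        for a, b in combinations(sorted(symbols), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def neighbor_overlap(neighbors, a, b):
    """Jaccard overlap of two symbols' neighbor sets (illustrative score only)."""
    na = neighbors.get(a, set()) - {b}
    nb = neighbors.get(b, set()) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Toy data, invented for illustration; not taken from MEDLINE.
toy_abstracts = [
    "p53 binds mdm2",
    "tp53 binds mdm2",
    "brca1 repairs dna",
]
network = build_cooccurrence_network(toy_abstracts)
same_context = neighbor_overlap(network, "p53", "tp53")
different_context = neighbor_overlap(network, "p53", "brca1")
```

In this toy setting, two names that never appear together but share the same neighbors score higher than names from unrelated contexts, which is the kind of structural signal a network-based approach could exploit. A real system would also need symbol filtering and a way to use seed synonym pairs, neither of which is modeled here.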

Original language: English (US)
Article number: 103
Journal: BMC Bioinformatics
State: Published - Apr 22 2005


ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
