An effective general purpose approach for automated biomedical document classification.

Research output: Contribution to journalArticle

27 Citations (Scopus)

Abstract

Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost proportionate rejection sampling followed by application of a support vector machine (SVM), combining the resulting classifier results with voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general purpose approach for classifying biomedical text.

Original languageEnglish (US)
Pages (from-to)161-165
Number of pages5
JournalAMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
StatePublished - 2006

Fingerprint

Costs and Cost Analysis
Politics
Support Vector Machine

ASJC Scopus subject areas

  • Medicine(all)

Cite this

@article{49d113fbb89741fcb854a503bed17498,
title = "An effective general purpose approach for automated biomedical document classification.",
abstract = "Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost proportionate rejection sampling followed by application of a support vector machine (SVM), combining the resulting classifier results with voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general purpose approach for classifying biomedical text.",
author = "Aaron Cohen",
year = "2006",
language = "English (US)",
pages = "161--165",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",

}

TY - JOUR

T1 - An effective general purpose approach for automated biomedical document classification.

AU - Cohen, Aaron

PY - 2006

Y1 - 2006

N2 - Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost proportionate rejection sampling followed by application of a support vector machine (SVM), combining the resulting classifier results with voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general purpose approach for classifying biomedical text.

AB - Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost proportionate rejection sampling followed by application of a support vector machine (SVM), combining the resulting classifier results with voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general purpose approach for classifying biomedical text.

UR - http://www.scopus.com/inward/record.url?scp=34748876540&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34748876540&partnerID=8YFLogxK

M3 - Article

C2 - 17238323

AN - SCOPUS:34748876540

SP - 161

EP - 165

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

SN - 1559-4076

ER -