An effective general purpose approach for automated biomedical document classification.

Aaron M. Cohen

Medical Informatics and Clinical Epidemiology

Research output: Contribution to journal › Article › peer-review

39 Scopus citations

Abstract

Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost proportionate rejection sampling followed by application of a support vector machine (SVM), combining the resulting classifier results with voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general purpose approach for classifying biomedical text.
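The pipeline named in the abstract (chi-square feature scoring, repeated cost-proportionate rejection sampling, one classifier per sample, voting) might be sketched as follows. This is an illustrative sketch, not the author's implementation: the abstract gives no cost ratio, number of iterations, or SVM settings, so the values below are assumptions, and `train_threshold` is a hypothetical stand-in learner used only to keep the example dependency-free where the paper uses an SVM. All function names are likewise hypothetical.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class table.

    a: positive docs containing the term, b: negative docs containing it,
    c: positive docs without the term,    d: negative docs without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - b * c) ** 2 / denom


def rejection_sample(examples, costs, rng):
    """Keep each example with probability cost / max(costs)."""
    z = max(costs)
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]


def train_threshold(sample):
    """Stand-in learner (NOT an SVM): threshold midway between class means.

    Each example is (x, y) with one numeric feature x and a label y in {0, 1}.
    """
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        label = 1 if pos else 0  # only one class survived the sampling
        return lambda x: label
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


def train_voting_ensemble(examples, costs, train, n_rounds, seed=0):
    """Train one model per independent rejection sample."""
    rng = random.Random(seed)
    return [train(rejection_sample(examples, costs, rng)) for _ in range(n_rounds)]


def vote(models, x):
    """Majority vote over binary (0/1) predictions."""
    return 1 if 2 * sum(m(x) for m in models) > len(models) else 0


# Toy data: rare positives near x = 1.0, common negatives near x = 0.0.
data = [(1.0, 1)] * 5 + [(0.0, 0)] * 100
# Assumed asymmetric costs: missing a positive costs 20x a false alarm.
costs = [20 if y == 1 else 1 for _, y in data]
models = train_voting_ensemble(data, costs, train_threshold, n_rounds=5)
prediction = vote(models, 0.9)
```

Because positives carry the maximum cost, every positive example is kept in every sample, while each negative is kept with probability 1/20 under these assumed costs. Each round therefore trains on a smaller, more balanced set, and voting across rounds reduces the variance that discarding most negatives introduces.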

Original language	English (US)
Pages (from-to)	161-165
Number of pages	5
Journal	AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State	Published - 2006

ASJC Scopus subject areas

General Medicine

Cite this

@article{49d113fbb89741fcb854a503bed17498,

title = "An effective general purpose approach for automated biomedical document classification.",

abstract = "Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost proportionate rejection sampling followed by application of a support vector machine (SVM), combining the resulting classifier results with voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general purpose approach for classifying biomedical text.",

author = "Cohen, {Aaron M.}",

year = "2006",

language = "English (US)",

pages = "161--165",

journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",

issn = "1559-4076",

publisher = "American Medical Informatics Association",

}

TY - JOUR

T1 - An effective general purpose approach for automated biomedical document classification.

AU - Cohen, Aaron M.

PY - 2006

Y1 - 2006

AB - Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost proportionate rejection sampling followed by application of a support vector machine (SVM), combining the resulting classifier results with voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general purpose approach for classifying biomedical text.

UR - http://www.scopus.com/inward/record.url?scp=34748876540&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34748876540&partnerID=8YFLogxK

M3 - Article

C2 - 17238323

AN - SCOPUS:34748876540

SN - 1559-4076

SP - 161

EP - 165

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

ER -
