SYRIAC: The systematic review information automated collection system a data warehouse for facilitating automated biomedical text classification.

Jianji J. Yang; Aaron M. Cohen; Marian S. McDonagh


Medical Informatics and Clinical Epidemiology

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation data sets for SR text mining research.

Original language	English (US)
Pages (from-to)	825-829
Number of pages	5
Journal	AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State	Published - 2008

ASJC Scopus subject areas

General Medicine

Cite this

@article{1f8f25b68e3f4210ac29bf40a6d1bc95,

title = "SYRIAC: The systematic review information automated collection system a data warehouse for facilitating automated biomedical text classification.",

abstract = "Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation data sets for SR text mining research.",

author = "Yang, {Jianji J.} and Cohen, {Aaron M.} and McDonagh, {Marian S.}",

year = "2008",

language = "English (US)",

pages = "825--829",

journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",

issn = "1559-4076",

publisher = "American Medical Informatics Association",

}

TY - JOUR

T1 - SYRIAC

T2 - The systematic review information automated collection system a data warehouse for facilitating automated biomedical text classification.

AU - Yang, Jianji J.

AU - Cohen, Aaron M.

AU - McDonagh, Marian S.

PY - 2008

Y1 - 2008

N2 - Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation data sets for SR text mining research.

AB - Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation data sets for SR text mining research.

UR - http://www.scopus.com/inward/record.url?scp=69549138794&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=69549138794&partnerID=8YFLogxK

M3 - Article

C2 - 18999194

AN - SCOPUS:69549138794

SN - 1559-4076

SP - 825

EP - 829

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

ER -
