Cross-Topic Learning for Work Prioritization in Systematic Review Creation and Update

Aaron M. Cohen; Kyle Ambert; Marian McDonagh

doi:10.1197/jamia.M3162

Medical Informatics and Clinical Epidemiology

Research output: Contribution to journal › Article › peer-review

58 Scopus citations

Abstract

Objective: Machine learning systems can be an aid to experts performing systematic reviews (SRs) by automatically ranking journal articles for work-prioritization. This work investigates whether a topic-specific automated document ranking system for SRs can be improved using a hybrid approach, combining topic-specific training data with data from other SR topics.

Design: A test collection was built using annotated reference files from 24 systematic drug class reviews. A support vector machine learning algorithm was evaluated with cross-validation, using seven different fractions of topic-specific training data in combination with samples from the other 23 topics. This approach was compared to both a baseline system, which used only topic-specific training data, and to a system using only the nontopic data sampled from the remaining topics.

Measurements: Mean area under the receiver-operating curve (AUC) was used as the measure of comparison.

Results: On average, the hybrid system improved mean AUC over the baseline system by 20%, when topic-specific training data were scarce. The system performed significantly better than the baseline system at all levels of topic-specific training data. In addition, the system performed better than the nontopic system at all but the two smallest fractions of topic specific training data, and no worse than the nontopic system with these smallest amounts of topic specific training data.

Conclusions: Automated literature prioritization could be helpful in assisting experts to organize their time when performing systematic reviews. Future work will focus on extending the algorithm to use additional sources of topic-specific data, and on embedding the algorithm in an interactive system available to systematic reviewers during the literature review process.
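
The comparison measure named in the abstract, mean AUC across review topics, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `mean_auc`, the toy scores, and the pairwise (rank-sum) formulation are assumptions chosen for illustration only. Labels are 1 for articles included in a review and 0 for excluded ones.

```python
from statistics import mean


def auc(scores, labels):
    """AUC as the probability that a randomly chosen included article (label 1)
    is ranked above a randomly chosen excluded article (label 0).
    Tied scores count as half a win. Hypothetical helper, not from the paper."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one included and one excluded article")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_topic):
    """Average the per-topic AUC over an iterable of (scores, labels) pairs."""
    return mean(auc(scores, labels) for scores, labels in per_topic)


# Toy data for two hypothetical topics (illustrative values only).
topics = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),  # one included article ranked too low
    ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),  # perfect ranking
]
print(mean_auc(topics))
```

The pairwise loop is quadratic in the number of articles, which is acceptable for a sketch; a rank-based computation would be the usual choice for large reference sets.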

Original language	English (US)
Pages (from-to)	690-704
Number of pages	15
Journal	Journal of the American Medical Informatics Association
Volume	16
Issue number	5
DOIs	https://doi.org/10.1197/jamia.M3162
State	Published - Sep 2009

ASJC Scopus subject areas

Health Informatics

Cite this

@article{6220116317d34f23983d21f1239245c4,

title = "Cross-Topic Learning for Work Prioritization in Systematic Review Creation and Update",

abstract = "Objective: Machine learning systems can be an aid to experts performing systematic reviews (SRs) by automatically ranking journal articles for work-prioritization. This work investigates whether a topic-specific automated document ranking system for SRs can be improved using a hybrid approach, combining topic-specific training data with data from other SR topics. Design: A test collection was built using annotated reference files from 24 systematic drug class reviews. A support vector machine learning algorithm was evaluated with cross-validation, using seven different fractions of topic-specific training data in combination with samples from the other 23 topics. This approach was compared to both a baseline system, which used only topic-specific training data, and to a system using only the nontopic data sampled from the remaining topics. Measurements: Mean area under the receiver-operating curve (AUC) was used as the measure of comparison. Results: On average, the hybrid system improved mean AUC over the baseline system by 20%, when topic-specific training data were scarce. The system performed significantly better than the baseline system at all levels of topic-specific training data. In addition, the system performed better than the nontopic system at all but the two smallest fractions of topic specific training data, and no worse than the nontopic system with these smallest amounts of topic specific training data. Conclusions: Automated literature prioritization could be helpful in assisting experts to organize their time when performing systematic reviews. Future work will focus on extending the algorithm to use additional sources of topic-specific data, and on embedding the algorithm in an interactive system available to systematic reviewers during the literature review process.",

author = "Cohen, {Aaron M.} and Kyle Ambert and Marian McDonagh",

note = "Funding Information: This work was supported by grant 1R01LM009501-01 from the National Library of Medicine.",

year = "2009",

month = sep,

doi = "10.1197/jamia.M3162",

language = "English (US)",

volume = "16",

pages = "690--704",

journal = "Journal of the American Medical Informatics Association",

issn = "1067-5027",

publisher = "Oxford University Press",

number = "5",

}

TY - JOUR

T1 - Cross-Topic Learning for Work Prioritization in Systematic Review Creation and Update

AU - Cohen, Aaron M.

AU - Ambert, Kyle

AU - McDonagh, Marian

N1 - Funding Information: This work was supported by grant 1R01LM009501-01 from the National Library of Medicine.

PY - 2009/9

Y1 - 2009/9

AB - Objective: Machine learning systems can be an aid to experts performing systematic reviews (SRs) by automatically ranking journal articles for work-prioritization. This work investigates whether a topic-specific automated document ranking system for SRs can be improved using a hybrid approach, combining topic-specific training data with data from other SR topics. Design: A test collection was built using annotated reference files from 24 systematic drug class reviews. A support vector machine learning algorithm was evaluated with cross-validation, using seven different fractions of topic-specific training data in combination with samples from the other 23 topics. This approach was compared to both a baseline system, which used only topic-specific training data, and to a system using only the nontopic data sampled from the remaining topics. Measurements: Mean area under the receiver-operating curve (AUC) was used as the measure of comparison. Results: On average, the hybrid system improved mean AUC over the baseline system by 20%, when topic-specific training data were scarce. The system performed significantly better than the baseline system at all levels of topic-specific training data. In addition, the system performed better than the nontopic system at all but the two smallest fractions of topic specific training data, and no worse than the nontopic system with these smallest amounts of topic specific training data. Conclusions: Automated literature prioritization could be helpful in assisting experts to organize their time when performing systematic reviews. Future work will focus on extending the algorithm to use additional sources of topic-specific data, and on embedding the algorithm in an interactive system available to systematic reviewers during the literature review process.

UR - http://www.scopus.com/inward/record.url?scp=69549124138&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=69549124138&partnerID=8YFLogxK

U2 - 10.1197/jamia.M3162

DO - 10.1197/jamia.M3162

M3 - Article

C2 - 19567792

AN - SCOPUS:69549124138

SN - 1067-5027

VL - 16

SP - 690

EP - 704

JO - Journal of the American Medical Informatics Association

JF - Journal of the American Medical Informatics Association

IS - 5

ER -
