TY - JOUR
T1 - Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task
AU - Chamberlin, Steven R.
AU - Bedrick, Steven D.
AU - Cohen, Aaron M.
AU - Wang, Yanshan
AU - Wen, Andrew
AU - Liu, Sijia
AU - Liu, Hongfang
AU - Hersh, William R.
N1 - Publisher Copyright: © The Author(s) 2020.
PY - 2020
Y1 - 2020
N2 - Objective: Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval methods using electronic health records for different types of cohort definition retrieval. Materials and Methods: We developed a test collection consisting of about 100 000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated information retrieval tasks using word-based approaches were performed, varying 4 different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. Results: The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision but were still not able to recall all relevant patients found by the automated queries. Conclusion: While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Future work will focus on using the test collection to develop and evaluate new approaches to query structure, weighting algorithms, and application of semantic methods.
KW - Electronic health record
KW - Information retrieval
KW - Patient cohort discovery
KW - Structured queries
UR - http://www.scopus.com/inward/record.url?scp=85101337530&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101337530&partnerID=8YFLogxK
U2 - 10.1093/JAMIAOPEN/OOAA026
DO - 10.1093/JAMIAOPEN/OOAA026
M3 - Article
AN - SCOPUS:85101337530
SN - 2574-2531
VL - 3
SP - 395
EP - 404
JO - JAMIA Open
JF - JAMIA Open
IS - 3
ER -