TY - GEN
T1 - Detecting Health Related Discussions in Everyday Telephone Conversations for Studying Medical Events in the Lives of Older Adults
AU - Sheikhshab, Golnar
AU - Shafran, Izhak
AU - Kaye, Jeffrey
N1 - Funding Information:
This research was supported in part by NIH Grants 1K25AG033723 and P30 AG008017, as well as by NSF Grants 1027834 and 0964102. Any opinions, findings, conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NIH. We thank Nicole Larimer for help in collecting the data, Maider Lehr for testing the data collection devices, and Katherine Wild for early discussions on this project. We are grateful to Brian Kingsbury and his colleagues for providing us access to IBM’s attila software tools.
Publisher Copyright:
©2014 Association for Computational Linguistics
PY - 2014
Y1 - 2014
AB - We apply semi-supervised topic modeling techniques to detect health-related discussions in everyday telephone conversations, a task with applications in large-scale epidemiological studies and in clinical interventions for older adults. The privacy requirements associated with using everyday telephone conversations preclude manual annotation; hence, we explore semi-supervised methods for this task. We adopt a semi-supervised version of Latent Dirichlet Allocation (LDA) to guide the learning process. Within this framework, we investigate a strategy to discard irrelevant words in the topic distribution and demonstrate that this strategy improves the average F-score on both the in-domain task and an out-of-domain task (Fisher corpus). Our results show that an increase in health-related discussion is statistically associated with actual medical events obtained through weekly self-reports.
UR - http://www.scopus.com/inward/record.url?scp=85122512963&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122512963&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85122512963
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 38
EP - 44
BT - ACL 2014 - BioNLP 2014, Workshop on Biomedical Natural Language Processing, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - ACL 2014 Workshop on Biomedical Natural Language Processing, BioNLP 2014
Y2 - 27 June 2014 through 28 June 2014
ER -