Diagnosis code assignment: Models and evaluation metrics

Adler Perotte, Rimma Pivovarov, Karthik Natarajan, Nicole Weiskopf, Frank Wood, Noémie Elhadad

Research output: Contribution to journal › Article

48 Citations (Scopus)

Abstract

Background and objective: The volume of healthcare data is growing rapidly with the adoption of health information technology. We focus on automated ICD9 code assignment from discharge summary content and methods for evaluating such assignments.

Methods: We study ICD9 diagnosis codes and discharge summaries from the publicly available Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC II) repository. We experiment with two coding approaches: one that treats each ICD9 code independently of the others (flat classifier), and one that incorporates the hierarchical nature of ICD9 codes into its modeling (hierarchy-based classifier). We propose novel evaluation metrics, which reflect the distances among gold-standard and predicted codes and their locations in the ICD9 tree. Experimental setup, code for modeling, and evaluation scripts are made available to the research community.

Results: The hierarchy-based classifier outperforms the flat classifier with F-measures of 39.5% and 27.6%, respectively, when trained on 20 533 documents and tested on 2282 documents. While recall is improved at the expense of precision, our novel evaluation metrics show a more refined assessment: for instance, the hierarchy-based classifier identifies the correct sub-tree of gold-standard codes more often than the flat classifier. Error analysis reveals that gold-standard codes are not perfect, and as such the recall and precision are likely underestimated.

Conclusions: Hierarchy-based classification yields better ICD9 coding than flat classification for MIMIC patients. Automated ICD9 coding is an example of a task for which data and tools can be shared and for which the research community can work together to build on shared models and advance the state of the art.
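The abstract does not spell out the proposed metrics, so the following is only a minimal sketch of the general idea of scoring predictions by their distance to gold-standard codes in a code tree, next to a plain set-based F-measure. The toy hierarchy in `PARENT`, the function names, and the choice of edge-count distance are all assumptions made for illustration; they are not the authors' released evaluation scripts.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy fragment of an ICD9-like tree (child -> parent).
# The real ICD9 hierarchy is far larger; this is illustration only.
PARENT: Dict[str, Optional[str]] = {
    "ROOT": None,
    "390-459": "ROOT",
    "428": "390-459",
    "428.0": "428",
    "428.2": "428",
    "401": "390-459",
    "401.9": "401",
    "460-519": "ROOT",
    "486": "460-519",
}


def path_to_root(code: str) -> List[str]:
    """Return the list of nodes from `code` up to the root, inclusive."""
    path: List[str] = []
    node: Optional[str] = code
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges on the path between codes `a` and `b`."""
    depth_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in depth_from_a:  # first shared ancestor
            return depth_from_a[node] + j
    raise ValueError("codes do not share a root")


def nearest_gold_distance(predicted: Set[str], gold: Set[str]) -> Dict[str, int]:
    """For each predicted code, the distance to its closest gold code."""
    return {p: min(tree_distance(p, g) for g in gold) for p in predicted}


def f_measure(predicted: Set[str], gold: Set[str]) -> float:
    """Exact-match F-measure over code sets (ignores the hierarchy)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {"428.0", "401.9"}
predicted = {"428.2", "401.9"}
print(f_measure(predicted, gold))
print(nearest_gold_distance(predicted, gold))
```

In this toy case the exact-match F-measure counts `428.2` as a plain error, while the distance view shows it is a sibling of the gold code `428.0`, i.e. in the correct sub-tree. That is the kind of distinction a hierarchy-aware metric can surface.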

Original language: English (US)
Pages (from-to): 231-237
Number of pages: 7
Journal: Journal of the American Medical Informatics Association
Volume: 21
Issue number: 2
DOI: 10.1136/amiajnl-2013-002159
State: Published - 2014
Externally published: Yes

ASJC Scopus subject areas

  • Health Informatics

Cite this

Diagnosis code assignment: Models and evaluation metrics. / Perotte, Adler; Pivovarov, Rimma; Natarajan, Karthik; Weiskopf, Nicole; Wood, Frank; Elhadad, Noémie.

In: Journal of the American Medical Informatics Association, Vol. 21, No. 2, 2014, p. 231-237.

@article{f0a103c9612e40b1902a6f6de46777d2,
title = "Diagnosis code assignment: Models and evaluation metrics",
abstract = "Background and objective: The volume of healthcare data is growing rapidly with the adoption of health information technology. We focus on automated ICD9 code assignment from discharge summary content and methods for evaluating such assignments. Methods: We study ICD9 diagnosis codes and discharge summaries from the publicly available Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC II) repository. We experiment with two coding approaches: one that treats each ICD9 code independently of each other (flat classifier), and one that leverages the hierarchical nature of ICD9 codes into its modeling (hierarchy-based classifier). We propose novel evaluation metrics, which reflect the distances among gold-standard and predicted codes and their locations in the ICD9 tree. Experimental setup, code for modeling, and evaluation scripts are made available to the research community. Results: The hierarchy-based classifier outperforms the flat classifier with F-measures of 39.5{\%} and 27.6{\%}, respectively, when trained on 20 533 documents and tested on 2282 documents. While recall is improved at the expense of precision, our novel evaluation metrics show a more refined assessment: for instance, the hierarchy-based classifier identifies the correct sub-tree of gold-standard codes more often than the flat classifier. Error analysis reveals that gold-standard codes are not perfect, and as such the recall and precision are likely underestimated. Conclusions: Hierarchy-based classification yields better ICD9 coding than flat classification for MIMIC patients. Automated ICD9 coding is an example of a task for which data and tools can be shared and for which the research community can work together to build on shared models and advance the state of the art.",
author = "Adler Perotte and Rimma Pivovarov and Karthik Natarajan and Nicole Weiskopf and Frank Wood and No{\'e}mie Elhadad",
year = "2014",
doi = "10.1136/amiajnl-2013-002159",
language = "English (US)",
volume = "21",
pages = "231--237",
journal = "Journal of the American Medical Informatics Association",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "2",
}

TY - JOUR
T1 - Diagnosis code assignment
T2 - Models and evaluation metrics
AU - Perotte, Adler
AU - Pivovarov, Rimma
AU - Natarajan, Karthik
AU - Weiskopf, Nicole
AU - Wood, Frank
AU - Elhadad, Noémie
PY - 2014
Y1 - 2014

AB - Background and objective: The volume of healthcare data is growing rapidly with the adoption of health information technology. We focus on automated ICD9 code assignment from discharge summary content and methods for evaluating such assignments. Methods: We study ICD9 diagnosis codes and discharge summaries from the publicly available Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC II) repository. We experiment with two coding approaches: one that treats each ICD9 code independently of each other (flat classifier), and one that leverages the hierarchical nature of ICD9 codes into its modeling (hierarchy-based classifier). We propose novel evaluation metrics, which reflect the distances among gold-standard and predicted codes and their locations in the ICD9 tree. Experimental setup, code for modeling, and evaluation scripts are made available to the research community. Results: The hierarchy-based classifier outperforms the flat classifier with F-measures of 39.5% and 27.6%, respectively, when trained on 20 533 documents and tested on 2282 documents. While recall is improved at the expense of precision, our novel evaluation metrics show a more refined assessment: for instance, the hierarchy-based classifier identifies the correct sub-tree of gold-standard codes more often than the flat classifier. Error analysis reveals that gold-standard codes are not perfect, and as such the recall and precision are likely underestimated. Conclusions: Hierarchy-based classification yields better ICD9 coding than flat classification for MIMIC patients. Automated ICD9 coding is an example of a task for which data and tools can be shared and for which the research community can work together to build on shared models and advance the state of the art.

UR - http://www.scopus.com/inward/record.url?scp=84894070857&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894070857&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2013-002159
DO - 10.1136/amiajnl-2013-002159
M3 - Article
C2 - 24296907
AN - SCOPUS:84894070857
VL - 21
SP - 231
EP - 237
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
SN - 1067-5027
IS - 2
ER -