Empirical, automated vocabulary discovery using large text corpora and advanced natural language processing tools.

W. R. Hersh, E. H. Campbell, D. A. Evans, N. D. Brownlow

Research output: Contribution to journal › Article

19 Scopus citations


A major impediment to realizing the full benefit of electronic medical records is the lack of a comprehensive clinical vocabulary. Most existing vocabularies do not allow the full expressiveness of clinical diagnoses and findings, which are often qualified by modifiers relating to severity, acuity, and temporal factors. One reason for this lack of expressivity is the inability of traditional manual construction techniques to capture the diversity of language used by clinicians. This study used advanced natural language processing tools to identify terminology in a clinical findings domain, compare its coverage with the UMLS Metathesaurus, and quantify the effort required to discover the additional terminology. It was found that a substantial number of phrases and individual modifiers were not present in the UMLS Metathesaurus and that only modest effort in human time and computer processing was needed to obtain the larger quantity of terms.

Original language: English (US)
Pages (from-to): 159-163
Number of pages: 5
Journal: Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium
State: Published - 1996


ASJC Scopus subject areas

  • Medicine (all)
