POS Tags and Decision Trees for Language Modeling

Peter A. Heeman

Research output: Contribution to conference › Paper › peer-review

16 Scopus citations

Abstract

Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. To use POS tags effectively, we use clustering and decision tree algorithms, which allow generalizations between POS tags and words to be effectively used in estimating the probability distributions. We show that our POS model gives a reduction in word error rate and perplexity for the Trains corpus in comparison to word and class-based approaches. By using the Wall Street Journal corpus, we show that this approach scales up when more training data is available.
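
The redefinition described in the abstract can be written out as a short sketch. The notation is an assumption introduced here and does not appear in the abstract: A is the acoustic signal, W = w_1 ... w_N is the word sequence, and P = p_1 ... p_N is the corresponding POS tag sequence. The first line is the conventional word-only formulation, the second is the joint word-and-tag formulation, and the third is the chain-rule factorization of the joint language model.

```latex
\begin{align}
\hat{W} &= \arg\max_{W} \Pr(A \mid W)\,\Pr(W) \\
(\hat{W}, \hat{P}) &= \arg\max_{W,P} \Pr(A \mid W, P)\,\Pr(W, P) \\
\Pr(W, P) &= \prod_{i=1}^{N}
  \Pr(w_i \mid w_{1:i-1},\, p_{1:i})\;
  \Pr(p_i \mid w_{1:i-1},\, p_{1:i-1})
\end{align}
```

Under this reading, the two conditional distributions in the product are the probability distributions that the abstract says are estimated with clustering and decision trees.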

Original language: English (US)
Pages: 129-137
Number of pages: 9
State: Published - 1999
Event: 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, EMNLP 1999 - College Park, United States
Duration: Jun 21 1999 to Jun 22 1999

Conference

Conference: 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, EMNLP 1999
Country/Territory: United States
City: College Park
Period: 6/21/99 to 6/22/99

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Computational Theory and Mathematics
