Multirate ASR models for phone-class dependent N-best list rescoring

Venkata R. Gadde, Kemal Sönmez, Horacio Franco

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Speech comprises a variety of acoustical phenomena occurring at differing rates. Fixed-rate ASR systems in effect assume a constant temporal rate of information flow by incorporating uniform statistics in proportion to a sound's duration. The usual window length of 25-30 milliseconds is a time-frequency resolution compromise: it aims to be short enough to follow changes in the spectral trajectories, yet long enough to provide a sufficient number of samples to estimate the harmonic structure. In this work, we describe a technique to augment a recognizer that uses this compromise with information from multiple-rate spectral models that emphasize either better time or better frequency resolution in order to improve performance. The main idea is to use the hypotheses generated by a fixed-rate recognizer to determine the appropriate model rate for a segment of the speech waveform. This is realized through a technique based on rescoring N-best lists with acoustical models that use different temporal windows, combined through a phone-dependent posterior-like score. We report results on the NIST Evaluation 2002 dataset, and demonstrate that the rescoring method produces word error rate (WER) improvements over a baseline system.
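
The rescoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phone classes, the class-to-rate mapping, the rate names ("short", "base", "long"), and all scores below are hypothetical, and a plain sum of log scores stands in for the phone-dependent posterior-like score, whose exact form the abstract does not give.

```python
from dataclasses import dataclass

# Hypothetical mapping from phone class to the model rate assumed to suit it:
# a short window for fast events, a long window for steadier sounds.
RATE_FOR_CLASS = {"stop": "short", "vowel": "long"}
DEFAULT_RATE = "base"  # the fixed-rate (compromise) model


@dataclass
class Segment:
    phone_class: str  # class of the phone hypothesized for this segment
    scores: dict      # model rate -> acoustic log score (made-up values)


@dataclass
class Hypothesis:
    words: str
    lm_score: float   # language-model log score
    segments: list    # phone segments from the first-pass alignment


def rescore(hyp, lm_weight=1.0):
    """Sum, per segment, the score of the rate chosen by its phone class."""
    total = lm_weight * hyp.lm_score
    for seg in hyp.segments:
        rate = RATE_FOR_CLASS.get(seg.phone_class, DEFAULT_RATE)
        total += seg.scores[rate]
    return total


def rerank(nbest):
    """Return the N-best list sorted by the rescored total, best first."""
    return sorted(nbest, key=rescore, reverse=True)


# Two toy hypotheses; the fixed-rate ("base") scores alone would favor hyp_b.
hyp_a = Hypothesis("a", -2.0, [
    Segment("stop", {"base": -5.0, "short": -3.0, "long": -6.0}),
    Segment("vowel", {"base": -4.0, "short": -6.0, "long": -2.0}),
])
hyp_b = Hypothesis("b", -1.0, [
    Segment("stop", {"base": -4.0, "short": -5.0, "long": -5.0}),
    Segment("vowel", {"base": -3.0, "short": -4.0, "long": -4.0}),
])

best = rerank([hyp_b, hyp_a])[0]
```

In this toy list the class-selected rates move `hyp_a` ahead of `hyp_b`, which is the kind of reordering an N-best rescoring pass is meant to produce. A real system would take the segment boundaries and phone labels from the first-pass recognizer's alignments.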

Original language: English (US)
Title of host publication: Proceedings of ASRU 2005
Subtitle of host publication: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Pages: 265-269
Number of pages: 5
DOIs: https://doi.org/10.1109/ASRU.2005.1566513
State: Published - Dec 1 2005
Externally published: Yes
Event: ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop - Cancun, Mexico
Duration: Nov 27 2005 to Dec 1 2005

Publication series

Name: Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Volume: 2005

Other

Other: ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Country: Mexico
City: Cancun
Period: 11/27/05 to 12/1/05

ASJC Scopus subject areas

  • Engineering(all)

  • Cite this

    Gadde, V. R., Sönmez, K., & Franco, H. (2005). Multirate ASR models for phone-class dependent N-best list rescoring. In Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop (pp. 265-269). [1566513] (Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop; Vol. 2005). https://doi.org/10.1109/ASRU.2005.1566513