On the relative importance of various components of the modulation spectrum for automatic speech recognition

Noboru Kanedera; Takayuki Arai; Hynek Hermansky; Misha Pavel

doi:10.1016/S0167-6393(99)00002-3

Biomedical Engineering

Research output: Contribution to journal › Article › peer-review

109 Scopus citations

Abstract

We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.
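The central manipulation in the abstract is band-pass filtering each feature's time trajectory so that only a chosen range of modulation frequencies survives. The sketch below illustrates that idea and is not the authors' filter design. It assumes a feature stream at 100 frames per second (10 ms frame shift), a 101-tap Hamming-windowed-sinc FIR, zero padding at the edges, and the 1 to 16 Hz band from the abstract's first conclusion. The function names `bandpass_fir`, `filter_trajectory` and `filter_features` are hypothetical.

```python
import math
from typing import List, Sequence


def bandpass_fir(f_lo: float, f_hi: float, frame_rate: float,
                 num_taps: int = 101) -> List[float]:
    """Hamming-windowed-sinc band-pass FIR for feature trajectories.

    Built as the difference of two low-pass filters (cutoffs f_hi and f_lo,
    in Hz of modulation frequency). Each low-pass is scaled to unit gain at
    0 Hz, so the band-pass response at 0 Hz (a constant offset) is zero.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has an integer delay")
    mid = (num_taps - 1) // 2
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
              for n in range(num_taps)]

    def lowpass(fc: float) -> List[float]:
        wc = 2.0 * fc / frame_rate  # cutoff as a fraction of the Nyquist rate
        taps = []
        for n in range(num_taps):
            k = n - mid
            ideal = wc if k == 0 else math.sin(math.pi * wc * k) / (math.pi * k)
            taps.append(ideal * window[n])
        gain = sum(taps)
        return [t / gain for t in taps]

    return [a - b for a, b in zip(lowpass(f_hi), lowpass(f_lo))]


def filter_trajectory(traj: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Filter one feature's time trajectory with a symmetric FIR.

    The output is shifted back by the filter's (num_taps - 1) / 2 frame delay,
    so output frame t lines up with input frame t; frames near either end see
    implicit zero padding.
    """
    mid = (len(taps) - 1) // 2
    out = []
    for t in range(len(traj)):
        acc = 0.0
        for j, h in enumerate(taps):
            src = t + mid - j
            if 0 <= src < len(traj):
                acc += h * traj[src]
        out.append(acc)
    return out


def filter_features(frames: Sequence[Sequence[float]],
                    taps: Sequence[float]) -> List[List[float]]:
    """frames[t][d] is feature d at frame t; each dimension is filtered separately."""
    dims = len(frames[0])
    cols = [filter_trajectory([f[d] for f in frames], taps) for d in range(dims)]
    return [[cols[d][t] for d in range(dims)] for t in range(len(frames))]


if __name__ == "__main__":
    FRAME_RATE = 100.0  # assumed 10 ms frame shift, i.e. 100 frames per second
    taps = bandpass_fir(1.0, 16.0, FRAME_RATE)
    # Toy 2-D "feature" stream: dim 0 = constant offset + 4 Hz modulation,
    # dim 1 = 30 Hz modulation (outside the 1 to 16 Hz band).
    frames = [[5.0 + math.sin(2 * math.pi * 4 * t / FRAME_RATE),
               math.sin(2 * math.pi * 30 * t / FRAME_RATE)] for t in range(400)]
    out = filter_features(frames, taps)
    interior = out[100:300]  # skip frames affected by edge padding
    print("peak of dim 0:", max(abs(f[0]) for f in interior))
    print("peak of dim 1:", max(abs(f[1]) for f in interior))
```

In the interior of the toy stream, the constant offset in dimension 0 should vanish while its 4 Hz component passes at roughly unit amplitude, and the 30 Hz component in dimension 1 should be almost entirely suppressed. Moving the band edges, for example to 2 Hz and 16 Hz, is one way to probe the abstract's second conclusion about components below 2 Hz or above 16 Hz. The filters, frame rate and features used in the study itself may differ.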

Original language	English (US)
Pages (from-to)	43-55
Number of pages	13
Journal	Speech Communication
Volume	28
Issue number	1
DOIs	https://doi.org/10.1016/S0167-6393(99)00002-3
State	Published - May 1999

ASJC Scopus subject areas

Software
Modeling and Simulation
Communication
Language and Linguistics
Linguistics and Language
Computer Vision and Pattern Recognition
Computer Science Applications

Cite this

@article{b3056bf3130849fcab0d91e8c41e8393,
  title = "On the relative importance of various components of the modulation spectrum for automatic speech recognition",
  abstract = "We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.",
  author = "Noboru Kanedera and Takayuki Arai and Hynek Hermansky and Misha Pavel",
  year = "1999",
  month = may,
  doi = "10.1016/S0167-6393(99)00002-3",
  language = "English (US)",
  volume = "28",
  pages = "43--55",
  journal = "Speech Communication",
  issn = "0167-6393",
  publisher = "Elsevier",
  number = "1",
}

TY  - JOUR
T1  - On the relative importance of various components of the modulation spectrum for automatic speech recognition
AU  - Kanedera, Noboru
AU  - Arai, Takayuki
AU  - Hermansky, Hynek
AU  - Pavel, Misha
PY  - 1999/5
Y1  - 1999/5
N2  - We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.
AB  - We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.
UR  - http://www.scopus.com/inward/record.url?scp=0032676337&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0032676337&partnerID=8YFLogxK
U2  - 10.1016/S0167-6393(99)00002-3
DO  - 10.1016/S0167-6393(99)00002-3
M3  - Article
AN  - SCOPUS:0032676337
SN  - 0167-6393
VL  - 28
SP  - 43
EP  - 55
JO  - Speech Communication
JF  - Speech Communication
IS  - 1
ER  -