Exploring the role of the modulation spectrum in phoneme recognition

Frederick Gallun, Pamela Souza

Research output: Contribution to journal › Article

30 Citations (Scopus)

Abstract

OBJECTIVES: The ability of human listeners to identify consonants (presented as nonsense syllables) on the basis of primarily temporal information was compared with the predictions of a simple model based on the amplitude modulation spectra of the stimuli calculated for six octave-spaced carrier frequencies (250 to 8000 Hz) and six octave-spaced amplitude modulation frequencies (1 to 32 Hz).

DESIGN: The listeners and the model were presented with 16 phonemes each spoken by four different talkers processed so that one, two, four, or eight bands of spectral information remained. The average modulation spectrum of each of the processed phonemes was extracted and similarity across phonemes was calculated by the use of a spectral correlation index (SCI).

RESULTS: The similarity of the modulation spectra across phonemes as assessed by the spectral correlation index was a strong predictor of the confusions made by human listeners.

CONCLUSIONS: This result suggests that a sparse set of time-averaged patterns of modulation energy can capture a meaningful aspect of the information listeners use to distinguish among speech signals.
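The model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract specifies only the six octave-spaced carrier bands (250 to 8000 Hz), the six octave-spaced modulation bands (1 to 32 Hz), and the use of a spectral correlation index. Everything else here is an assumption made for illustration: the FFT-masking band filter, the full-wave-rectified envelope, normalization by the envelope's DC component, and treating the SCI as a Pearson correlation over the flattened 6 x 6 matrix. All function names are hypothetical.

```python
import numpy as np

# Band centers taken from the abstract: six octave-spaced values each.
CARRIERS_HZ = [250, 500, 1000, 2000, 4000, 8000]
MOD_HZ = [1, 2, 4, 8, 16, 32]


def octave_band(signal, fs, center_hz):
    """Keep one octave around center_hz by zeroing FFT bins outside it.

    Assumption: a brick-wall FFT mask stands in for whatever filters
    the original study used.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    lo, hi = center_hz / np.sqrt(2), center_hz * np.sqrt(2)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def modulation_spectrum(signal, fs):
    """Return a 6 x 6 array: carrier band (rows) by modulation band (columns).

    Each entry is the envelope's spectral magnitude inside one octave-wide
    modulation band, divided by the envelope's DC magnitude.
    """
    signal = np.asarray(signal, dtype=float)
    mod_freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    out = np.zeros((len(CARRIERS_HZ), len(MOD_HZ)))
    for i, fc in enumerate(CARRIERS_HZ):
        # Assumption: full-wave rectification as the envelope extractor.
        env = np.abs(octave_band(signal, fs, fc))
        env_mag = np.abs(np.fft.rfft(env))
        dc = env_mag[0]
        if dc == 0:
            continue  # silent band: leave the row at zero
        for j, fm in enumerate(MOD_HZ):
            in_band = (mod_freqs >= fm / np.sqrt(2)) & (mod_freqs < fm * np.sqrt(2))
            out[i, j] = np.sqrt(np.sum(env_mag[in_band] ** 2)) / dc
    return out


def spectral_correlation_index(spec_a, spec_b):
    """Assumed SCI: Pearson correlation of two flattened modulation spectra."""
    return float(np.corrcoef(spec_a.ravel(), spec_b.ravel())[0, 1])
```

Under these assumptions, two stimuli whose envelopes fluctuate at the same rates in the same carrier bands yield an index near 1, while stimuli with different modulation patterns yield a lower value. The signal should last at least about one second so that the 1 Hz modulation band contains an FFT bin, and the sampling rate should be high enough to cover the 8000 Hz carrier band.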

Original language: English (US)
Pages (from-to): 800-813
Number of pages: 14
Journal: Ear and Hearing
Volume: 29
Issue number: 5
DOI: 10.1097/AUD.0b013e31817e73ef
State: Published - Oct 2008
Externally published: Yes

ASJC Scopus subject areas

  • Otorhinolaryngology
  • Speech and Hearing

Cite this

Exploring the role of the modulation spectrum in phoneme recognition. / Gallun, Frederick; Souza, Pamela.

In: Ear and Hearing, Vol. 29, No. 5, 10.2008, p. 800-813.

@article{7ae354ffb117433f82c9daa6bb7aa2e8,
title = "Exploring the role of the modulation spectrum in phoneme recognition",
abstract = "OBJECTIVES: The ability of human listeners to identify consonants (presented as nonsense syllables) on the basis of primarily temporal information was compared with the predictions of a simple model based on the amplitude modulation spectra of the stimuli calculated for six octave-spaced carrier frequencies (250 to 8000 Hz) and six octave-spaced amplitude modulation frequencies (1 to 32 Hz). DESIGN: The listeners and the model were presented with 16 phonemes each spoken by four different talkers processed so that one, two, four, or eight bands of spectral information remained. The average modulation spectrum of each of the processed phonemes was extracted and similarity across phonemes was calculated by the use of a spectral correlation index (SCI). RESULTS: The similarity of the modulation spectra across phonemes as assessed by the spectral correlation index was a strong predictor of the confusions made by human listeners. CONCLUSIONS: This result suggests that a sparse set of time-averaged patterns of modulation energy can capture a meaningful aspect of the information listeners use to distinguish among speech signals.",
author = "Frederick Gallun and Pamela Souza",
year = "2008",
month = "10",
doi = "10.1097/AUD.0b013e31817e73ef",
language = "English (US)",
volume = "29",
pages = "800--813",
journal = "Ear and Hearing",
issn = "0196-0202",
publisher = "Lippincott Williams and Wilkins",
number = "5",
}

TY - JOUR

T1 - Exploring the role of the modulation spectrum in phoneme recognition

AU - Gallun, Frederick

AU - Souza, Pamela

PY - 2008/10

Y1 - 2008/10

AB - OBJECTIVES: The ability of human listeners to identify consonants (presented as nonsense syllables) on the basis of primarily temporal information was compared with the predictions of a simple model based on the amplitude modulation spectra of the stimuli calculated for six octave-spaced carrier frequencies (250 to 8000 Hz) and six octave-spaced amplitude modulation frequencies (1 to 32 Hz). DESIGN: The listeners and the model were presented with 16 phonemes each spoken by four different talkers processed so that one, two, four, or eight bands of spectral information remained. The average modulation spectrum of each of the processed phonemes was extracted and similarity across phonemes was calculated by the use of a spectral correlation index (SCI). RESULTS: The similarity of the modulation spectra across phonemes as assessed by the spectral correlation index was a strong predictor of the confusions made by human listeners. CONCLUSIONS: This result suggests that a sparse set of time-averaged patterns of modulation energy can capture a meaningful aspect of the information listeners use to distinguish among speech signals.

UR - http://www.scopus.com/inward/record.url?scp=54949092576&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=54949092576&partnerID=8YFLogxK

U2 - 10.1097/AUD.0b013e31817e73ef

DO - 10.1097/AUD.0b013e31817e73ef

M3 - Article

C2 - 18596640

AN - SCOPUS:54949092576

VL - 29

SP - 800

EP - 813

JO - Ear and Hearing

JF - Ear and Hearing

SN - 0196-0202

IS - 5

ER -