TY - GEN
T1 - Consonant discrimination in elicited and spontaneous speech
T2 - 6th International Conference on Spoken Language Processing, ICSLP 2000
AU - Sönmez, Kemal
AU - Plauché, Madelaine
AU - Shriberg, Elizabeth
AU - Franco, Horacio
PY - 2000
Y1 - 2000
AB - The constant frame length in typical ASR front ends is too long to capture transient phenomena in speech, such as stop bursts. However, current HMM systems have consistently outperformed systems based solely on non-uniform units. This work investigates an approach to "add back" such transient information to a speech recognizer, without losing the robustness of the standard acoustic models. We demonstrate a set of phonetically-motivated acoustic features that discriminate a preliminary test set of highly ambiguous voiceless stops in CV contexts. The features are automatically computed from data that had been hand-marked for consonant burst location and voicing onset (extension to automatic marking is also proposed). Two corpora are processed using a parallel set of features: conversational speech over the telephone (Switchboard), and a corpus of carefully elicited speech. The latter provides an upper bound on discrimination, and allows for comparison of feature usage across speaking style. We explore data-driven approaches to obtaining variable-length time-localized features compatible with an HMM statistical framework. We also suggest techniques for extension to automatic annotation of burst location, for computation of features at such points, and for augmentation of an HMM system with the added information.
UR - http://www.scopus.com/inward/record.url?scp=85009115694&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009115694&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85009115694
T3 - 6th International Conference on Spoken Language Processing, ICSLP 2000
BT - 6th International Conference on Spoken Language Processing, ICSLP 2000
PB - International Speech Communication Association
Y2 - 16 October 2000 through 20 October 2000
ER -