TY - GEN
T1 - Consonant discrimination in elicited and spontaneous speech
T2 - 6th International Conference on Spoken Language Processing, ICSLP 2000
AU - Sönmez, Kemal
AU - Plauché, Madelaine
AU - Shriberg, Elizabeth
AU - Franco, Horacio
PY - 2000
Y1 - 2000
AB - The constant frame length in typical ASR front ends is too long to capture transient phenomena in speech, such as stop bursts. However, current HMM systems have consistently outperformed systems based solely on non-uniform units. This work investigates an approach to "add back" such transient information to a speech recognizer, without losing the robustness of the standard acoustic models. We demonstrate a set of phonetically-motivated acoustic features that discriminate a preliminary test set of highly ambiguous voiceless stops in CV contexts. The features are automatically computed from data that had been hand-marked for consonant burst location and voicing onset (extension to automatic marking is also proposed). Two corpora are processed using a parallel set of features: conversational speech over the telephone (Switchboard), and a corpus of carefully elicited speech. The latter provides an upper bound on discrimination, and allows for comparison of feature usage across speaking style. We explore data-driven approaches to obtaining variable-length time-localized features compatible with an HMM statistical framework. We also suggest techniques for extension to automatic annotation of burst location, for computation of features at such points, and for augmentation of an HMM system with the added information.
UR - http://www.scopus.com/inward/record.url?scp=85009115694&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009115694&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85009115694
T3 - 6th International Conference on Spoken Language Processing, ICSLP 2000
BT - 6th International Conference on Spoken Language Processing, ICSLP 2000
PB - International Speech Communication Association
Y2 - 16 October 2000 through 20 October 2000
ER -