Robustness to telephone handset distortion in speaker recognition by discriminative feature design

Larry P. Heck, Yochai Konig, M. Kemal Sönmez, Mitch Weintraub

Research output: Contribution to journal › Article › peer-review

60 Scopus citations

Abstract

A method is described for designing speaker recognition features that are robust to telephone handset distortion. The approach transforms features such as mel-cepstral features, log spectrum, and prosody-based features with a non-linear artificial neural network. The neural network is discriminatively trained to maximize speaker recognition performance specifically in the setting of telephone handset mismatch between training and testing. The algorithm requires neither stereo recordings of speech during training nor manual labeling of handset types either in training or testing. Results on the 1998 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation corpus show relative improvements as high as 28% for the new multilayered perceptron (MLP)-based features as compared to a standard mel-cepstral feature set with cepstral mean subtraction (CMS) and handset-dependent normalizing impostor models.
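The CMS baseline named in the abstract can be illustrated with a small sketch. This is not the paper's MLP-based method; it only shows, under the assumption that a handset acts as a fixed additive offset in the cepstral domain, why subtracting the per-utterance mean removes that offset. The frame values and the helper `add_channel_offset` are hypothetical and exist only for this illustration.

```python
from typing import List

Frame = List[float]


def cepstral_mean_subtraction(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-utterance mean of each cepstral coefficient."""
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def add_channel_offset(frames: List[Frame], offset: Frame) -> List[Frame]:
    """Hypothetical handset model: a constant shift added to every frame."""
    return [[c + o for c, o in zip(f, offset)] for f in frames]


# Toy cepstra (made-up values): 3 frames x 2 coefficients.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
handset_a = add_channel_offset(clean, [0.5, -1.0])
handset_b = add_channel_offset(clean, [-2.0, 3.0])

norm_a = cepstral_mean_subtraction(handset_a)
norm_b = cepstral_mean_subtraction(handset_b)

# After CMS the two "handsets" yield the same features (up to rounding).
max_diff = max(
    abs(a - b) for fa, fb in zip(norm_a, norm_b) for a, b in zip(fa, fb)
)
```

CMS cancels only a constant shift of this kind. Any distortion that is not a fixed additive offset in the cepstral domain survives the subtraction, which is the gap a learned non-linear feature transform is meant to address.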

Original language: English (US)
Pages (from-to): 181-192
Number of pages: 12
Journal: Speech Communication
Volume: 31
Issue number: 2
DOIs
State: Published - Jun 2000
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
