Modeling duration patterns for speaker recognition

Luciana Ferrer, Harry Bratt, Venkata R.R. Gadde, Sachin Kajarekar, Elizabeth Shriberg, Kemal Sönmez, Andreas Stolcke, Anand Venkataraman

Research output: Contribution to conference › Paper

36 Scopus citations

Abstract

We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector composed of either the durations of the individual phones making up the word, or the durations of the HMM states making up the phone. We model these vectors using mixtures of Gaussians. Speaker-specific models are obtained by adapting a "background" model that is trained on a large pool of speakers. The speaker models are then used to score the test data, and the resulting scores are normalized by subtracting the scores obtained with the background model. We find that this approach yields a significant performance improvement when combined with a state-of-the-art speaker recognition system based on standard cepstral features. Furthermore, the improvement persists even after combination with lexical features. Finally, the improvement continues to increase with longer test samples, beyond the test duration at which the accuracy of the standard system levels off.
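The pipeline the abstract outlines (pool duration vectors over many speakers, fit a background Gaussian mixture, adapt it to each speaker, then score test data against both models) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the adaptation rule, so the sketch assumes mean-only MAP adaptation with a relevance factor of 16 (a common choice in GMM background-model systems), diagonal covariances, and one model per unit type. The unit here is a hypothetical three-phone word, so each vector holds three phone durations in 10 ms frames, and all data are synthetic. The names `train_background`, `map_adapt_means`, and `llr_score` are hypothetical.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """log N(x | mu_k, diag(var_k)) for every row of X and every component k -> (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (k,)
    return -0.5 * (log_norm[None, :] + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = np.max(a, axis=1, keepdims=True)
    return m[:, 0] + np.log(np.sum(np.exp(a - m), axis=1))


class DiagGMM:
    def __init__(self, weights, means, variances):
        self.weights, self.means, self.variances = weights, means, variances

    def _joint(self, X):
        # log(w_k) + log N(x | component k), shape (n, k)
        return np.log(self.weights)[None, :] + log_gauss_diag(X, self.means, self.variances)

    def posteriors(self, X):
        lp = self._joint(X)
        return np.exp(lp - logsumexp_rows(lp)[:, None])

    def avg_loglik(self, X):
        return float(np.mean(logsumexp_rows(self._joint(X))))


def train_background(X, k, iters=50, seed=0, var_floor=1e-2):
    """Fit the background GMM with plain EM on vectors pooled over many speakers."""
    rng = np.random.default_rng(seed)
    gmm = DiagGMM(np.full(k, 1.0 / k),
                  X[rng.choice(len(X), size=k, replace=False)].copy(),
                  np.tile(np.var(X, axis=0) + var_floor, (k, 1)))
    for _ in range(iters):
        gamma = gmm.posteriors(X)                  # E-step: responsibilities (n, k)
        nk = gamma.sum(axis=0) + 1e-10             # soft counts per component
        gmm.weights = nk / nk.sum()                # M-step: weights, means, variances
        gmm.means = (gamma.T @ X) / nk[:, None]
        gmm.variances = np.maximum((gamma.T @ X ** 2) / nk[:, None] - gmm.means ** 2,
                                   var_floor)
    return gmm


def map_adapt_means(ubm, X, relevance=16.0):
    """ASSUMPTION: mean-only MAP adaptation; weights and variances are copied from the UBM."""
    gamma = ubm.posteriors(X)
    nk = gamma.sum(axis=0)
    data_means = (gamma.T @ X) / np.maximum(nk, 1e-10)[:, None]
    alpha = (nk / (nk + relevance))[:, None]       # more speaker data -> move further from UBM
    means = alpha * data_means + (1.0 - alpha) * ubm.means
    return DiagGMM(ubm.weights.copy(), means, ubm.variances.copy())


def llr_score(speaker_gmm, ubm, X):
    """Speaker score normalized by subtracting the background model's score."""
    return speaker_gmm.avg_loglik(X) - ubm.avg_loglik(X)


# Hypothetical synthetic data: each speaker has a "tempo" that stretches all three
# phone durations of the word; durations are in 10 ms frames with Gaussian jitter.
rng = np.random.default_rng(1)
base = np.array([8.0, 12.0, 6.0])


def speaker_data(tempo, n):
    return base * tempo + rng.normal(0.0, 1.0, size=(n, 3))


pool = np.vstack([speaker_data(t, 200) for t in rng.uniform(0.7, 1.3, size=20)])
ubm = train_background(pool, k=4)
target = map_adapt_means(ubm, speaker_data(1.25, 100))
same = llr_score(target, ubm, speaker_data(1.25, 50))    # true-speaker trial
other = llr_score(target, ubm, speaker_data(0.75, 50))   # impostor trial
print(f"target trial: {same:.2f}, impostor trial: {other:.2f}")
```

Subtracting the background model's average log-likelihood turns the raw speaker score into a log-likelihood ratio, so test samples that are simply easy or hard to fit under any model do not dominate the decision. In a complete system, a score like this would then be combined with the cepstral system's score, as the abstract describes.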

Original language: English (US)
Pages: 2017-2020
Number of pages: 4
State: Published - Jan 1 2003
Externally published: Yes
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: Sep 1 2003 to Sep 4 2003

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication

Cite this

Ferrer, L., Bratt, H., Gadde, V. R. R., Kajarekar, S., Shriberg, E., Sönmez, K., Stolcke, A., & Venkataraman, A. (2003). Modeling duration patterns for speaker recognition (pp. 2017-2020). Paper presented at 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, Geneva, Switzerland.