Modeling duration patterns for speaker recognition

Luciana Ferrer; Harry Bratt; Venkata R.R. Gadde; Sachin Kajarekar; Elizabeth Shriberg; Kemal Sönmez; Andreas Stolcke; Anand Venkataraman

Research output: Contribution to conference › Paper › peer-review

43 Scopus citations

Abstract

We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector composed of either the durations of the individual phones making up the word or the durations of the HMM states making up the phone. We model the vectors using mixtures of Gaussians. The speaker-specific models are obtained through adaptation of a "background" model that is trained on a large pool of speakers. Speaker models are then used to score the test data; the scores are normalized by subtracting the scores obtained with the background model. We find that this approach yields significant performance improvement when combined with a state-of-the-art speaker recognition system based on standard cepstral features. Furthermore, the improvement persists even after combination with lexical features. Finally, the improvement continues to increase with longer test sample durations, beyond the test duration at which standard system accuracy levels off.
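The modeling pipeline described in the abstract (duration vectors, a Gaussian mixture background model, speaker adaptation, and background-normalized scoring) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract says only "adaptation", so mean-only MAP adaptation with a relevance factor is swapped in as one common choice; diagonal covariances are assumed; the function names (`map_adapt_means`, `normalized_score`) are hypothetical; and the toy duration values, in 10 ms frames for a three-phone word, are made up.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_log_probs(x, gmm):
    """Log of (mixture weight * component density) for every component."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])]

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def avg_log_likelihood(data, gmm):
    """Average per-vector log-likelihood of the data under the mixture."""
    return sum(logsumexp(component_log_probs(x, gmm)) for x in data) / len(data)

def map_adapt_means(background, data, relevance=2.0):
    """Assumed adaptation scheme: shift only the component means toward the data."""
    k = len(background["weights"])
    dim = len(background["means"][0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in data:
        log_probs = component_log_probs(x, background)
        norm = logsumexp(log_probs)
        for c in range(k):
            post = math.exp(log_probs[c] - norm)  # component posterior for x
            counts[c] += post
            for d in range(dim):
                sums[c][d] += post * x[d]
    means = []
    for c in range(k):
        old = background["means"][c]
        alpha = counts[c] / (counts[c] + relevance)  # data-dependent weight
        data_mean = [s / counts[c] for s in sums[c]] if counts[c] > 0.0 else old
        means.append([alpha * dm + (1.0 - alpha) * om
                      for dm, om in zip(data_mean, old)])
    return {"weights": background["weights"], "means": means,
            "vars": background["vars"]}

def normalized_score(test, speaker, background):
    """Speaker-model score minus background-model score on the same test data."""
    return avg_log_likelihood(test, speaker) - avg_log_likelihood(test, background)

# Toy background model over 3-phone duration vectors (made-up values, in frames).
background = {
    "weights": [0.5, 0.5],
    "means": [[5.0, 10.0, 7.0], [9.0, 16.0, 12.0]],
    "vars": [[4.0, 4.0, 4.0], [4.0, 4.0, 4.0]],
}
# Hypothetical enrollment data: a speaker whose first phone runs long.
enrollment = [[7.0, 10.0, 6.0], [8.0, 11.0, 7.0], [7.0, 9.0, 6.0],
              [11.0, 16.0, 11.0], [12.0, 17.0, 12.0], [11.0, 15.0, 11.0]]
speaker = map_adapt_means(background, enrollment)

target_score = normalized_score(enrollment, speaker, background)
impostor_score = normalized_score([[5.0, 10.0, 7.0], [9.0, 16.0, 12.0]],
                                  speaker, background)
print(target_score, impostor_score)
```

In this toy setup the enrollment data scores above zero and vectors sitting exactly at the background means score below zero, which is the behavior the background normalization is meant to produce. A real system would train the background model on many speakers and would typically keep separate models per word or phone, which this sketch does not attempt.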

Original language	English (US)
Pages	2017-2020
Number of pages	4
State	Published - 2003
Externally published	Yes
Event	8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland Duration: Sep 1 2003 → Sep 4 2003

Other

Other	8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
Country/Territory	Switzerland
City	Geneva
Period	9/1/03 → 9/4/03

ASJC Scopus subject areas

Computer Science Applications
Software
Linguistics and Language
Communication

Cite this

@conference{ac8e7f517a9648988648e20e3eacb36a,

title = "Modeling duration patterns for speaker recognition",

abstract = "We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector composed of either the durations of the individual phones making up the word or the durations of the HMM states making up the phone. We model the vectors using mixtures of Gaussians. The speaker-specific models are obtained through adaptation of a {"}background{"} model that is trained on a large pool of speakers. Speaker models are then used to score the test data; the scores are normalized by subtracting the scores obtained with the background model. We find that this approach yields significant performance improvement when combined with a state-of-the-art speaker recognition system based on standard cepstral features. Furthermore, the improvement persists even after combination with lexical features. Finally, the improvement continues to increase with longer test sample durations, beyond the test duration at which standard system accuracy levels off.",

author = "Luciana Ferrer and Harry Bratt and Gadde, {Venkata R.R.} and Sachin Kajarekar and Elizabeth Shriberg and Kemal S{\"o}nmez and Andreas Stolcke and Anand Venkataraman",

note = "Funding Information: We thank Doug Reynolds and Gary Kuhn for helpful discussions. This work was funded by a KDD supplement to NSF IRI-9619921 and by NASA Award NCC 2-1256. The views herein are those of the authors and do not reflect the policies of the funding agencies.; 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 ; Conference date: 01-09-2003 Through 04-09-2003",

year = "2003",

language = "English (US)",

pages = "2017--2020",

}

TY - CONF

T1 - Modeling duration patterns for speaker recognition

AU - Ferrer, Luciana

AU - Bratt, Harry

AU - Gadde, Venkata R.R.

AU - Kajarekar, Sachin

AU - Shriberg, Elizabeth

AU - Sönmez, Kemal

AU - Stolcke, Andreas

AU - Venkataraman, Anand

N1 - Funding Information: We thank Doug Reynolds and Gary Kuhn for helpful discussions. This work was funded by a KDD supplement to NSF IRI-9619921 and by NASA Award NCC 2-1256. The views herein are those of the authors and do not reflect the policies of the funding agencies.

PY - 2003

Y1 - 2003

N2 - We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector composed of either the durations of the individual phones making up the word or the durations of the HMM states making up the phone. We model the vectors using mixtures of Gaussians. The speaker-specific models are obtained through adaptation of a "background" model that is trained on a large pool of speakers. Speaker models are then used to score the test data; the scores are normalized by subtracting the scores obtained with the background model. We find that this approach yields significant performance improvement when combined with a state-of-the-art speaker recognition system based on standard cepstral features. Furthermore, the improvement persists even after combination with lexical features. Finally, the improvement continues to increase with longer test sample durations, beyond the test duration at which standard system accuracy levels off.

AB - We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector composed of either the durations of the individual phones making up the word or the durations of the HMM states making up the phone. We model the vectors using mixtures of Gaussians. The speaker-specific models are obtained through adaptation of a "background" model that is trained on a large pool of speakers. Speaker models are then used to score the test data; the scores are normalized by subtracting the scores obtained with the background model. We find that this approach yields significant performance improvement when combined with a state-of-the-art speaker recognition system based on standard cepstral features. Furthermore, the improvement persists even after combination with lexical features. Finally, the improvement continues to increase with longer test sample durations, beyond the test duration at which standard system accuracy levels off.

UR - http://www.scopus.com/inward/record.url?scp=80052127853&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052127853&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:80052127853

SP - 2017

EP - 2020

T2 - 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003

Y2 - 1 September 2003 through 4 September 2003

ER -
