Class-dependent score combination for speaker recognition

Luciana Ferrer, Kemal Sönmez, Sachin Kajarekar

Research output: Contribution to conference › Paper

6 Scopus citations

Abstract

Many recent performance improvements in speaker recognition using higher-level features, as demonstrated in the NIST Speaker Recognition Evaluation (SRE) task, rely on combinations of multiple systems modeling a large variety of features. The diversity of this feature set, which ranges from short-term acoustic spectrum features to habitual word usage and is drawn from a large set of speakers in a multitude of settings (acoustic environment, speaking style, quantities of enrollment/test data), results in a challenging model combination task. In this work, we present a class-dependent score combination technique that relies on clustering both the target models and the test utterances in a vector space defined by a set of speaker-specific transformation parameters estimated during transcription of the talker's speech by automatic speech recognition (ASR). We show that significant performance gains are obtained by using the first few principal components of a model transform to cluster the speaker verification trials into classes of (target speaker, test utterance) pairs and then training a separate combiner for each class. We report results on the NIST SRE 2004 and FISHER datasets.
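The pipeline described in the abstract (project transform vectors onto a few principal components, group trials into classes, train one combiner per class) can be sketched roughly as follows. This is only an illustration on synthetic data, not the authors' implementation. The abstract does not name the clustering method, the combiner type, or the number of classes, so the median split on the first principal component, the logistic-regression combiner, the four classes, and all function names below are assumptions made for the example.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto their first k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def assign_classes(model_vecs, test_vecs, k=2):
    """Map each trial to one of 4 classes from its model and test vectors."""
    n = len(model_vecs)
    # Fit one PCA on the pooled vectors so both sides share a space.
    proj = pca_project(np.vstack([model_vecs, test_vecs]), k)
    # Assumed stand-in for clustering: split the first component at its median.
    side = (proj[:, 0] > np.median(proj[:, 0])).astype(int)
    return 2 * side[:n] + side[n:]

def fit_combiner(scores, labels, lr=0.1, steps=500):
    """Assumed combiner: logistic regression fit by plain gradient descent."""
    X = np.hstack([scores, np.ones((len(scores), 1))])  # append bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - labels) / len(labels)
    return w

def fit_class_dependent(scores, labels, classes, n_classes=4, min_trials=20):
    """Train one combiner per class, falling back to a global combiner."""
    global_w = fit_combiner(scores, labels)
    weights = []
    for c in range(n_classes):
        idx = classes == c
        # A class needs enough trials and both target and impostor examples.
        ok = idx.sum() >= min_trials and 0 < labels[idx].sum() < idx.sum()
        weights.append(fit_combiner(scores[idx], labels[idx]) if ok else global_w)
    return np.array(weights)

def combine(scores, classes, weights):
    """Fuse the per-system scores using the weights of each trial's class."""
    w = weights[classes]  # shape: (n_trials, n_systems + 1)
    return np.sum(scores * w[:, :-1], axis=1) + w[:, -1]

# Synthetic demo: 400 trials, 10-dimensional transform vectors, 3 systems.
rng = np.random.default_rng(0)
n, dim, n_sys = 400, 10, 3
model_vecs = rng.normal(size=(n, dim))
test_vecs = rng.normal(size=(n, dim))
labels = rng.integers(0, 2, size=n).astype(float)  # 1 = target trial
scores = rng.normal(size=(n, n_sys)) + labels[:, None]  # targets score higher
classes = assign_classes(model_vecs, test_vecs)
weights = fit_class_dependent(scores, labels, classes)
fused = combine(scores, classes, weights)
```

The fallback to a global combiner is a design choice of this sketch: a class with too few trials, or with only target or only impostor trials, cannot support a reliable combiner of its own.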

Original language: English (US)
Pages: 2173-2176
Number of pages: 4
State: Published - Dec 1 2005
Externally published: Yes
Event: 9th European Conference on Speech Communication and Technology - Lisbon, Portugal
Duration: Sep 4 2005 to Sep 8 2005

Other

Other: 9th European Conference on Speech Communication and Technology
Country: Portugal
City: Lisbon
Period: 9/4/05 to 9/8/05

ASJC Scopus subject areas

  • Engineering (all)

  • Cite this

    Ferrer, L., Sönmez, K., & Kajarekar, S. (2005). Class-dependent score combination for speaker recognition. 2173-2176. Paper presented at 9th European Conference on Speech Communication and Technology, Lisbon, Portugal.