A LOGNORMAL TIED MIXTURE MODEL OF PITCH FOR PROSODY-BASED SPEAKER RECOGNITION

M. Kemal Sönmez, Larry Heck, Mitchel Weintraub, Elizabeth Shriberg

Research output: Contribution to conference › Paper › peer-review

73 Scopus citations

Abstract

Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks that are subject to doubling and/or halving. First, we argue from a simple correlation model, and demonstrate empirically with QQ plots, that "clean" pitch follows a lognormal distribution rather than the often-assumed normal distribution. Second, we present a probabilistic model for the pitch estimated by a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances, for a total of four free parameters. We use the resulting pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (a subset of Switchboard) and report results on the most difficult portion of the database: the "one-session" condition with males only for both the claimant and impostor speakers. Pitch statistics provide a 22% reduction in false alarm rate at a 1% miss rate and an 11% reduction in false alarm rate at a 10% miss rate over the cepstrum-only system.
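The tied mixture described in the abstract can be illustrated with a short sketch. The abstract does not give the exact parameterization or the estimation procedure, so everything below is an assumption: the model is taken to be three Gaussians in log-pitch centred at mu - log 2 (halved), mu (correct), and mu + log 2 (doubled) with one shared variance, and the four free parameters are taken to be mu, sigma, and two independent mixture weights. The function name `fit_tied_lognormal_mixture` and the use of EM are illustrative choices, not the authors' method.

```python
import numpy as np

LOG2 = np.log(2.0)

def fit_tied_lognormal_mixture(pitch_hz, n_iter=200):
    """EM sketch for an assumed 3-component tied mixture in log-pitch.

    Components share one variance, and their means are tied to a single
    mu through fixed offsets of -log 2, 0, +log 2 (halving, correct, doubling).
    Returns (mu, sigma, weights) in the log-pitch domain.
    """
    x = np.log(np.asarray(pitch_hz, dtype=float))
    offsets = np.array([-LOG2, 0.0, LOG2])
    mu = np.median(x)                  # median is robust to doubling/halving
    sigma2 = np.var(x)                 # inflated start; EM shrinks it
    w = np.array([0.05, 0.90, 0.05])   # assumed initial weights
    n = x.size
    for _ in range(n_iter):
        # E-step: responsibilities. The Gaussian normalizer is identical
        # for all components (shared variance), so it cancels.
        diff = x[:, None] - (mu + offsets[None, :])
        log_r = np.log(np.maximum(w, 1e-12))[None, :] - 0.5 * diff**2 / sigma2
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weights, then the single tied mean and tied variance.
        w = r.mean(axis=0)
        mu = np.sum(r * (x[:, None] - offsets[None, :])) / n
        diff = x[:, None] - (mu + offsets[None, :])
        sigma2 = np.sum(r * diff**2) / n
    return mu, np.sqrt(sigma2), w
```

Under these assumptions, mu and sigma estimate the mean and standard deviation of the "clean" log-pitch even when a fraction of the frames is doubled or halved, which is the kind of unbiased pitch statistic the abstract refers to.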

Original language: English (US)
Pages: 1391-1394
Number of pages: 4
State: Published - 1997
Externally published: Yes
Event: 5th European Conference on Speech Communication and Technology, EUROSPEECH 1997 - Rhodes, Greece
Duration: Sep 22 1997 to Sep 25 1997

Conference

Conference: 5th European Conference on Speech Communication and Technology, EUROSPEECH 1997
Country/Territory: Greece
City: Rhodes
Period: 9/22/97 to 9/25/97

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication
