A LOGNORMAL TIED MIXTURE MODEL OF PITCH FOR PROSODY-BASED SPEAKER RECOGNITION

M. Kemal Sönmez; Larry Heck; Mitchel Weintraub; Elizabeth Shriberg

Research output: Contribution to conference › Paper › peer-review

Abstract

Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks that are subject to doubling and/or halving. First, we argue from a simple correlation model, and demonstrate empirically with QQ plots, that "clean" pitch follows a lognormal distribution rather than the often-assumed normal distribution. Second, we present a probabilistic model for pitch as estimated by a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances, for a total of four free parameters. We use the resulting pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (a subset of Switchboard) and report results on the most difficult portion of the database: the "one-session" condition with males only for both the claimant and impostor speakers. Pitch statistics provide a 22% reduction in false alarm rate at a 1% miss rate and an 11% reduction in false alarm rate at a 10% miss rate over the cepstrum-only system.
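The tied mixture described in the abstract can be illustrated with a small sketch. It is one plausible reading of the abstract, not the authors' implementation: in the log domain, halved, clean, and doubled pitch are modeled as three Gaussians with means mu - log 2, mu, and mu + log 2 and a single shared variance, which leaves mu, sigma, and two independent mixture weights as the four free parameters. The function name `fit_tied_lognormal_mixture`, the EM updates, and the initial values are assumptions made for illustration.

```python
import numpy as np

LOG2 = np.log(2.0)


def fit_tied_lognormal_mixture(pitch_hz, n_iter=100):
    """Fit a tied 3-component mixture to log-pitch with EM (illustrative sketch).

    Assumed model: components for halved, clean, and doubled pitch have
    log-domain means mu - log 2, mu, mu + log 2 and share one variance.
    Returns (mu, sigma, weights), with weights ordered (half, clean, double).
    """
    x = np.log(np.asarray(pitch_hz, dtype=float))
    offsets = np.array([-LOG2, 0.0, LOG2])

    # Assumed starting point: the median is fairly robust to a minority of
    # halved/doubled frames; most frames are assumed clean.
    mu = np.median(x)
    var = np.var(x)
    weights = np.array([0.1, 0.8, 0.1])

    for _ in range(n_iter):
        # E-step: responsibilities. The Gaussian normalizer is identical for
        # all components (shared variance), so it cancels out.
        diff = x[:, None] - (mu + offsets[None, :])
        log_w = np.log(np.clip(weights, 1e-12, None))
        log_p = log_w[None, :] - 0.5 * diff**2 / var
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: one shared mean and one shared variance for all components.
        weights = resp.mean(axis=0)
        mu = np.sum(resp * (x[:, None] - offsets[None, :])) / x.size
        diff = x[:, None] - (mu + offsets[None, :])
        var = np.sum(resp * diff**2) / x.size

    return mu, np.sqrt(var), weights
```

Given an array of voiced-frame pitch values in Hz, `mu, sigma, weights = fit_tied_lognormal_mixture(f0)` returns the log-domain mean and standard deviation of the clean component together with the estimated halving, clean, and doubling proportions. Tying the means means a halved or doubled frame still contributes to the estimate of mu after its known log 2 offset is removed, rather than biasing it.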

Original language	English (US)
Pages	1391-1394
Number of pages	4
State	Published - 1997
Externally published	Yes
Event	5th European Conference on Speech Communication and Technology, EUROSPEECH 1997 - Rhodes, Greece; Duration: Sep 22 1997 → Sep 25 1997

Conference

Conference	5th European Conference on Speech Communication and Technology, EUROSPEECH 1997
Country/Territory	Greece
City	Rhodes
Period	9/22/97 → 9/25/97

ASJC Scopus subject areas

Computer Science Applications
Software
Linguistics and Language
Communication

Cite this

@conference{0bc36834483044d7993a69d58d568aea,

title = "A LOGNORMAL TIED MIXTURE MODEL OF PITCH FOR PROSODY-BASED SPEAKER RECOGNITION",

abstract = "Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks that are subject to doubling and/or halving. First, we argue from a simple correlation model, and demonstrate empirically with QQ plots, that {"}clean{"} pitch follows a lognormal distribution rather than the often-assumed normal distribution. Second, we present a probabilistic model for pitch as estimated by a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances, for a total of four free parameters. We use the resulting pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (a subset of Switchboard) and report results on the most difficult portion of the database: the {"}one-session{"} condition with males only for both the claimant and impostor speakers. Pitch statistics provide a 22% reduction in false alarm rate at a 1% miss rate and an 11% reduction in false alarm rate at a 10% miss rate over the cepstrum-only system.",

author = "S{\"o}nmez, {M. Kemal} and Larry Heck and Mitchel Weintraub and Elizabeth Shriberg",

note = "Publisher Copyright: {\textcopyright} 1997 5th European Conference on Speech Communication and Technology, EUROSPEECH 1997. All rights reserved.; 5th European Conference on Speech Communication and Technology, EUROSPEECH 1997 ; Conference date: 22-09-1997 Through 25-09-1997",

year = "1997",

language = "English (US)",

pages = "1391--1394",

}

TY - CONF

T1 - A LOGNORMAL TIED MIXTURE MODEL OF PITCH FOR PROSODY-BASED SPEAKER RECOGNITION

AU - Sönmez, M. Kemal

AU - Heck, Larry

AU - Weintraub, Mitchel

AU - Shriberg, Elizabeth

PY - 1997

Y1 - 1997

AB - Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks that are subject to doubling and/or halving. First, we argue from a simple correlation model, and demonstrate empirically with QQ plots, that "clean" pitch follows a lognormal distribution rather than the often-assumed normal distribution. Second, we present a probabilistic model for pitch as estimated by a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances, for a total of four free parameters. We use the resulting pitch statistics as features in speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (a subset of Switchboard) and report results on the most difficult portion of the database: the "one-session" condition with males only for both the claimant and impostor speakers. Pitch statistics provide a 22% reduction in false alarm rate at a 1% miss rate and an 11% reduction in false alarm rate at a 10% miss rate over the cepstrum-only system.

UR - http://www.scopus.com/inward/record.url?scp=85135139722&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85135139722&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85135139722

SP - 1391

EP - 1394

T2 - 5th European Conference on Speech Communication and Technology, EUROSPEECH 1997

Y2 - 22 September 1997 through 25 September 1997

ER -
