MODELING DYNAMIC PROSODIC VARIATION FOR SPEAKER VERIFICATION

Kemal Sönmez, Elizabeth Shriberg, Larry Heck, Mitchel Weintraub

Research output: Contribution to conference › Paper › peer-review

117 Scopus citations

Abstract

Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker's distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual's speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker's f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour. Parameters of the model are then used as statistical features for speaker verification. We report results on the 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.
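The stylization step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fitting procedure, so the code below swaps in a generic top-down splitting scheme: fit a least-squares line to a span, and split at the worst-fitting frame whenever the largest residual exceeds a tolerance. The names `fit_line`, `stylize_f0`, `tol`, and `min_len`, the choice of per-segment duration and slope as features, and the assumption of a single non-empty, continuously voiced stretch of f0 values (one per frame) are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

# (start_frame, end_frame_exclusive, slope_per_frame, intercept)
Segment = Tuple[int, int, float, float]


def fit_line(y: Sequence[float], start: int, end: int) -> Tuple[float, float]:
    """Least-squares line y[t] ~ slope * t + intercept over frames start..end-1."""
    n = end - start
    frames = range(start, end)
    mean_t = sum(frames) / n
    mean_y = sum(y[start:end]) / n
    var_t = sum((t - mean_t) ** 2 for t in frames)
    if var_t == 0.0:  # single-frame span: flat line through the point
        return 0.0, mean_y
    cov = sum((t - mean_t) * (y[t] - mean_y) for t in frames)
    slope = cov / var_t
    return slope, mean_y - slope * mean_t


def stylize_f0(y: Sequence[float], start: int = 0, end: int = None,
               tol: float = 2.0, min_len: int = 3) -> List[Segment]:
    """Recursively split the f0 track until every line fits within tol.

    Assumption (hypothetical): tol is in the same units as y, and
    min_len >= 1 is the shortest segment allowed, in frames.
    """
    if end is None:
        end = len(y)
    slope, intercept = fit_line(y, start, end)
    residuals = [abs(y[t] - (slope * t + intercept)) for t in range(start, end)]
    worst = start + max(range(len(residuals)), key=residuals.__getitem__)
    if residuals[worst - start] <= tol or end - start < 2 * min_len:
        return [(start, end, slope, intercept)]
    # Keep both halves at least min_len frames long.
    split = min(max(worst, start + min_len), end - min_len)
    return (stylize_f0(y, start, split, tol, min_len)
            + stylize_f0(y, split, end, tol, min_len))


# Synthetic rise-fall contour (made-up values in Hz, one per frame).
contour = [100.0 + 5.0 * t for t in range(10)] + \
          [145.0 - 5.0 * (t - 9) for t in range(10, 20)]
segments = stylize_f0(contour)

# Illustrative per-segment features: (duration in frames, slope per frame).
features = [(seg_end - seg_start, slope)
            for seg_start, seg_end, slope, _ in segments]
print(features)
```

Statistics over such per-segment values, pooled across a speaker's utterances, are one plausible way to turn the stylized contour into features; which model parameters the authors actually used is not stated in the abstract.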

Original language: English (US)
State: Published - 1998
Externally published: Yes
Event: 5th International Conference on Spoken Language Processing, ICSLP 1998 - Sydney, Australia
Duration: Nov 30 1998 to Dec 4 1998

Conference

Conference: 5th International Conference on Spoken Language Processing, ICSLP 1998
Country/Territory: Australia
City: Sydney
Period: 11/30/98 to 12/4/98

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
