MODELING DYNAMIC PROSODIC VARIATION FOR SPEAKER VERIFICATION

Kemal Sönmez; Elizabeth Shriberg; Larry Heck; Mitchel Weintraub

Research output: Contribution to conference › Paper › peer-review

Abstract

Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker's distribution of f₀ values, such statistics fail to capture information about local dynamics in intonation that characterize an individual's speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker's f₀ movements by fitting a piecewise linear model to the f₀ track to obtain a stylized f₀ contour. Parameters of the model are then used as statistical features for speaker verification. We report results on the 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.
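The stylization step described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how breakpoints are chosen, so this sketch assumes a greedy top-down split that fits independent least-squares lines (not forced to meet at breakpoints) to a single voiced region, stopping once the per-frame mean squared error falls below a hypothetical tolerance `max_mse`. The names `fit_segment`, `stylize`, and `slope_features`, the 10 ms frame step, and the feature set (slope and segment-duration statistics) are all illustrative assumptions. Unvoiced frames are assumed to have been removed beforehand, with each voiced region stylized separately.

```python
import numpy as np


def fit_segment(t, y):
    """Least-squares line y ~ slope * t + intercept; returns (slope, intercept, sse)."""
    A = np.column_stack([t, np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(coef[0]), float(coef[1]), float(resid @ resid)


def stylize(t, f0, max_mse=1.0, min_len=5):
    """Greedy top-down piecewise linear fit of one voiced f0 region (assumed method).

    Returns a list of (start, end, slope, intercept) with end exclusive.
    Lines are fitted independently, so pieces need not meet at breakpoints.
    """
    n = len(t)
    slope, intercept, sse = fit_segment(t, f0)
    # Stop if one line is good enough or the region is too short to split.
    if sse / n <= max_mse or n < 2 * min_len:
        return [(0, n, slope, intercept)]
    # Pick the split point minimizing the total SSE of the two sub-fits.
    best_k, best_cost = min_len, np.inf
    for k in range(min_len, n - min_len + 1):
        cost = fit_segment(t[:k], f0[:k])[2] + fit_segment(t[k:], f0[k:])[2]
        if cost < best_cost:
            best_k, best_cost = k, cost
    left = stylize(t[:best_k], f0[:best_k], max_mse, min_len)
    right = stylize(t[best_k:], f0[best_k:], max_mse, min_len)
    # Shift right-hand indices back into the coordinates of the full region.
    return left + [(s + best_k, e + best_k, a, b) for s, e, a, b in right]


def slope_features(segments, frame_s=0.01):
    """Summary statistics of the stylized contour (hypothetical feature set)."""
    slopes = np.array([a for _, _, a, _ in segments])
    durs = np.array([(e - s) * frame_s for s, e, _, _ in segments])
    return {
        "n_segments": len(segments),
        "mean_slope": float(slopes.mean()),
        "std_slope": float(slopes.std()),
        "mean_dur": float(durs.mean()),
    }


# Toy contour: 20 frames rising at 200 Hz/s, then 20 frames falling at 100 Hz/s.
idx = np.arange(40)
t = idx * 0.01
f0 = np.where(idx < 20, 100 + 200 * t, 160 - 100 * t)
segs = stylize(t, f0)  # two pieces, slopes near +200 and -100 Hz/s
print(slope_features(segs))
```

Pooling such per-segment statistics over an utterance yields a fixed-length description of intonation dynamics that could then be modeled per speaker; how the paper itself models the stylization parameters is not stated in the abstract.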
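The abstract measures performance with a task-specific Bayes risk. In NIST speaker verification evaluations this is usually a detection cost function, C_det = C_miss × P_miss × P_target + C_fa × P_fa × (1 − P_target). The sketch below assumes the parameter values commonly associated with NIST evaluations of that period (C_miss = 10, C_fa = 1, P_target = 0.01); these defaults should be checked against the 1998 evaluation plan, and the helper names `detection_cost` and `min_detection_cost` are hypothetical.

```python
import numpy as np


def detection_cost(target_scores, nontarget_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Bayes-risk style detection cost at one decision threshold.

    Trials scoring >= threshold are accepted as the claimed speaker.
    Default cost parameters are assumed, not taken from the paper.
    """
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    p_miss = np.mean(tgt < threshold)  # true speakers rejected
    p_fa = np.mean(non >= threshold)   # impostors accepted
    return float(c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target))


def min_detection_cost(target_scores, nontarget_scores, **cost_params):
    """Minimum cost over every threshold induced by the observed scores."""
    candidates = np.concatenate([np.asarray(target_scores, dtype=float),
                                 np.asarray(nontarget_scores, dtype=float),
                                 [np.inf]])  # inf = reject every trial
    return min(detection_cost(target_scores, nontarget_scores, th, **cost_params)
               for th in candidates)


# Toy scores, for illustration only.
target = [2.0, 3.0, 0.5, 4.0]
nontarget = [-1.0, 0.0, 1.5, -2.0]
print(detection_cost(target, nontarget, threshold=1.0))
print(min_detection_cost(target, nontarget))
```

A relative improvement such as the one reported would be computed as (C_base − C_new) / C_base on the same trial list; the toy scores here carry no information about the paper's actual results.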

Original language	English (US)
State	Published - 1998
Externally published	Yes
Event	5th International Conference on Spoken Language Processing, ICSLP 1998 - Sydney, Australia; Duration: Nov 30 1998 → Dec 4 1998

Conference

Conference	5th International Conference on Spoken Language Processing, ICSLP 1998
Country/Territory	Australia
City	Sydney
Period	11/30/98 → 12/4/98

ASJC Scopus subject areas

Language and Linguistics
Linguistics and Language

Cite this

@conference{3f66f712602a49098075ceb6c0aab832,

title = "MODELING DYNAMIC PROSODIC VARIATION FOR SPEAKER VERIFICATION",

abstract = "Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker's distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual's speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker's f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour. Parameters of the model are then used as statistical features for speaker verification. We report results on the 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.",

author = "Kemal S{\"o}nmez and Elizabeth Shriberg and Larry Heck and Mitchel Weintraub",

note = "Publisher Copyright: {\textcopyright} 1998. 5th International Conference on Spoken Language Processing, ICSLP 1998. All rights reserved.; 5th International Conference on Spoken Language Processing, ICSLP 1998 ; Conference date: 30-11-1998 Through 04-12-1998",

year = "1998",

language = "English (US)",

}

TY  - CONF
T1  - MODELING DYNAMIC PROSODIC VARIATION FOR SPEAKER VERIFICATION
AU  - Sönmez, Kemal
AU  - Shriberg, Elizabeth
AU  - Heck, Larry
AU  - Weintraub, Mitchel
PY  - 1998
Y1  - 1998
N2  - Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker's distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual's speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker's f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour. Parameters of the model are then used as statistical features for speaker verification. We report results on the 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.
AB  - Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker's distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual's speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker's f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour. Parameters of the model are then used as statistical features for speaker verification. We report results on the 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.
UR  - http://www.scopus.com/inward/record.url?scp=85128436986&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85128436986&partnerID=8YFLogxK
M3  - Paper
AN  - SCOPUS:85128436986
T2  - 5th International Conference on Spoken Language Processing, ICSLP 1998
Y2  - 30 November 1998 through 4 December 1998
ER  -
