SPEAKER TRACKING AND DETECTION WITH MULTIPLE SPEAKERS

Kemal Sönmez; Larry Heck; Mitchel Weintraub

Research output: Contribution to conference › Paper › peer-review

Abstract

We describe a speaker tracking and detection system, for Switchboard conversations, that uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment speakers for detection, which is carried out by averaging frame scores of the Viterbi path and HNORM'ing via a novel parameter interpolation extension of HNORM for use with files of arbitrary lengths. Use of duration statistics augmenting the acoustic scores is also introduced via a nonlinear combination function. Results are reported on the NIST 1998 Multispeaker development evaluation dataset.
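The decoding step the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes per-frame state log-likelihoods are already available (for example from the GMM state distributions) and enforces the minimum state duration by expanding each state into a chain of sub-states, which is one common way to do this. The function names, the `switch_penalty` parameter, and the choice to let the final segment be cut short by the end of the file are all hypothetical. HNORM and the duration-statistics combination are not covered.

```python
NEG_INF = float("-inf")


def viterbi_min_duration(loglik, min_dur, switch_penalty=-1.0):
    """Best state path in which a state, once entered, is held >= min_dur frames.

    loglik[t][s] is the log-likelihood of frame t under state s (assumed to be
    precomputed). switch_penalty is a hypothetical log-domain cost added on
    every state change; staying in a state costs nothing. The final segment
    may be shorter than min_dur because the file can end at any point.
    """
    n_frames, n_states = len(loglik), len(loglik[0])
    last = min_dur - 1
    # score[s][k]: best log-score of any path ending in sub-state k of state s.
    score = [[NEG_INF] * min_dur for _ in range(n_states)]
    for s in range(n_states):
        score[s][0] = loglik[0][s]
    backptrs = []
    for t in range(1, n_frames):
        new = [[NEG_INF] * min_dur for _ in range(n_states)]
        ptr = [[None] * min_dur for _ in range(n_states)]
        for s in range(n_states):
            # Allowed predecessors of each sub-state: (state, sub-state, cost).
            # Sub-state k > 0 is reached only from sub-state k - 1 of the same state.
            cands = {k: [(s, k - 1, 0.0)] for k in range(1, min_dur)}
            cands.setdefault(0, [])
            # Sub-state 0 is entered from the last sub-state of another state.
            cands[0] += [(r, last, switch_penalty)
                         for r in range(n_states) if r != s]
            # Only the last sub-state may loop on itself.
            cands[last].append((s, last, 0.0))
            for k, preds in cands.items():
                for r, j, cost in preds:
                    cand = score[r][j] + cost
                    if cand > new[s][k]:
                        new[s][k], ptr[s][k] = cand, (r, j)
                if new[s][k] > NEG_INF:
                    new[s][k] += loglik[t][s]
        score = new
        backptrs.append(ptr)
    # Pick the best final sub-state, then trace the path backwards.
    s, k = max(((s, k) for s in range(n_states) for k in range(min_dur)),
               key=lambda sk: score[sk[0]][sk[1]])
    best = score[s][k]
    path = [s]
    for ptr in reversed(backptrs):
        s, k = ptr[s][k]
        path.append(s)
    path.reverse()
    return path, best


def mean_path_score(frame_scores, path, state):
    """Average the per-frame scores over frames the path assigns to `state`."""
    picked = [x for x, s in zip(frame_scores, path) if s == state]
    return sum(picked) / len(picked) if picked else None
```

In this sketch a single frame that briefly favors the other state flips the path when `min_dur` is 1 but is absorbed into the surrounding segment when `min_dur` is 2, which is the effect a minimum duration constraint is meant to have on speaker segmentation.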

Original language	English (US)
Pages	2219-2222
Number of pages	4
State	Published - 1999
Externally published	Yes
Event	6th European Conference on Speech Communication and Technology, EUROSPEECH 1999 - Budapest, Hungary
Duration	Sep 5 1999 → Sep 9 1999

Conference

Conference	6th European Conference on Speech Communication and Technology, EUROSPEECH 1999
Country/Territory	Hungary
City	Budapest
Period	9/5/99 → 9/9/99

ASJC Scopus subject areas

Computer Science Applications
Software
Linguistics and Language
Communication

Cite this

@conference{1be49747d247472cb198d73a557f46c8,

title = "SPEAKER TRACKING AND DETECTION WITH MULTIPLE SPEAKERS",

abstract = "We describe a speaker tracking and detection system, for Switchboard conversations, that uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment speakers for detection, which is carried out by averaging frame scores of the Viterbi path and HNORM'ing via a novel parameter interpolation extension of HNORM for use with files of arbitrary lengths. Use of duration statistics augmenting the acoustic scores is also introduced via a nonlinear combination function. Results are reported on the NIST 1998 Multispeaker development evaluation dataset.",

author = "Kemal S{\"o}nmez and Larry Heck and Mitchel Weintraub",

note = "Publisher Copyright: {\textcopyright} 1999 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999. All rights reserved.; 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999 ; Conference date: 05-09-1999 Through 09-09-1999",

year = "1999",

language = "English (US)",

pages = "2219--2222",

}

TY - CONF

T1 - SPEAKER TRACKING AND DETECTION WITH MULTIPLE SPEAKERS

AU - Sönmez, Kemal

AU - Heck, Larry

AU - Weintraub, Mitchel

PY - 1999

Y1 - 1999

N2 - We describe a speaker tracking and detection system, for Switchboard conversations, that uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment speakers for detection, which is carried out by averaging frame scores of the Viterbi path and HNORM'ing via a novel parameter interpolation extension of HNORM for use with files of arbitrary lengths. Use of duration statistics augmenting the acoustic scores is also introduced via a nonlinear combination function. Results are reported on the NIST 1998 Multispeaker development evaluation dataset.

UR - http://www.scopus.com/inward/record.url?scp=85031608427&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85031608427&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85031608427

SP - 2219

EP - 2222

T2 - 6th European Conference on Speech Communication and Technology, EUROSPEECH 1999

Y2 - 5 September 1999 through 9 September 1999

ER -
