Multiple speaker tracking and detection: Handset normalization and duration scoring

Mustafa (Kemal) Sonmez, Larry Heck, Mitchel Weintraub

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

We describe SRI's speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment waveforms for speaker detection, which is carried out by averaging frame scores of the Viterbi path and normalizing for handset variation via a novel parameter interpolation extension of HNORM for use with waveform segments of arbitrary lengths. A short-duration penalty to augment the acoustic scores is also introduced via a nonlinear combination function. Results on the NIST 1998 Speaker Detection and Tracking Development Evaluation dataset are reported.
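The detection step described above (averaging frame scores along the Viterbi path, then normalizing for handset with parameters that depend on segment length) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the HNORM parameters are interpolated, so piecewise-linear interpolation over segment duration is assumed here, and the table `HNORM_TABLE`, the handset labels, and every number in it are hypothetical.

```python
from bisect import bisect_left
from statistics import mean

# Hypothetical impostor-score statistics per handset type, estimated at a few
# reference segment durations (in seconds). All values are illustrative only.
HNORM_TABLE = {
    "carbon": {
        "durations": [3.0, 10.0, 30.0],
        "mu": [-0.40, -0.30, -0.25],
        "sigma": [0.50, 0.35, 0.25],
    },
    "electret": {
        "durations": [3.0, 10.0, 30.0],
        "mu": [-0.20, -0.15, -0.10],
        "sigma": [0.45, 0.30, 0.20],
    },
}


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of ys over sorted xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # first index with xs[i] >= x, so 1 <= i <= len(xs) - 1
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def segment_score(frame_scores):
    """Average the per-frame scores collected along the decoded path of one segment."""
    return mean(frame_scores)


def hnorm(raw_score, handset, duration_s):
    """Standard HNORM form (score - mu) / sigma, with mu and sigma for the
    given handset interpolated to the segment duration (assumed scheme)."""
    params = HNORM_TABLE[handset]
    mu = interpolate(duration_s, params["durations"], params["mu"])
    sigma = interpolate(duration_s, params["durations"], params["sigma"])
    return (raw_score - mu) / sigma
```

For example, `hnorm(segment_score(frame_scores), "carbon", 20.0)` would normalize a 20-second segment using statistics halfway between the 10-second and 30-second entries. The short-duration penalty mentioned in the abstract is left out because its nonlinear combination function is not specified there.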

Original language: English (US)
Pages (from-to): 133-142
Number of pages: 10
Journal: Digital Signal Processing: A Review Journal
Volume: 10
Issue number: 1
DOI: 10.1006/dspr.1999.0368
State: Published - Jan 2000
Externally published: Yes

Fingerprint

Hidden Markov models
Interpolation
Acoustics

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Multiple speaker tracking and detection: Handset normalization and duration scoring. / Sonmez, Mustafa (Kemal); Heck, Larry; Weintraub, Mitchel.

In: Digital Signal Processing: A Review Journal, Vol. 10, No. 1, 01.2000, p. 133-142.

Sonmez, Mustafa (Kemal); Heck, Larry; Weintraub, Mitchel. / Multiple speaker tracking and detection: Handset normalization and duration scoring. In: Digital Signal Processing: A Review Journal. 2000; Vol. 10, No. 1. pp. 133-142.
@article{6a314cfcd64e485fbe504f0d53e132e7,
title = "Multiple speaker tracking and detection: Handset normalization and duration scoring",
abstract = "We describe SRI's speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment waveforms for speaker detection, which is carried out by averaging frame scores of the Viterbi path and normalizing for handset variation via a novel parameter interpolation extension of HNORM for use with waveform segments of arbitrary lengths. A short-duration penalty to augment the acoustic scores is also introduced via a nonlinear combination function. Results on the NIST 1998 Speaker Detection and Tracking Development Evaluation dataset are reported.",
author = "Sonmez, {Mustafa (Kemal)} and Larry Heck and Mitchel Weintraub",
year = "2000",
month = "1",
doi = "10.1006/dspr.1999.0368",
language = "English (US)",
volume = "10",
pages = "133--142",
journal = "Digital Signal Processing: A Review Journal",
issn = "1051-2004",
publisher = "Elsevier Inc.",
number = "1",

}

TY - JOUR

T1 - Multiple speaker tracking and detection

T2 - Handset normalization and duration scoring

AU - Sonmez, Mustafa (Kemal)

AU - Heck, Larry

AU - Weintraub, Mitchel

PY - 2000/1

Y1 - 2000/1

N2 - We describe SRI's speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment waveforms for speaker detection, which is carried out by averaging frame scores of the Viterbi path and normalizing for handset variation via a novel parameter interpolation extension of HNORM for use with waveform segments of arbitrary lengths. A short-duration penalty to augment the acoustic scores is also introduced via a nonlinear combination function. Results on the NIST 1998 Speaker Detection and Tracking Development Evaluation dataset are reported.

AB - We describe SRI's speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment waveforms for speaker detection, which is carried out by averaging frame scores of the Viterbi path and normalizing for handset variation via a novel parameter interpolation extension of HNORM for use with waveform segments of arbitrary lengths. A short-duration penalty to augment the acoustic scores is also introduced via a nonlinear combination function. Results on the NIST 1998 Speaker Detection and Tracking Development Evaluation dataset are reported.

UR - http://www.scopus.com/inward/record.url?scp=0343602867&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0343602867&partnerID=8YFLogxK

U2 - 10.1006/dspr.1999.0368

DO - 10.1006/dspr.1999.0368

M3 - Article

AN - SCOPUS:0343602867

VL - 10

SP - 133

EP - 142

JO - Digital Signal Processing: A Review Journal

JF - Digital Signal Processing: A Review Journal

SN - 1051-2004

IS - 1

ER -