Multiple speaker tracking and detection: Handset normalization and duration scoring

Kemal Sönmez, Larry Heck, Mitchel Weintraub

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

We describe SRI's speaker tracking and detection system in the NIST 1998 Speaker Detection and Tracking Development Evaluation. The system is designed for tracking Switchboard conversations and uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment waveforms for speaker detection, which is carried out by averaging frame scores of the Viterbi path and normalizing for handset variation via a novel parameter interpolation extension of HNORM for use with waveform segments of arbitrary lengths. A short-duration penalty to augment the acoustic scores is also introduced via a nonlinear combination function. Results on the NIST 1998 Speaker Detection and Tracking Development Evaluation dataset are reported.
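
The detection step in the abstract (averaging frame scores over a segment, then normalizing for handset) can be illustrated with a minimal sketch. The abstract does not give the form of the parameter interpolation, so this example assumes that impostor score statistics are estimated per handset type at a few reference durations and linearly interpolated to the segment's duration; that choice, the handset labels, the function names, and every number in the table are hypothetical and are not taken from the paper. Only the basic HNORM form, subtracting a handset-dependent impostor mean and dividing by the corresponding standard deviation, is standard.

```python
import numpy as np

# Hypothetical reference durations (seconds) at which impostor score
# statistics are assumed to have been estimated. Values are illustrative only.
DURATIONS = np.array([3.0, 10.0, 30.0])

# Hypothetical per-handset impostor score statistics at each reference duration.
HNORM_PARAMS = {
    "carbon": {
        "mean": np.array([-0.10, -0.20, -0.25]),
        "std": np.array([0.40, 0.25, 0.15]),
    },
    "electret": {
        "mean": np.array([0.05, -0.05, -0.10]),
        "std": np.array([0.35, 0.20, 0.12]),
    },
}


def average_frame_score(frame_scores):
    """Average the per-frame scores of one segment into a single score."""
    return float(np.mean(frame_scores))


def hnorm(score, handset, duration):
    """HNORM-style normalization: (score - mean) / std for the given handset.

    Assumption: mean and std are linearly interpolated in segment duration.
    np.interp holds the end values for durations outside the reference range.
    """
    params = HNORM_PARAMS[handset]
    mean = np.interp(duration, DURATIONS, params["mean"])
    std = np.interp(duration, DURATIONS, params["std"])
    return float((score - mean) / std)


# Example: score a 6.5-second segment attributed to a carbon-button handset.
segment_score = average_frame_score([0.1, 0.3, 0.2])
normalized = hnorm(segment_score, "carbon", 6.5)
```

Interpolating the statistics rather than using a single fixed pair is one plausible way to handle segments of arbitrary length, since the spread of an averaged score generally shrinks as more frames are averaged.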

Original language: English (US)
Pages (from-to): 133-142
Number of pages: 10
Journal: Digital Signal Processing: A Review Journal
Volume: 10
Issue number: 1
State: Published - Jan 2000
Externally published: Yes

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Statistics, Probability and Uncertainty
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
  • Electrical and Electronic Engineering
