SRI's 2004 NIST speaker recognition evaluation system

Sachin S. Kajarekar, Luciana Ferrer, Elizabeth Shriberg, Mustafa (Kemal) Sonmez, Andreas Stolcke, Anand Venkataraman, Jing Zheng

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

33 Citations (Scopus)

Abstract

This paper describes our recent efforts in exploring longer-range features and techniques for modeling them statistically for speaker recognition. In particular, we describe a system that uses discriminant features derived from cepstral coefficients, and systems that use discriminant models of word n-grams and syllable-based NERF n-grams. These systems, together with a cepstral baseline system, are evaluated on the 2004 NIST speaker recognition evaluation dataset. The effect of the development set is measured using two different datasets, one drawn from the Switchboard databases and the other from the Fisher database. Results show that the difference between the development and evaluation sets affects system performance only when more training data is available. Results also show that combining the longer-range systems with the baseline yields an improvement over the baseline system of about 31% with 1-side training and about 61% with 8-side training.

Original language: English (US)
Title of host publication: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: I
ISBN (Print): 0780388747, 9780780388741
DOI: https://doi.org/10.1109/ICASSP.2005.1415078
State: Published, 2005
Externally published: Yes
Event: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05, Philadelphia, PA, United States
Duration: Mar 18 2005 to Mar 23 2005



ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Kajarekar, S. S., Ferrer, L., Shriberg, E., Sonmez, M. K., Stolcke, A., Venkataraman, A., & Zheng, J. (2005). SRI's 2004 NIST speaker recognition evaluation system. In 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing (Vol. I). [1415078] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2005.1415078

@inproceedings{bb1391951ca2461f821fff8c46761dfb,
title = "{SRI's} 2004 {NIST} speaker recognition evaluation system",
abstract = "This paper describes our recent efforts in exploring longer-range features and their statistical modeling techniques for speaker recognition. In particular, we describe a system that uses discriminant features from cepstral coefficients, and systems that use discriminant models from word n-grams and syllable-based NERF n-grams. These systems together with a cepstral baseline system are evaluated on the 2004 NIST speaker recognition evaluation dataset. The effect of the development set is measured using two different datasets, one from Switchboard databases and another from the FISHER database. Results show that the difference between the development and evaluation sets affects the performance of the systems only when more training data is available. Results also show that systems using longer-range features combined with the baseline result in about a 31{\%} improvement with 1-side training over the baseline system and about a 61{\%} improvement with 8-side training over the baseline system.",
author = "Kajarekar, {Sachin S.} and Luciana Ferrer and Elizabeth Shriberg and Sonmez, {Mustafa (Kemal)} and Andreas Stolcke and Anand Venkataraman and Jing Zheng",
year = "2005",
doi = "10.1109/ICASSP.2005.1415078",
language = "English (US)",
isbn = "0780388747",
volume = "I",
booktitle = "2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
