SRI's 2004 NIST speaker recognition evaluation system

Sachin S. Kajarekar; Luciana Ferrer; Elizabeth Shriberg; Kemal Sonmez; Andreas Stolcke; Anand Venkataraman; Jing Zheng

doi:10.1109/ICASSP.2005.1415078

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

41 Scopus citations

Abstract

This paper describes our recent efforts in exploring longer-range features and statistical techniques for modeling them in speaker recognition. In particular, we describe a system that uses discriminant features derived from cepstral coefficients, and systems that use discriminant models of word n-grams and syllable-based NERF n-grams. These systems, together with a cepstral baseline system, are evaluated on the 2004 NIST speaker recognition evaluation dataset. The effect of the development set is measured using two different datasets, one drawn from the Switchboard databases and another from the FISHER database. Results show that the difference between the development and evaluation sets affects system performance only when more training data is available. Results also show that combining the systems that use longer-range features with the baseline yields improvements over the baseline system of about 31% with 1-side training and about 61% with 8-side training.
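The abstract reports its gains as percentages without naming the underlying metric. The sketch below assumes they are relative reductions of a lower-is-better error measure, such as the equal error rate (EER) or the minimum detection cost function (DCF) used in NIST evaluations. The function names are hypothetical, and the baseline value of 1.0 is a normalized placeholder, not a figure from the paper.

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to a baseline.

    Assumption: "improvement" means (baseline - system) / baseline,
    so 0.31 corresponds to a 31% improvement.
    """
    if baseline <= 0:
        raise ValueError("baseline metric must be positive")
    return (baseline - system) / baseline


def metric_after_improvement(baseline: float, improvement: float) -> float:
    """Inverse mapping: the metric value implied by a given relative improvement."""
    return baseline * (1.0 - improvement)


if __name__ == "__main__":
    # Normalized, illustrative baseline; not a result reported in the paper.
    baseline = 1.0
    for condition, improvement in [("1-side", 0.31), ("8-side", 0.61)]:
        combined = metric_after_improvement(baseline, improvement)
        print(condition, round(combined, 2))
```

Under this reading, a 61% improvement with 8-side training means that the combined system's error measure is roughly 0.39 times that of the baseline, while the 31% improvement with 1-side training leaves it at roughly 0.69 times the baseline.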

Original language	English (US)
Title of host publication	2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	173-176
Number of pages	4
ISBN (Print)	0780388747, 9780780388741
DOIs	https://doi.org/10.1109/ICASSP.2005.1415078
State	Published - 2005
Externally published	Yes
Event	2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Philadelphia, PA, United States Duration: Mar 18 2005 → Mar 23 2005

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	I
ISSN (Print)	1520-6149

Other

Other	2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Country/Territory	United States
City	Philadelphia, PA
Period	3/18/05 → 3/23/05

ASJC Scopus subject areas

Software
Signal Processing
Electrical and Electronic Engineering

Cite this

Kajarekar, S. S., Ferrer, L., Shriberg, E., Sonmez, K., Stolcke, A., Venkataraman, A., & Zheng, J. (2005). SRI's 2004 NIST speaker recognition evaluation system. In 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing (pp. 173-176). Article 1415078 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. I). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2005.1415078

SRI's 2004 NIST speaker recognition evaluation system. / Kajarekar, Sachin S.; Ferrer, Luciana; Shriberg, Elizabeth et al.
2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing. Institute of Electrical and Electronics Engineers Inc., 2005. p. 173-176 1415078 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. I).

Kajarekar, SS, Ferrer, L, Shriberg, E, Sonmez, K, Stolcke, A, Venkataraman, A & Zheng, J 2005, SRI's 2004 NIST speaker recognition evaluation system. in 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing, 1415078, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. I, Institute of Electrical and Electronics Engineers Inc., pp. 173-176, 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05, Philadelphia, PA, United States, 3/18/05. https://doi.org/10.1109/ICASSP.2005.1415078

Kajarekar SS, Ferrer L, Shriberg E, Sonmez K, Stolcke A, Venkataraman A et al. SRI's 2004 NIST speaker recognition evaluation system. In 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing. Institute of Electrical and Electronics Engineers Inc. 2005. p. 173-176. 1415078. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2005.1415078

Kajarekar, Sachin S. ; Ferrer, Luciana ; Shriberg, Elizabeth et al. / SRI's 2004 NIST speaker recognition evaluation system. 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing. Institute of Electrical and Electronics Engineers Inc., 2005. pp. 173-176 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{bb1391951ca2461f821fff8c46761dfb,

title = "SRI's 2004 NIST speaker recognition evaluation system",

abstract = "This paper describes our recent efforts in exploring longer-range features and their statistical modeling techniques for speaker recognition. In particular, we describe a system that uses discriminant features from cepstral coefficients, and systems that use discriminant models from word n-grams and syllable-based NERF n-grams. These systems together with a cepstral baseline system are evaluated on the 2004 NIST speaker recognition evaluation dataset. The effect of the development set is measured using two different datasets, one from Switchboard databases and another from the FISHER database. Results show that the difference between the development and evaluation sets affects the performance of the systems only when more training data is available. Results also show that systems using longer-range features combined with the baseline result in about a 31% improvement with 1-side training over the baseline system and about a 61% improvement with 8-side training over the baseline system.",

author = "Kajarekar, {Sachin S.} and Luciana Ferrer and Elizabeth Shriberg and Kemal Sonmez and Andreas Stolcke and Anand Venkataraman and Jing Zheng",

year = "2005",

doi = "10.1109/ICASSP.2005.1415078",

language = "English (US)",

isbn = "0780388747",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "173--176",

booktitle = "2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing",

note = "2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 ; Conference date: 18-03-2005 Through 23-03-2005",

}

TY - GEN

T1 - SRI's 2004 NIST speaker recognition evaluation system

AU - Kajarekar, Sachin S.

AU - Ferrer, Luciana

AU - Shriberg, Elizabeth

AU - Sonmez, Kemal

AU - Stolcke, Andreas

AU - Venkataraman, Anand

AU - Zheng, Jing

PY - 2005

Y1 - 2005

N2 - This paper describes our recent efforts in exploring longer-range features and their statistical modeling techniques for speaker recognition. In particular, we describe a system that uses discriminant features from cepstral coefficients, and systems that use discriminant models from word n-grams and syllable-based NERF n-grams. These systems together with a cepstral baseline system are evaluated on the 2004 NIST speaker recognition evaluation dataset. The effect of the development set is measured using two different datasets, one from Switchboard databases and another from the FISHER database. Results show that the difference between the development and evaluation sets affects the performance of the systems only when more training data is available. Results also show that systems using longer-range features combined with the baseline result in about a 31% improvement with 1-side training over the baseline system and about a 61% improvement with 8-side training over the baseline system.

AB - This paper describes our recent efforts in exploring longer-range features and their statistical modeling techniques for speaker recognition. In particular, we describe a system that uses discriminant features from cepstral coefficients, and systems that use discriminant models from word n-grams and syllable-based NERF n-grams. These systems together with a cepstral baseline system are evaluated on the 2004 NIST speaker recognition evaluation dataset. The effect of the development set is measured using two different datasets, one from Switchboard databases and another from the FISHER database. Results show that the difference between the development and evaluation sets affects the performance of the systems only when more training data is available. Results also show that systems using longer-range features combined with the baseline result in about a 31% improvement with 1-side training over the baseline system and about a 61% improvement with 8-side training over the baseline system.

UR - http://www.scopus.com/inward/record.url?scp=21844434603&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=21844434603&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2005.1415078

DO - 10.1109/ICASSP.2005.1415078

M3 - Conference contribution

AN - SCOPUS:21844434603

SN - 0780388747

SN - 9780780388741

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 173

EP - 176

BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05

Y2 - 18 March 2005 through 23 March 2005

ER -
