The contribution of cepstral and stylistic features to SRI's 2005 NIST speaker recognition evaluation system

Luciana Ferrer, Elizabeth Shriberg, Sachin S. Kajarekar, Andreas Stolcke, Mustafa (Kemal) Sonmez, Anand Venkataraman, Harry Bratt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

Recent work in speaker recognition has demonstrated the advantage of modeling stylistic features in addition to traditional cepstral features, but to date there has been little study of the relative contributions of these different feature types to a state-of-the-art system. In this paper we provide such an analysis, based on SRI's submission to the NIST 2005 Speaker Recognition Evaluation. The system consists of 7 subsystems (3 cepstral, 4 stylistic). By running independent N-way subsystem combinations for increasing values of N, we find that (1) a monotonic pattern in the choice of the best N systems allows for the inference of subsystem importance; (2) the ordering of subsystems alternates between cepstral and stylistic; (3) syllable-based prosodic features are the strongest stylistic features; and (4) overall subsystem ordering depends crucially on the amount of training data (1 versus 8 conversation sides). Improvements over the baseline cepstral system, when all systems are combined, range from 47% to 67%, with larger improvements for the 8-side condition. These results provide direct evidence of the complementary contributions of cepstral and stylistic features to speaker discrimination.
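The N-way analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the subsystem names, the `TOY_GAINS` values, and the `toy_cost` function are hypothetical placeholders standing in for a real score combiner and error metric. The sketch only shows the selection logic: pick the best subset for each N, test whether the best subsets are nested (the "monotonic pattern"), and, if so, read off an importance ordering. The last function computes a relative improvement over a baseline, the kind of figure the abstract reports as a percentage.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence

Subset = FrozenSet[str]


def best_subsets(systems: Sequence[str],
                 cost: Callable[[Subset], float]) -> Dict[int, Subset]:
    """For each N, exhaustively search all N-way combinations and keep the cheapest."""
    best: Dict[int, Subset] = {}
    for n in range(1, len(systems) + 1):
        candidates = (frozenset(c) for c in combinations(systems, n))
        best[n] = min(candidates, key=cost)
    return best


def is_monotonic(best: Dict[int, Subset]) -> bool:
    """True when every best N-subset contains the best (N-1)-subset."""
    return all(best[n - 1] <= best[n] for n in range(2, len(best) + 1))


def importance_order(best: Dict[int, Subset]) -> List[str]:
    """With nested best subsets, the system added at each N gives a ranking."""
    if not is_monotonic(best):
        raise ValueError("best subsets are not nested; no ordering can be inferred")
    order: List[str] = []
    previous: Subset = frozenset()
    for n in sorted(best):
        (added,) = best[n] - previous  # exactly one new system per step
        order.append(added)
        previous = best[n]
    return order


def relative_improvement(baseline: float, combined: float) -> float:
    """Relative reduction in cost, as a percentage of the baseline cost."""
    return 100.0 * (baseline - combined) / baseline


# Hypothetical toy data: names and gains are invented for illustration only.
TOY_GAINS = {"cep_a": 5.0, "sty_a": 4.0, "cep_b": 3.0, "sty_b": 2.0}


def toy_cost(subset: Subset) -> float:
    """Placeholder for 'combine these subsystems and measure the error'."""
    return 10.0 / (1.0 + sum(TOY_GAINS[s] for s in subset))


if __name__ == "__main__":
    best = best_subsets(list(TOY_GAINS), toy_cost)
    print(is_monotonic(best))
    print(importance_order(best))
    print(relative_improvement(10.0, 5.0))
```

In a real experiment, `toy_cost` would be replaced by training a combiner on the chosen subsystems' scores and evaluating an error metric on held-out trials. The exhaustive search stays cheap for a small number of subsystems (7 systems give 127 non-empty subsets).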

Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
State: Published - 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006 - Toulouse, France
Duration: May 14, 2006 to May 19, 2006

Other

Other: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Country: France
City: Toulouse
Period: 5/14/06 to 5/19/06

Fingerprint

conversation
syllables
evaluation
inference
discrimination
education

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Ferrer, L., Shriberg, E., Kajarekar, S. S., Stolcke, A., Sonmez, M. K., Venkataraman, A., & Bratt, H. (2006). The contribution of cepstral and stylistic features to SRI's 2005 NIST speaker recognition evaluation system. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 1). [1659967]

@inproceedings{efb3ab280bfc44b19788462f015b231b,
title = "The contribution of cepstral and stylistic features to SRI's 2005 NIST speaker recognition evaluation system",
author = "Luciana Ferrer and Elizabeth Shriberg and Kajarekar, {Sachin S.} and Andreas Stolcke and Sonmez, {Mustafa (Kemal)} and Anand Venkataraman and Harry Bratt",
year = "2006",
language = "English (US)",
isbn = "142440469X",
volume = "1",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY - GEN

T1 - The contribution of cepstral and stylistic features to SRI's 2005 NIST speaker recognition evaluation system

AU - Ferrer, Luciana

AU - Shriberg, Elizabeth

AU - Kajarekar, Sachin S.

AU - Stolcke, Andreas

AU - Sonmez, Mustafa (Kemal)

AU - Venkataraman, Anand

AU - Bratt, Harry

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=33947667903&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33947667903&partnerID=8YFLogxK

M3 - Conference contribution

SN - 142440469X

SN - 9781424404698

VL - 1

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ER -