TY - GEN
T1 - The contribution of cepstral and stylistic features to SRI'S 2005 NIST speaker recognition evaluation system
AU - Ferrer, Luciana
AU - Shriberg, Elizabeth
AU - Kajarekar, Sachin S.
AU - Stolcke, Andreas
AU - Sönmez, Kemal
AU - Venkataraman, Anand
AU - Bratt, Harry
PY - 2006
Y1 - 2006
N2 - Recent work in speaker recognition has demonstrated the advantage of modeling stylistic features in addition to traditional cepstral features, but to date there has been little study of the relative contributions of these different feature types to a state-of-the-art system. In this paper we provide such an analysis, based on SRI's submission to the NIST 2005 Speaker Recognition Evaluation. The system consists of 7 subsystems (3 cepstral, 4 stylistic). By running independent N-way subsystem combinations for increasing values of N, we find that (1) a monotonic pattern in the choice of the best N systems allows for the inference of subsystem importance; (2) the ordering of subsystems alternates between cepstral and stylistic; (3) syllable-based prosodic features are the strongest stylistic features, and (4) overall subsystem ordering depends crucially on the amount of training data (1 versus 8 conversation sides). Improvements over the baseline cepstral system, when all systems are combined, range from 47% to 67%, with larger improvements for the 8-side condition. These results provide direct evidence of the complementary contributions of cepstral and stylistic features to speaker discrimination.
AB - Recent work in speaker recognition has demonstrated the advantage of modeling stylistic features in addition to traditional cepstral features, but to date there has been little study of the relative contributions of these different feature types to a state-of-the-art system. In this paper we provide such an analysis, based on SRI's submission to the NIST 2005 Speaker Recognition Evaluation. The system consists of 7 subsystems (3 cepstral, 4 stylistic). By running independent N-way subsystem combinations for increasing values of N, we find that (1) a monotonic pattern in the choice of the best N systems allows for the inference of subsystem importance; (2) the ordering of subsystems alternates between cepstral and stylistic; (3) syllable-based prosodic features are the strongest stylistic features, and (4) overall subsystem ordering depends crucially on the amount of training data (1 versus 8 conversation sides). Improvements over the baseline cepstral system, when all systems are combined, range from 47% to 67%, with larger improvements for the 8-side condition. These results provide direct evidence of the complementary contributions of cepstral and stylistic features to speaker discrimination.
UR - http://www.scopus.com/inward/record.url?scp=33947667903&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947667903&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33947667903
SN - 142440469X
SN - 9781424404698
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - I101-I104
BT - 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
T2 - 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
Y2 - 14 May 2006 through 19 May 2006
ER -