TY - GEN
T1 - SRI'S 2004 NIST speaker recognition evaluation system
AU - Kajarekar, Sachin S.
AU - Ferrer, Luciana
AU - Shriberg, Elizabeth
AU - Sonmez, Kemal
AU - Stolcke, Andreas
AU - Venkataraman, Anand
AU - Zheng, Jing
PY - 2005
Y1 - 2005
N2 - This paper describes our recent efforts in exploring longer-range features and their statistical modeling techniques for speaker recognition. In particular, we describe a system that uses discriminant features from cepstral coefficients, and systems that use discriminant models from word n-grams and syllable-based NERF n-grams. These systems together with a cepstral baseline system are evaluated on the 2004 NIST speaker recognition evaluation dataset. The effect of the development set is measured using two different datasets, one from Switchboard databases and another from the FISHER database. Results show that the difference between the development and evaluation sets affects the performance of the systems only when more training data is available. Results also show that systems using longer-range features combined with the baseline result in about a 31% improvement with 1-side training over the baseline system and about a 61% improvement with 8-side training over the baseline system.
AB - This paper describes our recent efforts in exploring longer-range features and their statistical modeling techniques for speaker recognition. In particular, we describe a system that uses discriminant features from cepstral coefficients, and systems that use discriminant models from word n-grams and syllable-based NERF n-grams. These systems together with a cepstral baseline system are evaluated on the 2004 NIST speaker recognition evaluation dataset. The effect of the development set is measured using two different datasets, one from Switchboard databases and another from the FISHER database. Results show that the difference between the development and evaluation sets affects the performance of the systems only when more training data is available. Results also show that systems using longer-range features combined with the baseline result in about a 31% improvement with 1-side training over the baseline system and about a 61% improvement with 8-side training over the baseline system.
UR - http://www.scopus.com/inward/record.url?scp=21844434603&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=21844434603&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2005.1415078
DO - 10.1109/ICASSP.2005.1415078
M3 - Conference contribution
AN - SCOPUS:21844434603
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 173
EP - 176
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Y2 - 18 March 2005 through 23 March 2005
ER -