TY - GEN
T1 - Speaker recognition using prosodic and lexical features
AU - Kajarekar, Sachin
AU - Ferrer, Luciana
AU - Venkataraman, Anand
AU - Sonmez, Kemal
AU - Shriberg, Elizabeth
AU - Stolcke, Andreas
AU - Bratt, Harry
AU - Gadde, Ramana Rao
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
AB - Conventional speaker recognition systems identify speakers by using spectral information from very short slices of speech. Such systems perform well (especially in quiet conditions), but fail to capture idiosyncratic longer-term patterns in a speaker's habitual speaking style, including duration and pausing patterns, intonation contours, and the use of particular phrases. We investigate the contribution of modeling such prosodic and lexical patterns to performance in the NIST 2003 Speaker Recognition Evaluation extended data task. We report results for (1) systems based on individual feature types alone, (2) systems in combination with a state-of-the-art frame-based baseline system, and (3) an all-system combination. Our results show that certain longer-term stylistic features provide powerful complementary information both to frame-level cepstral features and to each other. Stylistic features thus significantly improve speaker recognition performance over conventional systems, and offer promise for a variety of intelligence and security applications.
UR - http://www.scopus.com/inward/record.url?scp=78651105404&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651105404&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2003.1318397
DO - 10.1109/ASRU.2003.1318397
M3 - Conference contribution
AN - SCOPUS:78651105404
T3 - 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
SP - 19
EP - 24
BT - 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Y2 - 30 November 2003 through 4 December 2003
ER -