Speaker recognition using prosodic and lexical features

Sachin Kajarekar, Luciana Ferrer, Anand Venkataraman, Kemal Sonmez, Elizabeth Shriberg, Andreas Stolcke, Harry Bratt, Ramana Rao Gadde

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

14 Scopus citations

Abstract

Conventional speaker recognition systems identify speakers by using spectral information from very short slices of speech. Such systems perform well (especially in quiet conditions), but fail to capture idiosyncratic longer-term patterns in a speaker's habitual speaking style, including duration and pausing patterns, intonation contours, and the use of particular phrases. We investigate the contribution of modeling such prosodic and lexical patterns to performance in the NIST 2003 Speaker Recognition Evaluation extended data task. We report results for (1) systems based on individual feature types alone, (2) systems in combination with a state-of-the-art frame-based baseline system, and (3) an all-system combination. Our results show that certain longer-term stylistic features provide powerful complementary information both to frame-level cepstral features and to each other. Stylistic features thus significantly improve speaker recognition performance over conventional systems, and offer promise for a variety of intelligence and security applications.

Original language: English (US)
Title of host publication: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 19-24
Number of pages: 6
ISBN (Electronic): 0780379802, 9780780379800
DOIs: https://doi.org/10.1109/ASRU.2003.1318397
State: Published - Jan 1 2003
Externally published: Yes
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 - St. Thomas, United States
Duration: Nov 30 2003 to Dec 4 2003

Publication series

Name: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Kajarekar, S., Ferrer, L., Venkataraman, A., Sonmez, K., Shriberg, E., Stolcke, A., Bratt, H., & Gadde, R. R. (2003). Speaker recognition using prosodic and lexical features. In 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 (pp. 19-24). [1318397] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2003.1318397