Speaker recognition using prosodic and lexical features

Sachin Kajarekar; Luciana Ferrer; Anand Venkataraman; Kemal Sonmez; Elizabeth Shriberg; Andreas Stolcke; Harry Bratt; Ramana Rao Gadde

doi:10.1109/ASRU.2003.1318397

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations

Abstract

Conventional speaker recognition systems identify speakers by using spectral information from very short slices of speech. Such systems perform well (especially in quiet conditions), but fail to capture idiosyncratic longer-term patterns in a speaker's habitual speaking style, including duration and pausing patterns, intonation contours, and the use of particular phrases. We investigate the contribution of modeling such prosodic and lexical patterns to performance in the NIST 2003 Speaker Recognition Evaluation extended data task. We report results for (1) systems based on individual feature types alone, (2) systems in combination with a state-of-the-art frame-based baseline system, and (3) an all-system combination. Our results show that certain longer-term stylistic features provide powerful complementary information both to frame-level cepstral features and to each other. Stylistic features thus significantly improve speaker recognition performance over conventional systems and offer promise for a variety of intelligence and security applications.

Original language	English (US)
Title of host publication	2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	19-24
Number of pages	6
ISBN (Electronic)	0780379802, 9780780379800
DOIs	https://doi.org/10.1109/ASRU.2003.1318397
State	Published - 2003
Externally published	Yes
Event	IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 - St. Thomas, United States (Duration: Nov 30 2003 → Dec 4 2003)

Publication series

Name	2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003

Other

Other	IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003
Country/Territory	United States
City	St. Thomas
Period	11/30/03 → 12/4/03

ASJC Scopus subject areas

Signal Processing
Computer Vision and Pattern Recognition
Computer Science Applications

Cite this

Kajarekar, S., Ferrer, L., Venkataraman, A., Sonmez, K., Shriberg, E., Stolcke, A., Bratt, H., & Gadde, R. R. (2003). Speaker recognition using prosodic and lexical features. In 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 (pp. 19-24). Article 1318397 (2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2003.1318397

Speaker recognition using prosodic and lexical features. / Kajarekar, Sachin; Ferrer, Luciana; Venkataraman, Anand et al.
2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003. Institute of Electrical and Electronics Engineers Inc., 2003. p. 19-24 1318397 (2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003).

Kajarekar, S, Ferrer, L, Venkataraman, A, Sonmez, K, Shriberg, E, Stolcke, A, Bratt, H & Gadde, RR 2003, Speaker recognition using prosodic and lexical features. in 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003., 1318397, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003, Institute of Electrical and Electronics Engineers Inc., pp. 19-24, IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003, St. Thomas, United States, 11/30/03. https://doi.org/10.1109/ASRU.2003.1318397

Kajarekar S, Ferrer L, Venkataraman A, Sonmez K, Shriberg E, Stolcke A et al. Speaker recognition using prosodic and lexical features. In 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003. Institute of Electrical and Electronics Engineers Inc. 2003. p. 19-24. 1318397. (2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003). doi: 10.1109/ASRU.2003.1318397

@inproceedings{450a09c2ba244930af62b6d7fd0c4ff2,

title = "Speaker recognition using prosodic and lexical features",

abstract = "Conventional speaker recognition systems identify speakers by using spectral information from very short slices of speech. Such systems perform well (especially in quiet conditions), but fail to capture idiosyncratic longer-term patterns in a speaker's habitual speaking style, including duration and pausing patterns, intonation contours, and the use of particular phrases. We investigate the contribution of modeling such prosodic and lexical patterns, on performance in the NIST 2003 Speaker Recognition Evaluation extended data task. We report results for (1) systems based on individual feature types alone, (2) systems in combination with a state-of-the-art frame-based baseline system, and (3) an all-system combination. Our results show that certain longer-term stylistic features provide powerful complementary information to both frame-level cepstral features and to each other. Stylistic features thus significantly improve speaker recognition performance over conventional systems, and offer promise for a variety of intelligence and security applications.",

author = "Sachin Kajarekar and Luciana Ferrer and Anand Venkataraman and Kemal Sonmez and Elizabeth Shriberg and Andreas Stolcke and Harry Bratt and Gadde, {Ramana Rao}",

note = "Publisher Copyright: {\textcopyright} 2003 IEEE.; IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003 ; Conference date: 30-11-2003 Through 04-12-2003",

year = "2003",

doi = "10.1109/ASRU.2003.1318397",

language = "English (US)",

series = "2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "19--24",

booktitle = "2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003",

}

TY - GEN

T1 - Speaker recognition using prosodic and lexical features

AU - Kajarekar, Sachin

AU - Ferrer, Luciana

AU - Venkataraman, Anand

AU - Sonmez, Kemal

AU - Shriberg, Elizabeth

AU - Stolcke, Andreas

AU - Bratt, Harry

AU - Gadde, Ramana Rao

PY - 2003

Y1 - 2003

AB - Conventional speaker recognition systems identify speakers by using spectral information from very short slices of speech. Such systems perform well (especially in quiet conditions), but fail to capture idiosyncratic longer-term patterns in a speaker's habitual speaking style, including duration and pausing patterns, intonation contours, and the use of particular phrases. We investigate the contribution of modeling such prosodic and lexical patterns, on performance in the NIST 2003 Speaker Recognition Evaluation extended data task. We report results for (1) systems based on individual feature types alone, (2) systems in combination with a state-of-the-art frame-based baseline system, and (3) an all-system combination. Our results show that certain longer-term stylistic features provide powerful complementary information to both frame-level cepstral features and to each other. Stylistic features thus significantly improve speaker recognition performance over conventional systems, and offer promise for a variety of intelligence and security applications.

UR - http://www.scopus.com/inward/record.url?scp=78651105404&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78651105404&partnerID=8YFLogxK

U2 - 10.1109/ASRU.2003.1318397

DO - 10.1109/ASRU.2003.1318397

M3 - Conference contribution

AN - SCOPUS:78651105404

T3 - 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003

SP - 19

EP - 24

BT - 2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003

Y2 - 30 November 2003 through 4 December 2003

ER -
