Speech recognition as feature extraction for speaker recognition

A. Stolcke, E. Shriberg, L. Ferrer, S. Kajarekar, Mustafa (Kemal) Sonmez, G. Tur

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

13 Citations (Scopus)

Abstract

Information from speech recognition can be used in various ways in state-of-the-art speaker recognition systems. This includes the obvious use of recognized words to enable the use of text-dependent speaker modeling techniques when the words spoken are not given. Furthermore, it has been shown that the choice of words and phones itself can be a useful indicator of speaker identity. Also, recognizer output enables higher-level features, in particular those related to prosodic properties of speech. Finally, we discuss the use of mere byproducts of word recognition, such as subword unit alignments, pronunciations, and speaker adaptation transforms to derive powerful nonstandard features for speaker modeling. We present specific techniques and results from SRI's NIST speaker recognition evaluation system.

Original language: English (US)
Title of host publication: Proceedings - SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Print): 1424412269, 9781424412266
State: Published - 2007
Externally published: Yes
Event: Workshop on Signal Processing Applications for Public Security and Forensics, SAFE 2007 - Washington, United States
Duration: Apr 11 2007 to Apr 13 2007

Other

Other: Workshop on Signal Processing Applications for Public Security and Forensics, SAFE 2007
Country: United States
City: Washington
Period: 4/11/07 to 4/13/07

Fingerprint

Speech recognition
Byproducts
Feature extraction
evaluation

Keywords

  • High-level features
  • Prosody
  • Speaker adaptation
  • Speaker recognition
  • Speech recognition

ASJC Scopus subject areas

  • Law
  • Computer Vision and Pattern Recognition
  • Signal Processing

Cite this

Stolcke, A., Shriberg, E., Ferrer, L., Kajarekar, S., Sonmez, M. K., & Tur, G. (2007). Speech recognition as feature extraction for speaker recognition. In Proceedings - SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics [4218939] Institute of Electrical and Electronics Engineers Inc.

Speech recognition as feature extraction for speaker recognition. / Stolcke, A.; Shriberg, E.; Ferrer, L.; Kajarekar, S.; Sonmez, Mustafa (Kemal); Tur, G.

Proceedings - SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics. Institute of Electrical and Electronics Engineers Inc., 2007. 4218939.

Stolcke, A, Shriberg, E, Ferrer, L, Kajarekar, S, Sonmez, MK & Tur, G 2007, Speech recognition as feature extraction for speaker recognition. in Proceedings - SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics, 4218939, Institute of Electrical and Electronics Engineers Inc., Workshop on Signal Processing Applications for Public Security and Forensics, SAFE 2007, Washington, United States, 4/11/07.
Stolcke A, Shriberg E, Ferrer L, Kajarekar S, Sonmez MK, Tur G. Speech recognition as feature extraction for speaker recognition. In Proceedings - SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics. Institute of Electrical and Electronics Engineers Inc. 2007. 4218939
Stolcke, A. ; Shriberg, E. ; Ferrer, L. ; Kajarekar, S. ; Sonmez, Mustafa (Kemal) ; Tur, G. / Speech recognition as feature extraction for speaker recognition. Proceedings - SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics. Institute of Electrical and Electronics Engineers Inc., 2007.
@inproceedings{1231b254646941b0a73b71039d87aaf8,
title = "Speech recognition as feature extraction for speaker recognition",
abstract = "Information from speech recognition can be used in various ways in state-of-the-art speaker recognition systems. This includes the obvious use of recognized words to enable the use of text-dependent speaker modeling techniques when the words spoken are not given. Furthermore, it has been shown that the choice of words and phones itself can be a useful indicator of speaker identity. Also, recognizer output enables higher-level features, in particular those related to prosodic properties of speech. Finally, we discuss the use of mere byproducts of word recognition, such as subword unit alignments, pronunciations, and speaker adaptation transforms to derive powerful nonstandard features for speaker modeling. We present specific techniques and results from SRI's NIST speaker recognition evaluation system.",
keywords = "High-level features, Prosody, Speaker adaptation, Speaker recognition, Speech recognition",
author = "A. Stolcke and E. Shriberg and L. Ferrer and S. Kajarekar and Sonmez, {Mustafa (Kemal)} and G. Tur",
year = "2007",
language = "English (US)",
isbn = "1424412269",
booktitle = "Proceedings - SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - GEN
T1  - Speech recognition as feature extraction for speaker recognition
AU  - Stolcke, A.
AU  - Shriberg, E.
AU  - Ferrer, L.
AU  - Kajarekar, S.
AU  - Sonmez, Mustafa (Kemal)
AU  - Tur, G.
PY  - 2007
Y1  - 2007
AB  - Information from speech recognition can be used in various ways in state-of-the-art speaker recognition systems. This includes the obvious use of recognized words to enable the use of text-dependent speaker modeling techniques when the words spoken are not given. Furthermore, it has been shown that the choice of words and phones itself can be a useful indicator of speaker identity. Also, recognizer output enables higher-level features, in particular those related to prosodic properties of speech. Finally, we discuss the use of mere byproducts of word recognition, such as subword unit alignments, pronunciations, and speaker adaptation transforms to derive powerful nonstandard features for speaker modeling. We present specific techniques and results from SRI's NIST speaker recognition evaluation system.
KW  - High-level features
KW  - Prosody
KW  - Speaker adaptation
KW  - Speaker recognition
KW  - Speech recognition
UR  - http://www.scopus.com/inward/record.url?scp=84969260568&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84969260568&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 1424412269
SN  - 9781424412266
BT  - Proceedings - SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -