Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction

Alexander Kain, M. W. Macon

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

90 Citations (Scopus)

Abstract

The purpose of a voice conversion (VC) system is to change the perceived speaker identity of a speech signal. In this paper, we propose a new algorithm based on converting the LPC spectrum and predicting the residual as a function of the target envelope parameters. We conduct listening tests based on speaker discrimination of same/difference pairs to measure the accuracy by which the converted voices match the desired target voices. To establish the level of human performance as a baseline, we first measure the ability of listeners to discriminate between original speech utterances under three conditions: normal, fundamental frequency and duration normalized, and LPC coded. Additionally, the spectral parameter conversion function is tested in isolation by listening to source, target, and converted speakers as LPC coded speech. The results show that the speaker identity of speech whose LPC spectrum has been converted can be recognized as the target speaker with the same level of performance as discriminating between LPC coded speech. However, the level of discrimination of converted utterances produced by the full VC system is significantly below that of speaker discrimination of natural speech.
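The abstract names two signal components, the LPC spectral envelope and the residual, without giving implementation details. As a rough illustration only, the following sketch shows generic LPC analysis (autocorrelation method with the Levinson-Durbin recursion), inverse filtering to obtain a residual, and resynthesis. It is not the authors' conversion or residual-prediction algorithm; the function names `lpc_coefficients`, `inverse_filter`, and `synthesize` are hypothetical, and the choice of the autocorrelation method is an assumption.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate LPC coefficients with the autocorrelation method.

    Returns (a, err), where a[0] == 1 and the inverse filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order, and err is the
    final prediction-error energy of the Levinson-Durbin recursion.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def inverse_filter(frame, a):
    """Apply A(z) to the frame to obtain the LPC residual (zero initial state)."""
    return np.convolve(frame, a)[: len(frame)]


def synthesize(residual, a):
    """Apply the all-pole filter 1/A(z) to a residual (zero initial state)."""
    order = len(a) - 1
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out
```

In a conversion system of the kind the abstract describes, the envelope parameters derived from `a` would be mapped from source to target speaker and a residual would be predicted from the converted envelope; here the residual is simply passed back through the unchanged filter, which reconstructs the input frame.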

Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 813-816
Number of pages: 4
Volume: 2
State: Published - 2001
Event: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing - Salt Lake City, UT, United States
Duration: May 7 2001 to May 11 2001

Fingerprint

envelopes
evaluation
predictions
discrimination
human performance
isolation

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Kain, A., & Macon, M. W. (2001). Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2, pp. 813-816)

@inproceedings{dce7975762614bb48323a1d509872250,
title = "Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction",
author = "Kain, Alexander and Macon, M. W.",
year = "2001",
language = "English (US)",
volume = "2",
pages = "813--816",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY - GEN

T1 - Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction

AU - Kain, Alexander

AU - Macon, M. W.

PY - 2001

Y1 - 2001

AB - The purpose of a voice conversion (VC) system is to change the perceived speaker identity of a speech signal. In this paper, we propose a new algorithm based on converting the LPC spectrum and predicting the residual as a function of the target envelope parameters. We conduct listening tests based on speaker discrimination of same/difference pairs to measure the accuracy by which the converted voices match the desired target voices. To establish the level of human performance as a baseline, we first measure the ability of listeners to discriminate between original speech utterances under three conditions: normal, fundamental frequency and duration normalized, and LPC coded. Additionally, the spectral parameter conversion function is tested in isolation by listening to source, target, and converted speakers as LPC coded speech. The results show that the speaker identity of speech whose LPC spectrum has been converted can be recognized as the target speaker with the same level of performance as discriminating between LPC coded speech. However, the level of discrimination of converted utterances produced by the full VC system is significantly below that of speaker discrimination of natural speech.

UR - http://www.scopus.com/inward/record.url?scp=0034841948&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034841948&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0034841948

VL - 2

SP - 813

EP - 816

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ER -