An overview of voice conversion systems

Seyed Hamidreza Mohammadi, Alexander Kain

Research output: Contribution to journal, Review article

45 Citations (Scopus)

Abstract

Voice transformation (VT) aims to change one or more aspects of a speech signal while preserving its linguistic information. Voice conversion (VC), a subset of VT, specifically aims to change a source speaker's speech so that the generated output is perceived as a sentence uttered by the target speaker. Despite many years of research, VC systems still fall short in accurately mimicking a target speaker both spectrally and prosodically while simultaneously maintaining high speech quality. In this work, we provide an overview of real-world applications, extensively study existing systems proposed in the literature, and discuss the remaining challenges.

Original language: English (US)
Pages (from-to): 65-82
Number of pages: 18
Journal: Speech Communication
Volume: 88
DOI: 10.1016/j.specom.2017.01.008
State: Published (Apr 1, 2017)

Keywords

  • Overview
  • Survey
  • Voice conversion

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Modeling and Simulation
  • Communication
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

An overview of voice conversion systems. / Mohammadi, Seyed Hamidreza; Kain, Alexander.

In: Speech Communication, Vol. 88, 01.04.2017, p. 65-82.

@article{81efc32e544949fea3766b2cf98d418d,
title = "An overview of voice conversion systems",
abstract = "Voice transformation (VT) aims to change one or more aspects of a speech signal while preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to change a source speaker's speech in such a way that the generated output is perceived as a sentence uttered by a target speaker. Despite many years of research, VC systems still exhibit deficiencies in accurately mimicking a target speaker spectrally and prosodically, and simultaneously maintaining high speech quality. In this work we provide an overview of real-world applications, extensively study existing systems proposed in the literature, and discuss remaining challenges.",
keywords = "Overview, Survey, Voice conversion",
author = "Mohammadi, {Seyed Hamidreza} and Alexander Kain",
year = "2017",
month = "4",
day = "1",
doi = "10.1016/j.specom.2017.01.008",
language = "English (US)",
volume = "88",
pages = "65--82",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",

}