A speech model of acoustic inventories based on asynchronous interpolation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We propose a speech model that describes acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) very compact representations and thus high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which has been computed using non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weight values are modified to produce changes in duration and articulation effort.
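
The abstract gives no formulas, so the following is only a minimal sketch of what non-linear, asynchronous interpolation between two neighboring basis vectors could look like. It is not the authors' implementation. The sigmoid weight shape, the function names `sigmoid_weight` and `interpolate_unit`, the per-feature `centers` and `slopes` parameters, and the numeric values are all assumptions made for illustration. Each feature gets its own transition weight curve, so features switch from the left basis vector to the right one at different times.

```python
import math

def sigmoid_weight(t, center, slope):
    """Assumed weight shape: rises non-linearly from 0 to 1 around `center`."""
    return 1.0 / (1.0 + math.exp(-slope * (t - center)))

def interpolate_unit(left, right, centers, slopes, n_frames):
    """Blend two basis vectors frame by frame, one weight curve per feature.

    `n_frames` must be at least 2. Normalized time t runs from 0 to 1.
    """
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        frame = []
        for k in range(len(left)):
            # Each feature k has its own timing, hence "asynchronous".
            w = sigmoid_weight(t, centers[k], slopes[k])
            frame.append((1.0 - w) * left[k] + w * right[k])
        frames.append(frame)
    return frames

# Hypothetical two-feature basis vectors for the two halves of a diphone.
left = [500.0, 1500.0]
right = [300.0, 2200.0]
centers = [0.4, 0.6]   # feature 0 transitions earlier than feature 1
slopes = [12.0, 12.0]  # steepness of each transition (assumed knob)

normal = interpolate_unit(left, right, centers, slopes, n_frames=11)
longer = interpolate_unit(left, right, centers, slopes, n_frames=21)
```

In this sketch, a duration change amounts to sampling the same weight curves over more or fewer frames, as in `longer`. Varying `slopes` would be one crude, assumed stand-in for a change in articulation effort; the abstract does not say how the weights are actually modified.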

Original language: English (US)
Title of host publication: EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
Publisher: International Speech Communication Association
Pages: 329-332
Number of pages: 4
State: Published - 2003
Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: Sep 1 2003 to Sep 4 2003


Fingerprint

  • Acoustics
  • Interpolation
  • Recording
  • Values
  • Time

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Linguistics and Language
  • Communication

Cite this

Kain, A., & Van Santen, J. (2003). A speech model of acoustic inventories based on asynchronous interpolation. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology (pp. 329-332). International Speech Communication Association.

@inproceedings{0a06c718d4d647078e7a7b233b6b88cc,
title = "A speech model of acoustic inventories based on asynchronous interpolation",
abstract = "We propose a speech model that describes acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) very compact representations and thus high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which has been computed using non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weight values are modified to produce changes in duration and articulation effort.",
author = "Alexander Kain and {Van Santen}, Jan",
year = "2003",
language = "English (US)",
pages = "329--332",
booktitle = "EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology",
publisher = "International Speech Communication Association",

}

TY - GEN

T1 - A speech model of acoustic inventories based on asynchronous interpolation

AU - Kain, Alexander

AU - Van Santen, Jan

PY - 2003

Y1 - 2003

N2 - We propose a speech model that describes acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) very compact representations and thus high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which has been computed using non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weight values are modified to produce changes in duration and articulation effort.

AB - We propose a speech model that describes acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) very compact representations and thus high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which has been computed using non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weight values are modified to produce changes in duration and articulation effort.

UR - http://www.scopus.com/inward/record.url?scp=85009159765&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009159765&partnerID=8YFLogxK

M3 - Conference contribution

SP - 329

EP - 332

BT - EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology

PB - International Speech Communication Association

ER -