A speech model of acoustic inventories based on asynchronous interpolation

Alexander Kain; Jan Van Santen

Institute on Development and Disability

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

We propose a speech model that describes acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) very compact representations and thus high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which has been computed using non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weight values are modified to produce changes in duration and articulation effort.
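The interpolation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the sigmoid weight curves, the two-feature toy vectors, and the function names `sigmoid_weights`, `interpolate_unit`, and `stretch_weights` are all assumptions made for illustration, and the dynamic time warping estimation step is not shown. The sketch only demonstrates how each feature can move from one basis vector to the next along its own non-linear weight curve (the asynchronous part), and how resampling those curves changes duration.

```python
import numpy as np


def sigmoid_weights(n_frames, center, slope):
    """Hypothetical transition weight curve rising from about 0 to about 1."""
    t = np.linspace(0.0, 1.0, n_frames)
    return 1.0 / (1.0 + np.exp(-slope * (t - center)))


def interpolate_unit(left, right, weights):
    """Blend two basis vectors frame by frame.

    left, right: basis vectors of shape (n_features,)
    weights:     array of shape (n_frames, n_features), one curve per feature
    returns:     feature trajectories of shape (n_frames, n_features)
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return (1.0 - weights) * left + weights * right


def stretch_weights(weights, new_n_frames):
    """Resample every weight curve to a new frame count (a duration change)."""
    old_t = np.linspace(0.0, 1.0, weights.shape[0])
    new_t = np.linspace(0.0, 1.0, new_n_frames)
    columns = [np.interp(new_t, old_t, weights[:, k])
               for k in range(weights.shape[1])]
    return np.column_stack(columns)


# Toy diphone: two made-up feature values per basis vector (assumed numbers).
left = np.array([700.0, 1200.0])
right = np.array([300.0, 2300.0])

# Asynchrony: the first feature transitions earlier than the second.
weights = np.column_stack([
    sigmoid_weights(50, center=0.4, slope=20.0),
    sigmoid_weights(50, center=0.6, slope=20.0),
])

unit = interpolate_unit(left, right, weights)
longer = interpolate_unit(left, right, stretch_weights(weights, 80))
```

Because only two basis vectors and a handful of curve parameters describe the whole unit in this toy setting, the sketch also hints at why such a representation could be compact; the actual feature types and weight parameterization used in the paper may differ.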

Original language	English (US)
Title of host publication	EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
Publisher	International Speech Communication Association
Pages	329-332
Number of pages	4
State	Published - 2003
Event	8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland Duration: Sep 1 2003 → Sep 4 2003

Other

Other	8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
Country/Territory	Switzerland
City	Geneva
Period	9/1/03 → 9/4/03

ASJC Scopus subject areas

Computer Science Applications
Software
Linguistics and Language
Communication

Cite this

@inproceedings{0a06c718d4d647078e7a7b233b6b88cc,

title = "A speech model of acoustic inventories based on asynchronous interpolation",

abstract = "We propose a speech model that describes acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) very compact representations and thus high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which has been computed using non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weight values are modified to produce changes in duration and articulation effort.",

author = "Kain, Alexander and {Van Santen}, Jan",

year = "2003",

language = "English (US)",

pages = "329--332",

booktitle = "EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology",

publisher = "International Speech Communication Association",

note = "8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 ; Conference date: 01-09-2003 Through 04-09-2003",

}

TY - GEN

T1 - A speech model of acoustic inventories based on asynchronous interpolation

AU - Kain, Alexander

AU - Van Santen, Jan

PY - 2003

Y1 - 2003

N2 - We propose a speech model that describes acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) very compact representations and thus high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which has been computed using non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weight values are modified to produce changes in duration and articulation effort.

UR - http://www.scopus.com/inward/record.url?scp=85009159765&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009159765&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85009159765

SP - 329

EP - 332

BT - EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology

PB - International Speech Communication Association

T2 - 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003

Y2 - 1 September 2003 through 4 September 2003

ER -
