The contribution of various sources of spectral mismatch to audible discontinuities in a diphone database

Esther Klabbers, Jan Van Santen, Alexander Kain

Research output: Contribution to journal (Article)

7 Citations (Scopus)

Abstract

One of the major problems in concatenative synthesis is the occurrence of audible discontinuities between two successive concatenative units. Several studies have attempted to discover objective distance measures that predict the audibility of these discontinuities. In this paper, we investigate mid-vowel joins for three vowels with a range of post-vocalic consonant contexts typical of diphone databases. A first perceptual experiment uses a pairwise comparison procedure to find two subsets of unit combinations: those with versus those without audible discontinuities. A second perceptual experiment uses these two subsets in a procedure where formant resynthesis is used to manipulate three sources of discontinuity separately: formant frequencies, formant bandwidths, and overall energy. Results show that mismatch in formant frequencies provides the largest contribution to audible discontinuity, followed by mismatch in overall energy.
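The abstract does not say how each source of mismatch was quantified. Purely as an illustration, the sketch below assumes that each side of a join is summarized by a single frame holding F1 to F3 centre frequencies, their bandwidths, and an RMS amplitude, and it measures the three sources separately: a Euclidean distance in Hz for formant frequencies and for bandwidths, and the absolute energy step in dB. The function remove_mismatch loosely mirrors the idea of manipulating one source at a time by copying a single parameter set across the join; it is not formant resynthesis. All names (JoinFrame, join_mismatch, remove_mismatch) and the choice of distances are hypothetical and are not the authors' measures or procedure.

```python
# Illustrative sketch only: the names and distance choices are assumptions,
# not the measures or resynthesis procedure used in the paper.
from __future__ import annotations

import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JoinFrame:
    """Hypothetical summary of one analysis frame at a concatenation point."""
    formants_hz: tuple[float, ...]    # formant centre frequencies, e.g. (F1, F2, F3)
    bandwidths_hz: tuple[float, ...]  # bandwidth of each formant, same order
    rms: float                        # linear RMS amplitude of the frame (> 0)


def _euclidean(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Plain Euclidean distance between two equal-length parameter vectors."""
    if len(a) != len(b):
        raise ValueError("both frames must describe the same number of formants")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def join_mismatch(left: JoinFrame, right: JoinFrame) -> dict[str, float]:
    """Measure the three mismatch sources at a join separately."""
    return {
        "formant_freq_hz": _euclidean(left.formants_hz, right.formants_hz),
        "bandwidth_hz": _euclidean(left.bandwidths_hz, right.bandwidths_hz),
        # Energy step across the join in dB (20 * log10 of the RMS ratio).
        "energy_db": abs(20.0 * math.log10(left.rms / right.rms)),
    }


def remove_mismatch(left: JoinFrame, right: JoinFrame, source: str) -> JoinFrame:
    """Copy one parameter set from `left` into `right`, so that this source no
    longer differs across the join while the other two are left untouched."""
    if source == "formant_freq_hz":
        return replace(right, formants_hz=left.formants_hz)
    if source == "bandwidth_hz":
        return replace(right, bandwidths_hz=left.bandwidths_hz)
    if source == "energy_db":
        return replace(right, rms=left.rms)
    raise ValueError(f"unknown mismatch source: {source!r}")


# Usage with made-up values for the two sides of a mid-vowel join.
left = JoinFrame((500.0, 1500.0, 2500.0), (60.0, 90.0, 120.0), 0.2)
right = JoinFrame((530.0, 1540.0, 2500.0), (70.0, 90.0, 120.0), 0.1)
print(join_mismatch(left, right))
print(join_mismatch(left, remove_mismatch(left, right, "energy_db")))
```

Raw Hz distances are used here only to keep the arithmetic easy to check; a perceptually motivated frequency scale would be a natural alternative when relating such measures to listener judgements.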

Original language: English (US)
Article number: 4100687
Pages (from-to): 949-956
Number of pages: 8
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 15
Issue number: 3
DOI: 10.1109/TASL.2006.885250
State: Published - Mar 2007

Keywords

  • Audible discontinuities
  • Diphones
  • Spectral distance measures
  • Speech synthesis

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

@article{639fd2f4874e4996be42223826f5eefe,
title = "The contribution of various sources of spectral mismatch to audible discontinuities in a diphone database",
abstract = "One of the major problems in concatenative synthesis is the occurrence of audible discontinuities between two successive concatenative units. Several studies have attempted to discover objective distance measures that predict the audibility of these discontinuities. In this paper, we investigate mid-vowel joins for three vowels with a range of post-vocalic consonant contexts typical of diphone databases. A first perceptual experiment uses a pairwise comparison procedure to find two subsets of unit combinations: those with versus those without audible discontinuities. A second perceptual experiment uses these two subsets in a procedure where formant resynthesis is used to manipulate three sources of discontinuity separately: formant frequencies, formant bandwidths, and overall energy. Results show that mismatch in formant frequencies provides the largest contribution to audible discontinuity, followed by mismatch in overall energy.",
keywords = "Audible discontinuities, Diphones, Spectral distance measures, Speech synthesis",
author = "Esther Klabbers and {Van Santen}, Jan and Alexander Kain",
year = "2007",
month = "3",
doi = "10.1109/TASL.2006.885250",
language = "English (US)",
volume = "15",
pages = "949--956",
journal = "IEEE Transactions on Audio, Speech, and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",

}

TY - JOUR

T1 - The contribution of various sources of spectral mismatch to audible discontinuities in a diphone database

AU - Klabbers, Esther

AU - Van Santen, Jan

AU - Kain, Alexander

PY - 2007/3

Y1 - 2007/3

AB - One of the major problems in concatenative synthesis is the occurrence of audible discontinuities between two successive concatenative units. Several studies have attempted to discover objective distance measures that predict the audibility of these discontinuities. In this paper, we investigate mid-vowel joins for three vowels with a range of post-vocalic consonant contexts typical of diphone databases. A first perceptual experiment uses a pairwise comparison procedure to find two subsets of unit combinations: those with versus those without audible discontinuities. A second perceptual experiment uses these two subsets in a procedure where formant resynthesis is used to manipulate three sources of discontinuity separately: formant frequencies, formant bandwidths, and overall energy. Results show that mismatch in formant frequencies provides the largest contribution to audible discontinuity, followed by mismatch in overall energy.

KW - Audible discontinuities

KW - Diphones

KW - Spectral distance measures

KW - Speech synthesis

UR - http://www.scopus.com/inward/record.url?scp=56149089359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=56149089359&partnerID=8YFLogxK

U2 - 10.1109/TASL.2006.885250

DO - 10.1109/TASL.2006.885250

M3 - Article

AN - SCOPUS:56149089359

VL - 15

SP - 949

EP - 956

JO - IEEE Transactions on Audio, Speech, and Language Processing

JF - IEEE Transactions on Audio, Speech, and Language Processing

SN - 1558-7916

IS - 3

M1 - 4100687

ER -