The contribution of various sources of spectral mismatch to audible discontinuities in a diphone database

Research output: Contribution to journal › Article

7 Scopus citations


One of the major problems in concatenative synthesis is the occurrence of audible discontinuities between two successive concatenative units. Several studies have attempted to discover objective distance measures that predict the audibility of these discontinuities. In this paper, we investigate mid-vowel joins for three vowels with a range of post-vocalic consonant contexts typical of diphone databases. A first perceptual experiment uses a pairwise comparison procedure to find two subsets of unit combinations: those with and those without audible discontinuities. A second perceptual experiment uses these two subsets in a procedure where formant resynthesis is used to manipulate three sources of discontinuity separately: formant frequencies, formant bandwidths, and overall energy. Results show that mismatch in formant frequencies provides the largest contribution to audible discontinuity, followed by mismatch in overall energy.

Original language: English (US)
Article number: 4100687
Pages (from-to): 949-956
Number of pages: 8
Journal: IEEE Transactions on Audio, Speech and Language Processing
Issue number: 3
State: Published - Mar 1 2007

Keywords



  • Audible discontinuities
  • Diphones
  • Spectral distance measures
  • Speech synthesis

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
