Perceptual cost function for cross-fading based concatenation

Qi Miao; Alexander Kain; Jan P.H. Van Santen

Institute on Development and Disability

Research output: Contribution to journal, conference article (peer-reviewed)

Abstract

In earlier research, we applied a linearly weighted cross-fading function to ensure smooth concatenation. However, this can produce unnaturally shaped spectral trajectories. We therefore propose context-sensitive cross-fading. Training such a system requires a perceptually validated cost function, which is the focus of this paper. A corpus was designed to yield a variety of formant trajectory shapes. A perceptual experiment was performed, and a multiple linear regression model was applied to predict the resulting perceptual quality ratings from various distances between cross-faded and natural trajectories. Results show that perceptual quality could be predicted well from the proposed distance measures.
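The pipeline described in the abstract (linear cross-fading, distances to the natural trajectory, and a multiple linear regression onto ratings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one formant trajectory at a time, sampled at equal frame steps over the overlap region, and the three distance features (RMS difference, maximum absolute deviation, RMS difference of frame-to-frame slopes) and all function names are hypothetical stand-ins, since the specific distance measures are not given here. The regression is ordinary least squares solved via the normal equations, using only the Python standard library.

```python
import math
from typing import List, Sequence


def linear_crossfade(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Linearly weighted cross-fade over the overlap region.

    The weight on the incoming unit (`right`) ramps from 0 at the first frame
    to 1 at the last frame; the outgoing unit (`left`) gets the complement.
    """
    if len(left) != len(right):
        raise ValueError("trajectories must have the same length")
    n = len(left)
    out = []
    for i, (a, b) in enumerate(zip(left, right)):
        w = i / max(n - 1, 1)  # single-frame overlap degenerates to `left`
        out.append((1.0 - w) * a + w * b)
    return out


def trajectory_distances(faded: Sequence[float], natural: Sequence[float]) -> List[float]:
    """Hypothetical distance features between a cross-faded and a natural trajectory."""
    faded, natural = list(faded), list(natural)
    if len(faded) != len(natural) or len(faded) < 2:
        raise ValueError("need two equal-length trajectories of at least 2 frames")
    diff = [f - g for f, g in zip(faded, natural)]
    rms = math.sqrt(sum(d * d for d in diff) / len(diff))
    max_abs = max(abs(d) for d in diff)
    # Compare frame-to-frame slopes to capture differences in trajectory shape.
    d_faded = [b - a for a, b in zip(faded, faded[1:])]
    d_nat = [b - a for a, b in zip(natural, natural[1:])]
    slope_rms = math.sqrt(sum((p - q) ** 2 for p, q in zip(d_faded, d_nat)) / len(d_faded))
    return [rms, max_abs, slope_rms]


def _solve(A: List[List[float]], v: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(v)
    M = [row[:] + [rhs] for row, rhs in zip(A, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b


def fit_rating_model(features: Sequence[Sequence[float]], ratings: Sequence[float]) -> List[float]:
    """Multiple linear regression: rating ~ b0 + b1*x1 + ... (OLS via normal equations)."""
    X = [[1.0] + list(row) for row in features]  # prepend intercept column
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]  # X^T X
    v = [sum(r[i] * y for r, y in zip(X, ratings)) for i in range(k)]  # X^T y
    return _solve(A, v)


def predict_rating(coef: Sequence[float], feats: Sequence[float]) -> float:
    """Predicted perceptual rating for one stimulus's distance features."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], feats))
```

In use, each stimulus would be reduced to a feature vector such as `trajectory_distances(linear_crossfade(left, right), natural)`, and `fit_rating_model` would be called on all stimuli's feature vectors together with their listener ratings. Plain OLS keeps the fitted weights directly interpretable as the contribution of each distance to predicted quality.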

Original language	English (US)
Pages (from-to)	732-735
Number of pages	4
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State	Published - 2009
Event	10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, United Kingdom. Duration: Sep 6 2009 → Sep 10 2009

Keywords

Concatenation errors
Cross-fading function
Formant frequency
Perceptual score

ASJC Scopus subject areas

Human-Computer Interaction
Signal Processing
Software
Sensory Systems

Cite this

@article{042b6e47fd8a4c7eb23c184f2aead947,

title = "Perceptual cost function for cross-fading based concatenation",

abstract = "In earlier research, we applied a linear weighted cross-fading function to ensure smooth concatenation. However, this can cause unnaturally shaped spectral trajectories. We propose context-sensitive cross-fading. To train this system, a perceptually validated cost function is needed, which is the focus of this paper. A corpus was designed to generate a variety of formant trajectory shapes. A perceptual experiment was performed and a multiple linear regression model was applied to predict perceptual quality ratings from various distances between cross-faded and natural trajectories. Results show that perceptual quality could be predicted well from the proposed distance measures.",

keywords = "Concatenation errors, Cross-fading function, Formant frequency, Perceptual score",

author = "Qi Miao and Alexander Kain and {Van Santen}, {Jan P.H.}",

year = "2009",

language = "English (US)",

pages = "732--735",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

note = "10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 ; Conference date: 06-09-2009 Through 10-09-2009",

}

TY - JOUR

T1 - Perceptual cost function for cross-fading based concatenation

AU - Miao, Qi

AU - Kain, Alexander

AU - Van Santen, Jan P.H.

PY - 2009

Y1 - 2009

N2 - In earlier research, we applied a linear weighted cross-fading function to ensure smooth concatenation. However, this can cause unnaturally shaped spectral trajectories. We propose context-sensitive cross-fading. To train this system, a perceptually validated cost function is needed, which is the focus of this paper. A corpus was designed to generate a variety of formant trajectory shapes. A perceptual experiment was performed and a multiple linear regression model was applied to predict perceptual quality ratings from various distances between cross-faded and natural trajectories. Results show that perceptual quality could be predicted well from the proposed distance measures.

AB - In earlier research, we applied a linear weighted cross-fading function to ensure smooth concatenation. However, this can cause unnaturally shaped spectral trajectories. We propose context-sensitive cross-fading. To train this system, a perceptually validated cost function is needed, which is the focus of this paper. A corpus was designed to generate a variety of formant trajectory shapes. A perceptual experiment was performed and a multiple linear regression model was applied to predict perceptual quality ratings from various distances between cross-faded and natural trajectories. Results show that perceptual quality could be predicted well from the proposed distance measures.

KW - Concatenation errors

KW - Cross-fading function

KW - Formant frequency

KW - Perceptual score

UR - http://www.scopus.com/inward/record.url?scp=70450161987&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70450161987&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:70450161987

SN - 2308-457X

SP - 732

EP - 735

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009

Y2 - 6 September 2009 through 10 September 2009

ER -