Abstract
In earlier research, we applied a linear weighted cross-fading function to ensure smooth concatenation. However, this can cause unnaturally shaped spectral trajectories. We propose context-sensitive cross-fading. To train this system, a perceptually validated cost function is needed, which is the focus of this paper. A corpus was designed to generate a variety of formant trajectory shapes. A perceptual experiment was performed and a multiple linear regression model was applied to predict perceptual quality ratings from various distances between cross-faded and natural trajectories. Results show that perceptual quality could be predicted well from the proposed distance measures.
Original language | English (US) |
---|---|
Pages (from-to) | 732-735 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
State | Published - 2009 |
Externally published | Yes |
Event | 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom Duration: Sep 6 2009 → Sep 10 2009 |
Keywords
- Concatenation errors
- Cross-fading function
- Formant frequency
- Perceptual score
ASJC Scopus subject areas
- Human-Computer Interaction
- Signal Processing
- Software
- Sensory Systems