A novel pitch decomposition method for the generalized linear alignment model

Mahsa Sadat Elyasi Langarani, Esther Klabbers, Jan Van Santen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

Superpositional models of intonation typically propose decomposing fundamental frequency (F0) contours into phrase curves and accent curves, aligned with phrases and left-headed feet, respectively. Extracting these component curves from F0 contours without making undue assumptions is challenging. We propose a novel method for decomposing pitch curves, based on the assumption that accent curves can be described by combining skewed normal distributions and sigmoid functions. In contrast to an earlier pitch decomposition algorithm ('PRISM'), this allows for simple joint optimization of phrase and accent curve parameters, using fewer parameters. The proposed method was evaluated on three speech corpora containing: (1) synthetically generated pitch curves, (2) all-sonorant utterances, and (3) utterances containing both sonorant and non-sonorant speech sounds. The root weighted mean squared error is small, and, on the corpus for which comparable data are available, is significantly smaller than for PRISM.
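The decomposition idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's parameterization, so everything specific here is an assumption made for illustration: a linear phrase curve, an accent curve formed as the sum of one scaled skew-normal bump and one scaled sigmoid step, uniform weights, and a simple coordinate search standing in for whatever optimizer the authors used. All function and parameter names are hypothetical.

```python
import math

def skew_normal(t, loc, scale, shape):
    """Skew-normal density: (2/scale) * phi(z) * Phi(shape*z), z = (t-loc)/scale."""
    z = (t - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))
    return 2.0 / scale * phi * big_phi

def sigmoid(t, mid, slope):
    """Logistic function written with tanh so large arguments cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * slope * (t - mid)))

def model(t, p):
    """F0 = phrase curve + accent curve (assumed forms, not the paper's)."""
    p0, p1, a_amp, loc, scale, shape, s_amp, mid, slope = p
    scale = max(scale, 1e-3)            # keep the bump width positive
    phrase = p0 + p1 * t                # assumption: linear phrase curve
    accent = a_amp * skew_normal(t, loc, scale, shape) + s_amp * sigmoid(t, mid, slope)
    return phrase + accent

def rwmse(times, f0, weights, p):
    """Root weighted mean squared error between observed and modeled F0."""
    num = sum(w * (y - model(t, p)) ** 2 for t, y, w in zip(times, f0, weights))
    return math.sqrt(num / sum(weights))

def fit(times, f0, weights, start, steps, rounds=200):
    """Joint coordinate search over phrase and accent parameters together.

    A trial step is kept only if it lowers the error; when no parameter
    improves, all step sizes are halved.
    """
    params, steps = list(start), list(steps)
    best = rwmse(times, f0, weights, params)
    for _ in range(rounds):
        improved = False
        for i in range(len(params)):
            for delta in (steps[i], -steps[i]):
                trial = params[:]
                trial[i] += delta
                err = rwmse(times, f0, weights, trial)
                if err < best:
                    params, best, improved = trial, err, True
                    break
        if not improved:
            steps = [s * 0.5 for s in steps]
    return params, best

# Synthetic example: generate a curve from known parameters, then refit it.
times = [i * 0.01 for i in range(200)]
truth = [100.0, -10.0, 15.0, 0.8, 0.2, 3.0, 20.0, 1.7, 30.0]
f0 = [model(t, truth) for t in times]
weights = [1.0] * len(times)            # assumption: uniform weights

start = [95.0, 0.0, 10.0, 0.7, 0.3, 1.0, 10.0, 1.6, 20.0]
steps = [5.0, 5.0, 5.0, 0.1, 0.1, 1.0, 5.0, 0.1, 5.0]
initial_err = rwmse(times, f0, weights, start)
fitted, fitted_err = fit(times, f0, weights, start, steps)
print(f"RWMSE before fit: {initial_err:.3f}, after fit: {fitted_err:.3f}")
```

Because phrase and accent parameters sit in one vector and are scored by one error function, they are adjusted jointly rather than in separate passes. In practice the weights would likely reflect voicing or signal reliability, and a gradient-based optimizer would be a more natural choice than this coordinate search.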

Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2584-2588
Number of pages: 5
ISBN (Print): 9781479928927
DOI: https://doi.org/10.1109/ICASSP.2014.6854067
State: Published - 2014
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy
Duration: May 4 2014 to May 9 2014

Keywords

  • prosody modeling
  • superpositional model
  • text-to-speech synthesis

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Langarani, M. S. E., Klabbers, E., & Van Santen, J. (2014). A novel pitch decomposition method for the generalized linear alignment model. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 2584-2588). [6854067] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2014.6854067
