Compression of line spectral frequency parameters with asynchronous interpolation

Rachel Moldover; Alexander Kain

doi:10.1109/ICASSP.2009.4960452

Institute on Development and Disability

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

TTS systems require a trade-off between size and speech quality. A larger acoustic inventory allows synthesis of speech that sounds more natural. The Asynchronous Interpolation Model (AIM) improves the quality-to-size ratio, allowing better compression of large acoustic inventories, as well as better-quality speech from a small system. At maximum compression, our method represents most phonemes by a single frame of data. Coarticulation effects are specified as context-specific non-linear interpolation functions. Dividing the speech features into multiple data streams allows asynchronous interpolation. In this study, AIM was applied to line spectral frequency (LSF) parameters. Varying the number of streams allows for a variable amount of compression. We used three different objective measures to investigate the effect of the number and partitioning of streams. The first few weight functions (and the last one) seem to offer the most error reduction. Partitions separating the first 6 LSFs score well with all three measures.
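The idea of asynchronous interpolation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LSF order of 10, the sigmoid shape of the weight functions, the frame values, and the names `sigmoid_weight` and `interpolate_streams` are all assumptions made for illustration. Only the general scheme (one target frame per phoneme, features split into streams, one non-linear weight function per stream) and the split after the first 6 LSFs come from the abstract.

```python
import numpy as np


def sigmoid_weight(num_frames, midpoint, slope):
    """Hypothetical non-linear weight function rising from near 0 to near 1."""
    t = np.linspace(0.0, 1.0, num_frames)
    return 1.0 / (1.0 + np.exp(-slope * (t - midpoint)))


def interpolate_streams(frame_a, frame_b, partitions, weights):
    """Blend two target frames, using a separate weight function per stream."""
    num_frames = weights[0].shape[0]
    out = np.empty((num_frames, frame_a.shape[0]))
    for idx, w in zip(partitions, weights):
        # The outer product applies the per-frame weight to every LSF in the stream.
        out[:, idx] = np.outer(1.0 - w, frame_a[idx]) + np.outer(w, frame_b[idx])
    return out


# Made-up 10th-order LSF target frames in radians (ascending, inside (0, pi)).
frame_a = np.linspace(0.25, 2.95, 10)
frame_b = frame_a + 0.05 * np.array([1, -1, 2, -2, 1, -1, 1, 1, -1, 1])

# Two streams: the first 6 LSFs and the remaining 4 (assumed partition).
partitions = [np.arange(0, 6), np.arange(6, 10)]

# Each stream makes its transition at a different time, hence "asynchronous".
weights = [sigmoid_weight(20, 0.35, 12.0), sigmoid_weight(20, 0.65, 12.0)]

trajectory = interpolate_streams(frame_a, frame_b, partitions, weights)
print(trajectory.shape)
```

In this sketch the 20-frame transition is reconstructed from two stored frames plus one (midpoint, slope) pair per stream. Using more streams gives the trajectory more freedom at the cost of storing more weight functions, which is the size-versus-quality trade-off the abstract refers to.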

Original language	English (US)
Title of host publication	2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
Pages	3789-3792
Number of pages	4
DOIs	https://doi.org/10.1109/ICASSP.2009.4960452
State	Published - 2009
Event	2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009 - Taipei, Taiwan, Province of China; Duration: Apr 19 2009 → Apr 24 2009

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print)	1520-6149

Other

Other	2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Country/Territory	Taiwan, Province of China
City	Taipei
Period	4/19/09 → 4/24/09

Keywords

Acoustic inventory
Compression
Speech synthesis
TTS
Temporal decomposition

ASJC Scopus subject areas

Software
Signal Processing
Electrical and Electronic Engineering

Access to Document

10.1109/ICASSP.2009.4960452

Cite this

Moldover, R., & Kain, A. (2009). Compression of line spectral frequency parameters with asynchronous interpolation. In 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009 (pp. 3789-3792). Article 4960452 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2009.4960452

Compression of line spectral frequency parameters with asynchronous interpolation. / Moldover, Rachel; Kain, Alexander.
2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009. 2009. p. 3789-3792 4960452 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

Moldover, R & Kain, A 2009, Compression of line spectral frequency parameters with asynchronous interpolation. in 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009., 4960452, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 3789-3792, 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009, Taipei, Taiwan, Province of China, 4/19/09. https://doi.org/10.1109/ICASSP.2009.4960452

Moldover R, Kain A. Compression of line spectral frequency parameters with asynchronous interpolation. In 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009. 2009. p. 3789-3792. 4960452. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2009.4960452

@inproceedings{c87df12055804c1cbb0f419dc96d3751,
title = "Compression of line spectral frequency parameters with asynchronous interpolation",
abstract = "TTS systems require a trade-off between size and speech quality. A larger acoustic inventory allows synthesis of speech that sounds more natural. The Asynchronous Interpolation Model (AIM) improves the quality-to-size ratio, allowing better compression of large acoustic inventories, as well as better-quality speech from a small system. At maximum compression, our method represents most phonemes by a single frame of data. Coarticulation effects are specified as context-specific non-linear interpolation functions. Dividing the speech features into multiple data streams allows asynchronous interpolation. In this study, AIM was applied to line spectral frequency (LSF) parameters. Varying the number of streams allows for a variable amount of compression. We used three different objective measures to investigate the effect of the number and partitioning of streams. The first few weight functions (and the last one) seem to offer the most error reduction. Partitions separating the first 6 LSFs score well with all three measures.",
keywords = "Acoustic inventory, Compression, Speech synthesis, TTS, Temporal decomposition",
author = "Rachel Moldover and Alexander Kain",
year = "2009",
doi = "10.1109/ICASSP.2009.4960452",
language = "English (US)",
isbn = "9781424423545",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
pages = "3789--3792",
booktitle = "2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009",
note = "2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009 ; Conference date: 19-04-2009 Through 24-04-2009",
}

TY  - GEN
T1  - Compression of line spectral frequency parameters with asynchronous interpolation
AU  - Moldover, Rachel
AU  - Kain, Alexander
PY  - 2009
Y1  - 2009
AB  - TTS systems require a trade-off between size and speech quality. A larger acoustic inventory allows synthesis of speech that sounds more natural. The Asynchronous Interpolation Model (AIM) improves the quality-to-size ratio, allowing better compression of large acoustic inventories, as well as better-quality speech from a small system. At maximum compression, our method represents most phonemes by a single frame of data. Coarticulation effects are specified as context-specific non-linear interpolation functions. Dividing the speech features into multiple data streams allows asynchronous interpolation. In this study, AIM was applied to line spectral frequency (LSF) parameters. Varying the number of streams allows for a variable amount of compression. We used three different objective measures to investigate the effect of the number and partitioning of streams. The first few weight functions (and the last one) seem to offer the most error reduction. Partitions separating the first 6 LSFs score well with all three measures.
KW  - Acoustic inventory
KW  - Compression
KW  - Speech synthesis
KW  - TTS
KW  - Temporal decomposition
UR  - http://www.scopus.com/inward/record.url?scp=70349200817&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=70349200817&partnerID=8YFLogxK
U2  - 10.1109/ICASSP.2009.4960452
DO  - 10.1109/ICASSP.2009.4960452
M3  - Conference contribution
AN  - SCOPUS:70349200817
SN  - 9781424423545
T3  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP  - 3789
EP  - 3792
BT  - 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
T2  - 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Y2  - 19 April 2009 through 24 April 2009
ER  -
