Abstract
This paper describes perceptual methods for diagnosing problems in text-to-speech systems. Special attention is paid to two issues. The first is coverage of the domain of a text-to-speech system. Since this domain involves an enormous range of contexts, it is critical for diagnostics, and also for overall evaluation, that test materials cover this range to the fullest extent possible. Automatic text generation methods that serve this purpose, making extensive use of “greedy” algorithms, are described. The second is that speech generated by text-to-speech systems tends to have a great variety of problems. A battery of experimental paradigms is discussed that addresses different facets of speech quality and intelligibility. Included are: (a) a “word pointing” method for detection of problematic concatenative units; (b) a “minimal pairs intelligibility test”, an expanded diagnostic rhyme test; (c) an automatically scored orthographic name transcription task; (d) a mean opinion score paradigm with problem categorization; and (e) a paired comparison paradigm with strength-of-choice rating. The methods are applied in a series of experiments on high-end text-to-speech systems.
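The abstract does not spell out how the “greedy” text selection works, so the following is only a minimal sketch of one common greedy approach to coverage, not the paper's own algorithm. It assumes each candidate sentence has already been mapped to the set of units it contains; the function name `greedy_cover` and the unit labels in the usage note are hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(candidates: Dict[str, Set[str]]) -> List[str]:
    """Select sentences one at a time, always taking the sentence that
    covers the most units not yet covered (classic greedy set cover).

    `candidates` maps a sentence to the set of units it contains.
    This is an illustrative assumption, not the paper's data format.
    """
    # Every unit that appears in at least one candidate must be covered.
    uncovered: Set[str] = set().union(*candidates.values())
    selected: List[str] = []

    while uncovered:
        # Iterate in sorted order so ties are broken deterministically:
        # max() returns the first sentence with the largest gain.
        best = max(sorted(candidates),
                   key=lambda s: len(candidates[s] & uncovered))
        selected.append(best)
        uncovered -= candidates[best]

    return selected
```

For example, given `{"s1": {"u1", "u2", "u3"}, "s2": {"u3", "u4"}, "s3": {"u4"}}`, the sketch picks `s1` first because it covers three units, then `s2` to cover the remaining `u4`. Greedy selection of this kind does not guarantee the smallest possible test set, but it is simple and usually produces a compact one.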
Original language | English (US) |
---|---|
Pages (from-to) | 49-100 |
Number of pages | 52 |
Journal | Computer Speech and Language |
Volume | 7 |
Issue number | 1 |
DOIs | |
State | Published - Jan 1993 |
Externally published | Yes |
ASJC Scopus subject areas
- Theoretical Computer Science
- Software
- Human-Computer Interaction