A comparison of sentence-level speech intelligibility metrics

Alexander Kain, Max Del Giudice, Kris Tjaden

Research output: Contribution to journalConference articlepeer-review

1 Scopus citations


We examine existing and novel automatically-derived acoustic metrics that are predictive of speech intelligibility. We hypothesize that the degree of variability in feature space is correlated with the extent of a speaker's phonemic inventory, their degree of articulatory displacements, and thus with their degree of perceived speech intelligibility. We begin by using fully-automatic F1/F2 formant frequency trajectories for both vowel space area calculation and as input to a proposed class-separability metric. We then switch to representing vowels by means of short-term spectral features, and measure vowel separability in that space. Finally, we consider the case where phonetic labeling is unavailable; here we calculate short-term spectral features for the entire speech utterance and then estimate their entropy based on the length of a minimum spanning tree. In an alternative approach, we propose to first segment the speech signal using a hidden Markov model, and then calculate spectral feature separability based on the automatically-derived classes. We apply all approaches to a database with healthy controls as well as speakers with mild dysarthria, and report the resulting coefficients of determination.

Original languageEnglish (US)
Pages (from-to)1148-1152
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
StatePublished - 2017
Event18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: Aug 20 2017Aug 24 2017


  • Intelligibility
  • Vowel space area

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation


Dive into the research topics of 'A comparison of sentence-level speech intelligibility metrics'. Together they form a unique fingerprint.

Cite this