Automated assessment of prosody production

Jan Van Santen, Emily Tucker Prud'hommeaux, Lois M. Black

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

Assessment of prosody is important for diagnosis and remediation of speech and language disorders, for diagnosis of neurological conditions, and for foreign language instruction. Current assessment is largely auditory-perceptual, which has obvious drawbacks; however, automation of assessment faces numerous obstacles. We propose methods for automatically assessing production of lexical stress, focus, phrasing, pragmatic style, and vocal affect. Speech was analyzed from children in six tasks designed to elicit specific prosodic contrasts. The methods involve dynamic and global features, using spectral, fundamental frequency, and temporal information. The automatically computed scores were validated against mean scores from judges who, in all but one task, listened to "prosodic minimal pairs" of recordings, each pair containing two utterances from the same child with approximately the same phonemic material but differing on a specific prosodic dimension, such as stress. The judges identified the prosodic categories of the two utterances and rated the strength of their contrast. For almost all tasks, we found that the automated scores correlated with the mean scores approximately as well as the judges' individual scores. Real-time scores assigned during examination - as is fairly typical in speech assessment - correlated substantially less than the automated scores with the mean scores.
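The validation described in the abstract compares how well the automated scores track the judges' mean scores with how well individual judges do. A minimal sketch of that kind of comparison follows. It is not the authors' implementation: the score values are hypothetical, the use of Pearson correlation is an assumption, and scoring each judge against the mean of the remaining judges is also an assumption, since the abstract does not specify either detail.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def column_means(rows):
    """Mean score per item across a list of per-judge score rows."""
    return [mean(col) for col in zip(*rows)]


# Hypothetical data: one row per judge, one column per prosodic minimal pair.
judge_scores = [
    [1.0, 2.5, 3.0, 4.5, 2.0, 3.5],
    [1.5, 2.0, 3.5, 4.0, 2.5, 3.0],
    [0.5, 3.0, 2.5, 5.0, 1.5, 4.0],
]
# Hypothetical automated scores for the same six minimal pairs.
automated_scores = [1.2, 2.4, 3.1, 4.4, 1.9, 3.6]

# Correlation of the automated scores with the mean judge scores.
mean_scores = column_means(judge_scores)
automated_r = pearson(automated_scores, mean_scores)

# Correlation of each judge with the mean of the other judges
# (leave-one-out, an assumption made here to avoid self-correlation).
judge_rs = []
for i, scores in enumerate(judge_scores):
    others = judge_scores[:i] + judge_scores[i + 1:]
    judge_rs.append(pearson(scores, column_means(others)))

print("automated vs. mean:", round(automated_r, 3))
print("judges vs. mean of others:", [round(r, 3) for r in judge_rs])
```

If the automated correlation falls within the range of the per-judge correlations, the automated method agrees with the panel about as well as a single judge does, which is the pattern the abstract reports for almost all tasks.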

Original language: English (US)
Pages (from-to): 1082-1097
Number of pages: 16
Journal: Speech Communication
Volume: 51
Issue number: 11
DOI: 10.1016/j.specom.2009.04.007
State: Published - Nov 2009
Externally published: Yes

Fingerprint

Prosody
Remediation
Automation
language instruction
Fundamental Frequency
foreign language
recording
Disorder
pragmatics
Real-time
examination

Keywords

  • Acoustic analysis
  • Automated assessment
  • Language pathology
  • Prosody
  • Speech pathology

ASJC Scopus subject areas

  • Modeling and Simulation
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Software
  • Communication
  • Linguistics and Language

Cite this

Automated assessment of prosody production. / Van Santen, Jan; Prud'hommeaux, Emily Tucker; Black, Lois M.

In: Speech Communication, Vol. 51, No. 11, 11.2009, p. 1082-1097.

@article{83a4e2e30c73426594e6c960fd6fe2df,
title = "Automated assessment of prosody production",
keywords = "Acoustic analysis, Automated assessment, Language pathology, Prosody, Speech pathology",
author = "{Van Santen}, Jan and Prud'hommeaux, {Emily Tucker} and Black, {Lois M.}",
year = "2009",
month = "11",
doi = "10.1016/j.specom.2009.04.007",
language = "English (US)",
volume = "51",
pages = "1082--1097",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "11",

}

TY  - JOUR
T1  - Automated assessment of prosody production
AU  - Van Santen, Jan
AU  - Prud'hommeaux, Emily Tucker
AU  - Black, Lois M.
PY  - 2009/11
Y1  - 2009/11
KW  - Acoustic analysis
KW  - Automated assessment
KW  - Language pathology
KW  - Prosody
KW  - Speech pathology
UR  - http://www.scopus.com/inward/record.url?scp=67651000007&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=67651000007&partnerID=8YFLogxK
U2  - 10.1016/j.specom.2009.04.007
DO  - 10.1016/j.specom.2009.04.007
M3  - Article
VL  - 51
SP  - 1082
EP  - 1097
JO  - Speech Communication
JF  - Speech Communication
SN  - 0167-6393
IS  - 11
ER  -