Using reinforcement learning for dialogue management policies: Towards understanding MDP violations and convergence

Peter Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan O. Selfridge

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Reinforcement learning (RL) is becoming a popular tool for building dialogue managers. This paper addresses two issues in using RL. First, we propose two methods for finding MDP violations; both make use of computing Q scores when testing the policy. Second, we investigate how convergence happens. To do this, we use a dialogue task in which the only source of variability is the dialogue policy itself, which allows us to study how and when convergence occurs as training progresses. This work will help dialogue designers build effective policies and understand how much training is necessary.
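The abstract's idea of "computing Q scores when testing the policy" can be illustrated with a small sketch. This is not the authors' actual method; it is a hypothetical toy example, assuming a tabular Q-learner on a deterministic task, where one-step Bellman residuals are computed at test time. If the state encoding is truly Markovian, the residuals of a converged policy shrink toward zero; persistently large residuals would be evidence of an MDP violation.

```python
import random

# Hypothetical toy "dialogue" MDP for illustration: states 0..3, two actions.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3
GAMMA, ALPHA = 0.9, 0.5

def step(s, a):
    """Action 1 advances toward the goal state; action 0 stays put."""
    s2 = min(s + 1, GOAL) if a == 1 else s
    r = 1.0 if (s2 == GOAL and s != GOAL) else 0.0
    return s2, r, s2 == GOAL

# Tabular Q-learning with epsilon-greedy exploration.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
rng = random.Random(0)

for _ in range(500):  # training episodes
    s, done = 0, False
    while not done:
        if rng.random() < 0.1:
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

def bellman_residual(s, a):
    """At test time, compare the stored Q score against the one-step
    backup. On a converged policy, a large residual suggests the state
    representation breaks the Markov assumption."""
    s2, r, done = step(s, a)
    target = r if done else r + GAMMA * max(Q[s2])
    return abs(Q[s][a] - target)
```

Because this toy task really is Markovian and deterministic, `bellman_residual` returns values near zero after training; in a setting where the designer's state abstraction hides relevant history, the residuals would stay large.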

Original language: English (US)
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 746-749
Number of pages: 4
Volume: 1
Publication status: Published - 2012
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: Sep 9, 2012 - Sep 13, 2012



ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication

Cite this

Heeman, P., Fryer, J., Lunsford, R., Rueckert, A., & Selfridge, E. O. (2012). Using reinforcement learning for dialogue management policies: Towards understanding MDP violations and convergence. In 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 (Vol. 1, pp. 746-749).