Using reinforcement learning for dialogue management policies: Towards Understanding MDP violations and convergence

Peter A. Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan O. Selfridge

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Reinforcement learning (RL) is becoming a popular tool for building dialogue managers. This paper addresses two issues in using RL. First, we propose two methods for finding MDP violations; both compute Q scores while testing the policy. Second, we investigate how convergence happens. To do this, we use a dialogue task in which the only source of variability is the dialogue policy itself. This allows us to study how and when convergence occurs as training progresses. The work in this paper will help dialogue designers build effective policies and understand how much training is necessary.
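To make the abstract's terms concrete, here is a minimal illustrative sketch (not the paper's code or task) of tabular Q-learning on a hypothetical two-state MDP, showing the kind of Q scores one could log while testing a policy. All state/action names and constants below are assumptions for illustration.

```python
import random

random.seed(0)

STATES = [0, 1]      # hypothetical dialogue states
ACTIONS = [0, 1]     # hypothetical system actions
GAMMA = 0.9          # discount factor
ALPHA = 0.1          # learning rate
EPSILON = 0.2        # exploration rate

def step(state, action):
    """Toy deterministic MDP: action 1 in state 1 ends the dialogue with reward 1."""
    if state == 1 and action == 1:
        return None, 1.0   # terminal, task success
    return (1 if action == 1 else 0), 0.0

# Tabular Q-function, initialized to zero.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

for _ in range(2000):
    s = 0
    while s is not None:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r = step(s, a)
        # One-step Q-learning update toward the bootstrapped target.
        target = r if s2 is None else r + GAMMA * max(Q[(s2, a_)] for a_ in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# At test time, Q scores like these can be logged per visited state; if the
# same state yields systematically different returns depending on dialogue
# history, the state representation may violate the Markov assumption.
print({sa: round(v, 2) for sa, v in Q.items()})
```

Under these assumptions the learned values should approach Q(1, 1) ≈ 1.0 and Q(0, 1) ≈ γ · 1.0 = 0.9, which is the kind of convergence behavior the paper studies.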

Original language: English (US)
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 746-749
Number of pages: 4
State: Published - 2012
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 - Portland, OR, United States
Duration: Sep 9 2012 - Sep 13 2012

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 1

Other

Other: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country/Territory: United States
City: Portland, OR
Period: 9/9/12 - 9/13/12

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication

