Taming Continuous Posteriors for Latent Variational Dialogue Policies
01 April 2021 · 1 min read
Latent action reinforcement learning for task-oriented dialogue has seen success at benchmarks such as MultiWOZ. Categorical latents have been argued to be the best choice. We show that with continuous latents and reformulation of the ELBO objective and the reinforcmenet learning stage, we can achieve state-of-the-art performance on MultiWOZ.