Latent action reinforcement learning for task-oriented dialogue has been successful on benchmarks such as MultiWOZ. Categorical latents have been argued to be the best choice for this setting. We show that continuous latents, combined with a reformulation of the ELBO objective and of the reinforcement learning stage, achieve state-of-the-art performance on MultiWOZ.