PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A NONPARAMETRIC BAYESIAN APPROACH TO LEARNING MULTIMODAL INTERACTION MANAGEMENT
Zhuoran Wang and Oliver Lemon
In: The Fourth IEEE Workshop on Spoken Language Technology (SLT 2012), 2-5 Dec 2012, Miami, Florida, USA.

Abstract

Managing multimodal interactions between humans and computer systems requires a combination of state estimation based on multiple observation streams, and optimisation of time-dependent action selection. Previous work using partially observable Markov decision processes (POMDPs) for multimodal interaction has focused on simple turn-based systems. However, state persistence and implicit state transitions are frequent in real-world multimodal interactions. These phenomena cannot be fully modelled using turn-based systems, where the timing of system actions is a non-trivial issue. In addition, in prior work the POMDP parameterisation has been either hand-coded or learned from labelled data, which requires significant domain-specific knowledge and is labour-intensive. We therefore propose a nonparametric Bayesian method to automatically infer the (distributional) representations of POMDP states for multimodal interactive systems, without using any domain knowledge. We develop an extended version of the infinite POMDP method, to better address the state persistence, implicit transition, and timing issues observed in real data. The main contribution is a “sticky” infinite POMDP model that is biased towards self-transitions. The performance of the proposed unsupervised approach is evaluated on both artificially synthesised data and a manually transcribed and annotated human-human interaction corpus. We show statistically significant improvements (e.g. in the ability of the planner to recall human bartender actions) over a supervised POMDP method.
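
The abstract does not spell out how the bias towards self-transitions is constructed. As a rough illustration only, the sketch below uses a finite, truncated Dirichlet prior in the style of sticky HDP-HMM models, where row j of the transition matrix is drawn from Dir(alpha * beta + kappa * e_j). The function names, the uniform base measure beta, the truncation to a fixed number of states, and the values of alpha and kappa are all assumptions made for this example; they are not taken from the paper.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def sticky_transition_rows(num_states, alpha, kappa, seed=0):
    """Sample a transition matrix with row j ~ Dir(alpha * beta + kappa * e_j).

    Assumptions for this sketch: beta is uniform over a finite truncation of
    the state space, and kappa >= 0 is the extra prior mass on self-transitions
    (kappa = 0 gives the non-sticky prior).
    """
    rng = random.Random(seed)
    beta = [1.0 / num_states] * num_states
    rows = []
    for j in range(num_states):
        # Add kappa only to the diagonal entry of row j.
        conc = [alpha * b + (kappa if k == j else 0.0)
                for k, b in enumerate(beta)]
        rows.append(sample_dirichlet(conc, rng))
    return rows


def mean_self_transition(rows):
    """Average probability of staying in the same state across all rows."""
    return sum(row[j] for j, row in enumerate(rows)) / len(rows)


if __name__ == "__main__":
    plain = sticky_transition_rows(num_states=5, alpha=5.0, kappa=0.0)
    sticky = sticky_transition_rows(num_states=5, alpha=5.0, kappa=20.0)
    print("mean self-transition, kappa=0: ", mean_self_transition(plain))
    print("mean self-transition, kappa=20:", mean_self_transition(sticky))
```

Under these assumed settings the prior mean of each diagonal entry rises from 1/5 with kappa = 0 to 21/25 with kappa = 20, which is one simple way a model can be made to favour state persistence.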

EPrint Type: Conference or Workshop Item (Paper)
Subjects: User Modelling for Computer Human Interaction; Learning/Statistics & Optimisation; Multimodal Integration
ID Code: 9582
Deposited By: Zhuoran Wang
Deposited On: 11 October 2012