Monitoring Human Engagement in a Situated Dialogue
Many AI dialogue assistants, such as Siri, Xiaodu smart speakers, and Alexa, have become very popular. Despite this popularity, however, many users rely on them only for simple tasks such as setting timers or playing particular songs. Achieving true conversational interaction will require considerably more research.
One metric that will be important to maximise in order to achieve true conversational interaction is engagement. In the context of dialogue systems, engagement can be thought of as an estimate of how much the user is interacting with the system and, importantly, a measure of whether they are enjoying the interaction and are likely to continue it in the future. Without strong engagement, a system and its users cannot maintain a long-term relationship. To enhance engagement, we must monitor its physical and audio signs, and fine-tune the language produced by the system, adjusting the dialogue strategy as appropriate.
In this PhD project, we will focus on the issue of engagement in dialogue, modelling how it can be monitored and how dialogue policies can be adjusted to account for situated engagement. For the modelling, we will look at a combination of visual and audio / content monitoring to estimate engagement levels. The visual aspects might, for example, include facial thermal images (applying noir camera filters), facial expression, or body temperature and blood pressure. The audio and content elements will focus on features such as the number of pauses in speech and the sentiment of the content. In the second part of this work, we will look at how the dialogue production policy can be adjusted in task-oriented dialogues to maximise engagement and be fine-tuned to individual users. One potential model that we will investigate for this purpose is Hierarchical Deep Reinforcement Learning. Deep Reinforcement Learning has in general been found to be very useful in planning dialogue strategy. A hierarchical variant of it has the potential to separate the different levels of language production, so that the policies can cover more potential variations without becoming too computationally costly.
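As an illustration of the monitoring step only, the sketch below shows one simple way that per-turn cues could be combined into a single engagement estimate using weighted late fusion. It is not the method of this project: the feature names, the value ranges, the weights, and the assumption that fewer pauses and more positive sentiment indicate higher engagement are all hypothetical and would need to be validated in the empirical studies.

```python
from dataclasses import dataclass


@dataclass
class TurnSignals:
    """Per-turn observations (hypothetical, pre-normalised features)."""
    pause_ratio: float   # fraction of the turn spent in silence, in [0, 1]
    sentiment: float     # sentiment of the user's words, in [-1, 1]
    visual_score: float  # output of some visual engagement model, in [0, 1]


def clamp01(x: float) -> float:
    """Clip a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def engagement_score(s: TurnSignals, weights=(0.3, 0.3, 0.4)) -> float:
    """Weighted late fusion of three cues into one score in [0, 1].

    The weights are illustrative placeholders, not tuned values.
    """
    w_pause, w_sent, w_vis = weights
    fluency = 1.0 - clamp01(s.pause_ratio)            # assumed: fewer pauses -> higher
    positivity = clamp01((s.sentiment + 1.0) / 2.0)   # map [-1, 1] onto [0, 1]
    visual = clamp01(s.visual_score)
    total = w_pause + w_sent + w_vis
    return (w_pause * fluency + w_sent * positivity + w_vis * visual) / total


def smooth(previous: float, current: float, alpha: float = 0.5) -> float:
    """Exponential moving average so a single noisy turn does not dominate."""
    return alpha * current + (1.0 - alpha) * previous


if __name__ == "__main__":
    turn = TurnSignals(pause_ratio=0.2, sentiment=0.4, visual_score=0.7)
    print(engagement_score(turn))
```

A smoothed score of this kind could, in principle, serve as part of the reward signal for the dialogue policy discussed above, although how engagement should enter the reward is itself an open question for the project.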
Empirical studies and computational modelling will be balanced throughout the PhD project. Data collection and model validation will be carried out in the empirical studies. The computational modelling will centre on learning Hierarchical Reinforcement Learning policies, informed by a study of the current state-of-the-art methods in end-to-end neural dialogue processing.