Spoken Open Domain Dialogue Systems for Non-native Speakers
Open-domain dialogue systems have been receiving a great deal of attention from both academia and industry, resulting in many applications. Chatbots like Alexa are designed for information retrieval and chit-chat, while others like Replika are built for companionship and emotional support. There is also growing interest in using open-domain chatbots as language tutors, since practising conversation is an effective way for non-native speakers to acquire a new language. However, building a chatbot that can interact with non-native speakers is challenging because: (1) errors propagated from automatic speech recognition (ASR) systems lead to unexpected responses; (2) non-native speech may contain many disfluencies and grammatical errors, which degrade the chatbot’s performance; and (3) most non-native speakers are not strong conversationalists because of their limited linguistic ability, so the chatbot must be highly interactive and engaging, and able to lead and sustain a meaningful conversation.
This PhD will improve upon state-of-the-art open-domain chatbots, making it possible for them to hold a smooth and engaging conversation with non-native speakers. In particular, the PhD will focus on: (1) making open-domain dialogue systems robust to noisy inputs (e.g. ASR errors and disfluencies) by using multiple ASR decoding outputs or by enabling the system to ask clarifying questions; and (2) making open-domain dialogue systems more engaging by leveraging a user’s personalized data, such as interests, goals, beliefs, and values. This information is necessary for the chatbot to choose the right topic for the conversation and to avoid topics that fall outside the user’s interests. The chatbot will learn to do so from dialogue examples in which two people who already know about each other have an engaging, meaningful conversation.
For direction (1), we will modify the transformer-based seq2seq model, allowing it to encode not only the dialogue history but also additional ASR information at the current turn. The model will learn either to generate an appropriate response or to ask a clarifying question by looking at spoken dialogue examples. For direction (2), we will create a dataset called DeepConversations, consisting of many engaging, meaningful dialogues. We also propose a transformer-based memory network that encodes each item of the user’s personalized data as an individual memory representation and then generates an engaging response word by word. The outcome of this PhD will benefit not only non-native speakers but also native speakers who use chatbots for entertainment purposes.
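As an illustration of direction (1), the sketch below shows one possible way to present the dialogue history together with several ASR hypotheses for the current turn as a single encoder input. It is only a minimal sketch under stated assumptions: the separator tokens `<turn>` and `<asr>`, the `AsrHypothesis` type, the confidence scores, and the `build_encoder_input` helper are all hypothetical and are not specified by the project description.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical special tokens; the project description does not fix an input format.
TURN_SEP = "<turn>"
ASR_SEP = "<asr>"


@dataclass
class AsrHypothesis:
    text: str
    confidence: float  # assumed to be a recognizer score in [0, 1]


def build_encoder_input(
    history: List[str], nbest: List[AsrHypothesis], top_k: int = 3
) -> str:
    """Join the dialogue history with the top-k ASR hypotheses of the current turn."""
    # Keep the k most confident hypotheses, best first.
    ranked = sorted(nbest, key=lambda h: h.confidence, reverse=True)[:top_k]
    history_part = f" {TURN_SEP} ".join(history)
    # Each hypothesis is prefixed with its own marker so the encoder can tell them apart.
    asr_part = " ".join(f"{ASR_SEP} {h.text}" for h in ranked)
    return f"{history_part} {TURN_SEP} {asr_part}".strip()


if __name__ == "__main__":
    history = ["Hi, what did you do today?"]
    nbest = [
        AsrHypothesis("I want to the beach", 0.3),
        AsrHypothesis("I went to the beach", 0.6),
        AsrHypothesis("I went to the peach", 0.1),
    ]
    print(build_encoder_input(history, nbest, top_k=2))
```

A seq2seq model trained on inputs of this shape could, in principle, learn to respond normally when the hypotheses agree and to ask a clarifying question when they diverge; whether plain concatenation is the best encoding is an open design choice, not something the description above commits to.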