
DAWnet

Abstract

The past decade has witnessed the boom of human-computer interaction, particularly via dialog systems. In this paper, we study the task of response generation in open-domain multi-turn dialog systems. Although many efforts have been dedicated to dialog systems, few of them shed light on deepening or widening the chatting topic within a conversational session, which is key to keeping users engaged and talking more. To achieve this goal, we present a novel deep scheme comprising three channels, namely a global channel, a wide channel, and a deep channel. The global channel encodes the complete historical information within the given context; the wide channel employs an attention-based recurrent neural network to predict extensional keywords that may not appear in the historical context; and the deep channel trains a multilayer perceptron (MLP) to select keywords from the context for an in-depth discussion. Our scheme then integrates the outputs of these three channels to generate the desired responses. To justify our model, we conducted extensive experiments comparing it with several state-of-the-art baselines on two datasets: one constructed by ourselves and the other a well-known public benchmark. Experimental results demonstrate that our model yields promising performance by widening or deepening the topics of interest.
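As a rough illustration of the three-channel design described above, the sketch below composes a GRU-based global encoder, an attention-based wide keyword predictor, and an MLP-based deep keyword selector in toy NumPy code. All dimensions, parameter names, and the simplified attention/MLP formulations are assumptions made purely for exposition; this is not the authors' implementation (the released code uses TensorFlow).

```python
# Minimal sketch of the three channels (global / wide / deep).
# NOT the DAWnet implementation; sizes and parameterizations are assumed.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID = 1000, 32, 64          # assumed toy sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gru_step(x, h, W, U, b):
    """One GRU step (update gate z, reset gate r, candidate state n)."""
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])
    n = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])
    return (1 - z) * n + z * h

# Toy parameters (randomly initialised; a real model would learn them).
emb = rng.normal(size=(VOCAB, EMB))
W = rng.normal(size=(3, EMB, HID)) * 0.1
U = rng.normal(size=(3, HID, HID)) * 0.1
b = np.zeros((3, HID))

def global_channel(token_ids):
    """Encode the full dialog context into hidden states + a summary vector."""
    h, states = np.zeros(HID), []
    for t in token_ids:
        h = gru_step(emb[t], h, W, U, b)
        states.append(h)
    return np.stack(states), h

def wide_channel(states, summary, W_att, W_out):
    """Attend over context states and predict 'extensional' keywords over the
    whole vocabulary (words that may not occur in the context)."""
    alpha = softmax(states @ W_att @ summary)    # attention weights (assumed bilinear form)
    ctx = alpha @ states                         # attention-weighted context vector
    return softmax(ctx @ W_out)                  # keyword distribution over the vocabulary

def deep_channel(context_keyword_ids, summary, W1, W2):
    """Score keywords already in the context with an MLP, to pick those
    worth discussing in more depth."""
    feats = np.concatenate(
        [emb[context_keyword_ids],
         np.repeat(summary[None, :], len(context_keyword_ids), 0)], axis=1)
    hidden = np.tanh(feats @ W1)
    return sigmoid(hidden @ W2).ravel()          # one selection score per keyword

# Toy run: fake token ids for the dialog context and its keywords.
context = rng.integers(0, VOCAB, size=12)
kw_ids = context[:4]
states, summary = global_channel(context)
wide_dist = wide_channel(states, summary,
                         rng.normal(size=(HID, HID)) * 0.1,
                         rng.normal(size=(HID, VOCAB)) * 0.1)
deep_scores = deep_channel(kw_ids, summary,
                           rng.normal(size=(EMB + HID, HID)) * 0.1,
                           rng.normal(size=(HID, 1)) * 0.1)

# A decoder (omitted here) would condition on `summary` plus the keywords
# chosen from `wide_dist` (new topics) and `deep_scores` (topics to deepen).
print(wide_dist.shape, deep_scores.shape)
```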

Data collection

To train DAWnet and evaluate its performance in improving the coherence, informativeness, and diversity of responses, we built an open-domain multi-turn dialog dataset, named the Sina Weibo Conversation Corpus, which covers a rich range of everyday conversation topics. The dialogs were collected from Sina Weibo, one of the most popular social media sites in China, used by over 30% of Internet users. The raw data comprises over 20 million sessions, each containing a series of post-response pairs between two people. We then selected the sessions satisfying the following rules: 1) the session contains more than three turns; and 2) the response is meaningful and contains at least two keywords, where keywords are words whose TF-IDF value is no less than a given threshold. After this selection, we pre-processed the sessions by removing noisy words and converting emojis into their corresponding words. Ultimately, we obtained a dataset of 1,587,119 sessions in total, with an average of 3.71 turns and 42.17 tokens per dialog. We release the data used to train and evaluate the model here. The released data has been processed to protect users' privacy and to address legal concerns.
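The snippet below illustrates, under assumptions, how the two selection rules could be applied: keep a session only if it has more than three turns and its response contains at least two keywords whose TF-IDF value meets a threshold. The threshold value, tokenization, and helper names are hypothetical and not taken from the released pipeline.

```python
# Sketch of the session-selection rules described above (assumed details).
import math
from collections import Counter

TFIDF_THRESHOLD = 0.1      # assumed value; the actual threshold is not specified here
MIN_TURNS = 4              # "more than three" turns per session
MIN_KEYWORDS = 2

def tfidf_keywords(tokens, doc_freq, num_docs, threshold=TFIDF_THRESHOLD):
    """Return tokens whose TF-IDF in this utterance is no less than the threshold."""
    tf = Counter(tokens)
    total = len(tokens) or 1
    keywords = []
    for tok, cnt in tf.items():
        idf = math.log(num_docs / (1 + doc_freq.get(tok, 0)))
        if (cnt / total) * idf >= threshold:
            keywords.append(tok)
    return keywords

def keep_session(session, doc_freq, num_docs):
    """Apply the two selection rules to one session (a list of tokenized turns)."""
    if len(session) < MIN_TURNS:
        return False
    response = session[-1]                      # treat the last turn as the response
    return len(tfidf_keywords(response, doc_freq, num_docs)) >= MIN_KEYWORDS

# Example: in practice doc_freq would be counted over the whole corpus.
corpus = [["hello"], ["weather", "today"], ["good", "morning"],
          ["see", "you"], ["I", "love", "hiking", "in", "autumn"]]
doc_freq = Counter(tok for doc in corpus for tok in set(doc))
session = [["hi"], ["hello"], ["any", "plans"], ["I", "love", "hiking", "in", "autumn"]]
print(keep_session(session, doc_freq, len(corpus)))   # True for this toy session
```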

Paper

You can download the paper from here. It will be updated soon.

Code

The code can be downloaded from here.

Requirements:

  • Python 3.5

  • TensorFlow 1.0 (DAWnet)

  • TensorFlow 1.4 (DAWnet_BS, with beam search added)

  • NumPy 1.14.0
