A fourth-year first semester mini project report submitted for partial fulfilment of the requirements for COMP 484.
Project Supervisor: Dr. Bal Krishna Bal, H.O.D DoCSE K.U.
The chatbot is a computer program or an artificial intelligence which conducts a conversation via auditory or textual methods.
- To create a chatbot that can interact and have a conversation with its user.
- To use the language translator’s sequence to sequence model to create an interactive bot.
- To create an online bot that could be used as a conversational program in online game streaming websites.
For the dataset used in the model, we used the readily available dump data of 1.7 Billion Reddit Comments. We trimmed down the data to only a month of data of Jan 2015.
1.7 Billion Reddit Comments Dataset
We filtered down the reddit comments from two parameters:
- their score(i.e. only accepted the comments with a score >= 2)
- comment length (i.e. number of words >1 and < 50)
We then stored this data in sqlite database for our model. Sqlite
To train the model, we have used Tensorflow's Seq2Seq NMT Model. Originally, this model was used for text translation from English to French Language. We've tweaked this model to output English Language.
- Our model is under the folder named 'model'.
- All the output files after training the model have been saved in the 'model' folder as 'translate.ckpt' files.
- The training and testing data files are inside the 'new_data' folder.
- The interface for the bot can be accessed through the 'modded-inference.py' file.
Made an interactive bot having the following params:
- Training Steps: 30, 000
- Learning Rate: 0.001
- Words Per Second Learnt: 11.52K
- Perplexity Value: 38.95
- Gradient Noise: 5.90
- Bleu Value: 2.78
Taking Google Translator's Bleu Value as a Reference Value :
- Google Translator's Bleu Value: 3.694
- Our Model's Bleu Value: 2.78
- Accuracy of Our model based on Bleu Value = 75%