Learning through Dialogue Interactions

This project contains code for the dialog-based learning MemN2N setup in the following paper: "Learning through Dialogue Interactions".

Setup

This code requires Torch7 and its luarocks packages cutorch, cunn, nngraph, torchx, and tds.
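Assuming a standard Torch7 distribution with luarocks on the path (your setup may differ), the dependencies can typically be installed as follows; note that cutorch and cunn additionally require a CUDA toolkit:

luarocks install cutorch
luarocks install cunn
luarocks install nngraph
luarocks install torchx
luarocks install tds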

Simulator

The simulator described in the paper simulates dialogues between a teacher and a student. It transforms the movieQA data into the version used to train the supervised and reinforcement learning models.

To run the simulator:

th runSimulator.lua [params]

Available options are:

-mode           (default train, taking values of train|dev|test)
-task           (default 1, the task you want to simulate, taking values of 1-9)
-prob_correct_final_answer  (default 0.5, the policy that controls the probability that a student gives a correct final answer)
-prob_correct_intermediate_answer   (default 0.5, the policy that controls the probability that a student asks a correct intermediate question)
-homefolder     (default ../data/movieQA_kb, the folder to load the movieQA database)
-junk           (default true, whether to incorporate random question-answer pairs in the conversation)
-randomQuestionNumTotal (default 5, the number of random question-answer pairs in the conversation)
-setting        (default AQ, taking values of AQ|QA|mix; AQ denotes the setting in which the student always asks a question, QA the setting in which the student never asks a question, and mix means the student asks a question 50 percent of the time)
-output_dir     (default "./", the output folder path)
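For example, the following invocation (the flag values are illustrative choices, not recommendations) simulates training dialogues for task 2 in the AQ setting with a student that gives the correct final answer 80 percent of the time:

th runSimulator.lua -mode train -task 2 -setting AQ -prob_correct_final_answer 0.8 -output_dir ./simulated_data/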

Supervised

This implements the offline supervised learning settings described in the paper.

To run the trainer:

th train_supervised.lua [params]

Available options are:

-batch_size		(default 32, the batch size for model training)
-init_weight	(default 0.1, initialization weights)
-N_hop			(default 3, number of hops)
-lr				(default 0.05, learning rate)
-thres			(default 40, threshold for gradient clipping)
-gpu_index		(default 1, which GPU to use)
-task           (default 1, which task to run)
-trainSetting   (default AQ, taking values of QA|AQ|mix, which training setting to run; AQ means the student is trained on data that allows asking questions, QA means asking questions is not allowed, and mix means asking questions is allowed 50 percent of the time)
-testSetting    (default AQ, taking values of QA|AQ|mix, which setting to test on; AQ means the student is allowed to ask questions at test time, QA means it is not, and mix means it is allowed to ask questions 50 percent of the time)
-homefolder     (default ../data/movieQA_kb, the folder to load the movieQA database)
-datafolder     (default ../data/AQ_supervised_data, the folder to load train/dev/test data)
-context        (default true; if true, use the context-based MemN2N, otherwise the vanilla MemN2N)
-context_num    (default 1, the number of left/right neighbors to be considered as contexts)
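A sample training run (the hyperparameter values here are illustrative) that trains on question-asking dialogues and evaluates without question asking:

th train_supervised.lua -task 1 -trainSetting AQ -testSetting QA -batch_size 32 -lr 0.05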

Reinforce

This implements the online reinforcement learning settings described in the paper.

To run the trainer:

th train_RL.lua [params]

Available options are:

-batch_size		(default 32, the batch size for model training)
-token_size		(default 0, number of tokens)
-init_weight	(default 0.1, initialization weights)
-N_hop			(default 3, number of hops)
-lr				(default 0.05, learning rate)
-thres			(default 40, threshold for gradient clipping)
-N_iter         (default 14, the number of total iterations to run)
-StaringFullTraining    (default 10, the number of iterations after which to start training the AskQuestion vs. notAskQuestion policy)
-gpu_index		(default 1, which GPU to use)
-task			(default 1, which task to test)
-REINFORCE      (default false, whether to train with the REINFORCE algorithm)
-REINFORCE_reg  (default 0.1, entropy regularizer for the REINFORCE algorithm)
-dic_file       (default ../data/movieQA_kb/movieQA.dict, the dictionary)
-AQcost         (default 0.2, the cost of asking a question)
-RL_setting     (default good, taking values of good|bad|medium, which type of student to test on)
-RF_lr          (default 0.0005, learning rate used by the REINFORCE baseline)
-readFolder     (default ../data/AQ_reinforce_data, the folder to read data from)
-output_file    (default output.txt, the output file)
-context        (default true; if true, use the context-based MemN2N, otherwise the vanilla MemN2N)
-context_num    (default 1, the number of left/right neighbors to be considered as contexts)
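An example online run (flag values are again illustrative) that trains the REINFORCE policy against a good student with a question-asking cost of 0.2:

th train_RL.lua -task 1 -REINFORCE true -RL_setting good -AQcost 0.2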

References