Conversation seq2seq with Keras and Cornell Movie Dialog Dataset

Goal

The goal of this repo is to demonstrate creating a real seq2seq model in Keras and evaluating its results. There are many incorrect and incomplete seq2seq implementations out there, and I was unable to find a reference implementation in Keras, with actual results against an open dataset, discussed anywhere.

This model does not implement attention, though thanks to the correct implementation of seq2seq here, it would not be difficult to add.

Data

Cornell Movie Dialog Dataset

Size:

304,713 lines of dialog
9,035 characters
617 movies
24 categories

Each line of movie_lines.txt has the line ID, character ID, movie ID, character name, and the line of dialog.
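
As a rough illustration of that layout, the sketch below parses one line into named fields. It assumes the fields are joined by the " +++$+++ " separator described in the dataset's README.txt. The names `MovieLine` and `parse_movie_line` are hypothetical and are not taken from prep.py, and the sample line is only for illustration.

```python
from typing import NamedTuple

# Field separator used by the corpus files (assumed from the dataset's README.txt).
FIELD_SEP = " +++$+++ "


class MovieLine(NamedTuple):
    """One record of movie_lines.txt (hypothetical container, not from prep.py)."""
    line_id: str
    character_id: str
    movie_id: str
    character_name: str
    text: str


def parse_movie_line(raw: str) -> MovieLine:
    """Split one raw line into its five fields.

    Only the trailing newline is stripped, so a line whose dialog text is
    empty still splits into five parts.
    """
    parts = raw.rstrip("\r\n").split(FIELD_SEP, 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {raw!r}")
    return MovieLine(*parts)


if __name__ == "__main__":
    sample = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n"
    record = parse_movie_line(sample)
    print(record.movie_id, record.character_name, record.text)
```

When reading the real file, the text encoding may need to be set explicitly (for example `open(path, encoding="iso-8859-1")`); that choice is an assumption and should be checked against your copy of the data.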

Usage

First, download the dataset linked above, and symlink or copy it to ./data/.

Second, generate the development and heldout data from the dataset:

$ pip3 install -r requirements.txt --user # if necessary
$ # We should see the following results
$ ls data
chameleons.pdf                 movie_conversations.txt  movie_titles_metadata.txt  README.txt
movie_characters_metadata.txt  movie_lines.txt          raw_script_urls.txt
$ # Run the data prep to create develop and heldout split
$ PYTHONPATH=$(pwd) python3 main.py prep
Loading movie, character, and conversation data...
Splitting data into develop and heldout data based on movie...
75 of 617 movies chosen for heldout...
Writing development data to data/develop/...
Writing heldout data to data/heldout/...
Done with prep!
$ ls data
chameleons.pdf  heldout                        movie_conversations.txt  movie_titles_metadata.txt  README.txt
develop         movie_characters_metadata.txt  movie_lines.txt          raw_script_urls.txt
$ PYTHONPATH=$(pwd) python3 main.py train
... lots of training stuff...
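
The prep log above reports that the split is made per movie, so every line of a given movie lands on one side only. A minimal sketch of that idea follows; it is not the logic in prep.py. The function name `split_movies`, the `seed` parameter, and the `m0`, `m1`, ... movie IDs are assumptions made for illustration.

```python
import random
from typing import Iterable, Set, Tuple


def split_movies(
    movie_ids: Iterable[str], n_heldout: int, seed: int = 0
) -> Tuple[Set[str], Set[str]]:
    """Pick n_heldout movies for the heldout set; the rest go to develop.

    Splitting on movie IDs rather than on individual lines keeps all
    dialog from one movie on the same side of the split.
    """
    ids = sorted(set(movie_ids))  # sort so the result depends only on the seed
    if not 0 <= n_heldout <= len(ids):
        raise ValueError("n_heldout must be between 0 and the number of movies")
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    heldout = set(rng.sample(ids, n_heldout))
    develop = set(ids) - heldout
    return develop, heldout


if __name__ == "__main__":
    all_movies = [f"m{i}" for i in range(617)]  # hypothetical movie IDs
    develop, heldout = split_movies(all_movies, n_heldout=75)
    print(len(develop), "develop movies,", len(heldout), "heldout movies")
```

Each parsed line can then be routed to data/develop/ or data/heldout/ by checking which set its movie ID belongs to.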

Results

TODO

soaxelbrooke/movie-seq2seq
