
boniu86/TV_Script_Generation


TV_Script_Generation

Table of contents

  1. Libraries used
  2. Project Inspiration
  3. File Descriptions
  4. Insights
  5. Licensing, Authors, and Acknowledgements

Libraries used

Python version 3.0. You need to install the following Python modules to complete this project:

Project Inspiration

Imagine working for a production company where your job is writing scripts every day. That could go on forever, or there could be a way to save us. That is where this project starts: build an RNN that trains on existing scripts and generates new scripts for us.

How does it work? The details are in the notebook. In outline, we start from existing Seinfeld scripts (the file is in Data), which contain over 45k unique words and over 100k lines, and run all the pre-processing steps needed to get the data ready for the RNN. The RNN architecture is:

input data --> embedding layers --> LSTM --> LSTM --> output

More technical details, including the model-building steps and hyperparameter settings, can be found in the notebook. Finally, the trained RNN is used to generate new scripts: I used 5 starting words and generated 5 scripts of 400 words each. Some of the new scripts do not always make sense, but so far they at least look like the original scripts.
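The pre-processing step can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the notebook's code: the punctuation tokens in `PUNCT_TOKENS` and the helpers `tokenize`, `create_lookup_tables` and `make_sequences` are hypothetical names. It lower-cases the text, turns punctuation into standalone tokens, maps each word to an integer id, and slides a fixed-length window over the ids so that each window is paired with the word that follows it. That is the kind of input an embedding + LSTM model trains on.

```python
from collections import Counter

# Hypothetical punctuation-to-token map (an assumption, not the notebook's exact tokens).
PUNCT_TOKENS = {
    ".": "||period||",
    ",": "||comma||",
    '"': "||quotation_mark||",
    ";": "||semicolon||",
    "!": "||exclamation_mark||",
    "?": "||question_mark||",
    "(": "||left_parentheses||",
    ")": "||right_parentheses||",
    "-": "||dash||",
    "\n": "||return||",
}


def tokenize(text):
    """Lower-case the text and split punctuation out into standalone tokens."""
    text = text.lower()
    for punct, token in PUNCT_TOKENS.items():
        text = text.replace(punct, f" {token} ")
    return text.split()


def create_lookup_tables(words):
    """Map each word to an integer id, most frequent words first."""
    counts = Counter(words)
    vocab = sorted(counts, key=counts.get, reverse=True)
    vocab_to_int = {word: i for i, word in enumerate(vocab)}
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


def make_sequences(word_ids, sequence_length):
    """Pair every window of `sequence_length` ids with the id that follows it."""
    features, targets = [], []
    for start in range(len(word_ids) - sequence_length):
        features.append(word_ids[start:start + sequence_length])
        targets.append(word_ids[start + sequence_length])
    return features, targets


# Tiny example input; the real project would read the full Seinfeld scripts from Data.
text = "Jerry: Hello, Newman.\nNewman: Hello, Jerry."
words = tokenize(text)
vocab_to_int, int_to_vocab = create_lookup_tables(words)
word_ids = [vocab_to_int[w] for w in words]
features, targets = make_sequences(word_ids, sequence_length=4)
```

On the full corpus, `word_ids` would cover every script, and the (window, next word) pairs would then be shuffled and batched before training.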

File Descriptions

dlnd_tv_script_generation.ipynb : Jupyter notebook containing all the code and results

dlnd_tv_script_generation.html : HTML form of the notebook

generated_scripts_1-5.txt : new scripts generated by the trained RNN. The starting words are: jerry, monica, elaine, kramer, george. Note that they are all lower case, because the RNN is trained on pre-processed words, which are all lower case.
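The generation step can be sketched as below. This is a hedged, minimal sketch rather than the notebook's code: `generate`, `sample_next` and the toy model `toy_predict_probs` are hypothetical, and top-k sampling is one common decoding choice assumed here. A real run would pass the trained RNN's next-word probabilities, each of the five lower-case starting words, and `length=400`. Because the vocabulary is built from lower-cased text, a capitalized starting word such as "Jerry" is not in `vocab_to_int` and raises a `KeyError`.

```python
import random


def sample_next(probs, top_k, rng):
    """Top-k sampling: keep the top_k most likely ids, then sample by probability."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    ids = [word_id for word_id, _ in top]
    weights = [p for _, p in top]
    return rng.choices(ids, weights=weights, k=1)[0]


def generate(prime_word, predict_probs, vocab_to_int, int_to_vocab, length, top_k=5, seed=0):
    """Generate `length` words starting from `prime_word`.

    `predict_probs(word_ids)` stands in for the trained RNN (hypothetical): it
    returns a dict mapping each candidate next-word id to its probability.
    """
    rng = random.Random(seed)
    # Raises KeyError for words outside the (all lower-case) vocabulary.
    word_ids = [vocab_to_int[prime_word]]
    while len(word_ids) < length:
        word_ids.append(sample_next(predict_probs(word_ids), top_k, rng))
    return [int_to_vocab[i] for i in word_ids]


# Toy stand-in for the trained model, only to make the sketch runnable.
toy_vocab = ["jerry", "george", "hello", "||period||"]
toy_vocab_to_int = {word: i for i, word in enumerate(toy_vocab)}
toy_int_to_vocab = dict(enumerate(toy_vocab))


def toy_predict_probs(word_ids):
    # Looks only at the last word; a real RNN conditions on the whole window.
    if toy_int_to_vocab[word_ids[-1]] in ("jerry", "george"):
        return {2: 0.7, 3: 0.2, 0: 0.05, 1: 0.05}
    return {0: 0.4, 1: 0.4, 2: 0.1, 3: 0.1}


script = generate("jerry", toy_predict_probs, toy_vocab_to_int, toy_int_to_vocab,
                  length=12, top_k=2)
```

Restricting sampling to the top k words keeps the output closer to the training scripts than sampling from the full distribution, while still varying between runs with different seeds.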

file-name.py : Python helper files that check our work in the notebook, help keep the GPU connection stable, etc.

Insights

Details can be found in the notebook. In short, the training loss is below 3.5 and most of the new scripts make some sense, and generating scripts this way is much faster than writing them by hand. There is still more we could do to make the model generate better scripts.
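One way to read the 3.5 figure is through perplexity. This assumes (it is not stated above) that the reported loss is the mean per-word cross-entropy in nats, as a softmax classifier over the vocabulary would typically report, where $n$ is the input sequence length and $w_t$ is the word being predicted:

$$
\mathcal{L} = -\frac{1}{N}\sum_{t=1}^{N} \ln p_\theta\left(w_t \mid w_{t-n}, \dots, w_{t-1}\right),
\qquad
\mathrm{PPL} = e^{\mathcal{L}},
\qquad
e^{3.5} \approx 33.1
$$

Under that assumption, a loss below 3.5 means the model is, on average, about as uncertain as a uniform choice among roughly 33 words, compared with $\ln(45\,000) \approx 10.7$ for a uniform guess over a 45k-word vocabulary.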

Licensing, Authors, and Acknowledgements

Data : the Seinfeld scripts can be found in Data.

Notebook: here