Mueller-Model

Goal

Train an RNN on the non-redacted text of the Mueller Report, then use that model to attempt to predict what was redacted.
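
As a rough illustration of the prediction step, here is a minimal sketch of filling a redaction one character at a time. It assumes the trained model is wrapped in a hypothetical callable `next_char_probs(context)` that returns a dict of character probabilities. That wrapper, the function name `fill_redaction`, and the toy model are illustrative assumptions, not the notebook's actual interface.

```python
import random


def fill_redaction(next_char_probs, preceding, length, seed=None):
    """Generate `length` characters to stand in for a redaction.

    next_char_probs: hypothetical callable, context string -> {char: prob}.
    preceding: non-redacted text immediately before the redaction.
    """
    rng = random.Random(seed)
    context = preceding
    generated = []
    for _ in range(length):
        probs = next_char_probs(context)
        chars = list(probs)
        weights = [probs[c] for c in chars]
        # Sample the next character in proportion to its predicted probability.
        ch = rng.choices(chars, weights=weights, k=1)[0]
        generated.append(ch)
        context += ch  # feed the prediction back in as context
    return "".join(generated)


# Toy stand-in for a trained RNN: always predicts the same character.
def toy_model(context):
    return {"x": 1.0}


filled = fill_redaction(toy_model, "The report states ", 5, seed=0)
```

Sampling from the distribution (rather than always taking the most likely character) is one common choice for text generation; the notebook may do this differently.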

Eager Execution mode was not ideal for this project. I chose it because this was my first full project in TF and Eager Execution seemed easier to grasp than the graph-based approach.
- During training, I sized each model by trial and error: I used the largest configuration that would train without raising a ResourceExhausted error.
The overall flow originally came from this TensorFlow tutorial if you'd like more information on text generation.
Below are links to some pretrained versions I put on Google Drive. Feel free to download them as a jumping-off point. The network configurations are in the notebook.
- 400 sequence length
- 750 sequence length
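
The size-selection approach mentioned in the notes above (use the largest model that trains without running out of memory) can be sketched as a simple descending search. Here `try_train` is a hypothetical callable that builds a model of the given size and runs a training step. In TensorFlow the out-of-memory exception is `tf.errors.ResourceExhaustedError`; the sketch substitutes Python's built-in `MemoryError` so that it runs without TensorFlow installed.

```python
def largest_fitting_size(candidate_sizes, try_train, oom_error=MemoryError):
    """Return the largest size for which try_train(size) does not run out of memory.

    try_train: hypothetical callable that builds and briefly trains a model.
    oom_error: exception type that signals memory exhaustion
               (e.g. tf.errors.ResourceExhaustedError when using TensorFlow).
    """
    for size in sorted(candidate_sizes, reverse=True):
        try:
            try_train(size)
        except oom_error:
            continue  # too big, try the next smaller size
        return size
    return None  # nothing fit


# Simulated training step: anything above 1024 units "runs out of memory".
def fake_try_train(rnn_units):
    if rnn_units > 1024:
        raise MemoryError("simulated out-of-memory")


best = largest_fitting_size([256, 512, 1024, 2048, 4096], fake_try_train)
```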

Rewrite V2 without Eager Execution to allow for more parallelization of training, longer and bigger training runs, and fewer random memory errors.
Play with preceding characters for predictions.
- I think setting the preceding characters so that (length of the redaction + preceding characters) = (sequence_length the model is trained on) would be good for normalizing the predictors and the test data, but it's difficult with a low ceiling on the sequence_length.
Descend lower...
- Obvious, but I could only achieve a loss of ~1 with my current model. This caused the generated text to be less realistic than it could have been.
- It may require the "non-Eager" rework in order to expand the model to descend lower.
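
The preceding-characters idea above, (length of the redaction + preceding characters) = sequence_length, can be sketched as follows. The function names and the choice to return zero preceding characters when the redaction is at least as long as the window are illustrative assumptions; that zero case is exactly where a low ceiling on sequence_length becomes a problem.

```python
def preceding_length(redaction_length, sequence_length):
    """Number of preceding characters so that
    preceding + redaction equals the training sequence length."""
    if redaction_length >= sequence_length:
        return 0  # the redaction alone fills or exceeds the window
    return sequence_length - redaction_length


def seed_text(text, redaction_start, redaction_length, sequence_length):
    """Slice the non-redacted text immediately before the redaction."""
    n = preceding_length(redaction_length, sequence_length)
    return text[max(0, redaction_start - n):redaction_start]


# A 100-character redaction with a 400-character model leaves 300 characters of context.
context_chars = preceding_length(100, 400)
example_seed = seed_text("abcdefghij", redaction_start=6, redaction_length=2, sequence_length=5)
```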
