todo.txt

TODO
====
Continue article development
* Develop: find conversations with clearest effects
  * For all conversations and a list of detectors, see if there are results missing... if so create them
  * For all conversations and a list of detectors, calculate the F-score at 3 points in the conversation and store with tabs between them
* Developer: visualize neurons
  * I'm hoping for a casing detector
  * I'm hoping for similar words in one detector
  * visualize_neuron_specializations: alle soorten modellen accepteren
    * https://twitter.com/realdonaldtrump/status/640131477179645952
    * https://twitter.com/realdonaldtrump/status/630901938608017413
    
Code
* Fix type hinting
* Visualize module: too much duplicate code
* Add save option for all visualization functions

Ideas to make the model better
==============================
Use toxic embeddings (make sure to not mix up train and test data!)
Separate casing
Do sequences really need to have a fixed length?

Backlog
=======
* Experiment with neural nets:
  * Study best model in Kaggle... what is different from this experiment? https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/52557
    * Check: how is it possible that with less data, scores are higher?
    * Combine embeddings

  * See if this best result can be improved by using Ulmfit http://nlp.fast.ai/ or Bert https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
    * https://www.analyticsvidhya.com/blog/2018/11/tutorial-text-classification-ulmfit-fastai-library/

* Look up: what exactly is ConvoKit?