TODO
====
Continue article development
* Develop: find conversations with clearest effects
* For all conversations and a list of detectors, check whether any results are missing; if so, create them
* For all conversations and a list of detectors, calculate the F-score at 3 points in the conversation and store the scores tab-separated
* Develop: visualize neurons
* I'm hoping for a casing detector
* I'm hoping for similar words in one detector
* visualize_neuron_specializations: accept all kinds of models
* https://twitter.com/realdonaldtrump/status/640131477179645952
* https://twitter.com/realdonaldtrump/status/630901938608017413
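Sketch for the F-score item above. This is only a rough outline under assumptions not stated in this file: labels and detector outputs are binary and given per utterance, the 3 points are one third, two thirds, and the end of the conversation, and each score is computed over the conversation up to that point. The names f1_score, f1_at_three_points, and tsv_line are hypothetical.

```python
def f1_score(labels, predictions):
    """Binary F1; returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_at_three_points(labels, predictions):
    """F1 over the first third, the first two thirds, and the whole conversation.

    Assumption: the "3 points" are cumulative cutoffs at 1/3, 2/3, and 3/3.
    """
    n = len(labels)
    cutoffs = [max(1, round(n * k / 3)) for k in (1, 2, 3)]
    return [f1_score(labels[:c], predictions[:c]) for c in cutoffs]


def tsv_line(conversation_id, detector, scores):
    """One output row: conversation, detector, then the scores, tab-separated."""
    return "\t".join([conversation_id, detector] + [f"{s:.3f}" for s in scores])
```

Writing one tsv_line per (conversation, detector) pair would also make missing results easy to spot, since any pair without a row has not been computed yet.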
Code
* Fix type hinting
* Visualize module: too much duplicate code
* Add save option for all visualization functions
Ideas to make the model better
==============================
Use toxic embeddings (make sure not to mix up train and test data!)
Separate casing
Do sequences really need to have a fixed length?
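Sketch for the fixed-length question above. One possible alternative, shown here only as an illustration, is to pad each batch to the length of its own longest sequence instead of padding everything to one global length. Whether the model can take batches of different lengths depends on the framework and architecture, which this file does not specify. The names pad_batch and batches are hypothetical, and token ids are assumed to be integers with 0 as padding.

```python
def pad_batch(sequences, pad_value=0):
    """Pad every sequence in one batch to the longest length in that batch."""
    longest = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (longest - len(seq)) for seq in sequences]


def batches(sequences, batch_size):
    """Yield padded batches; sorting by length first keeps padding small."""
    ordered = sorted(sequences, key=len)
    for start in range(0, len(ordered), batch_size):
        yield pad_batch(ordered[start:start + batch_size])
```

Example: list(batches([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]], 2)) gives one batch of length 2 and one of length 4, rather than four sequences of length 4.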
Backlog
=======
* Experiment with neural nets:
* Study best model in Kaggle... what is different from this experiment? https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/52557
* Check: how is it possible that with less data, scores are higher?
* Combine embeddings
* See if this best result can be improved by using ULMFiT (http://nlp.fast.ai/) or BERT (https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270)
* https://www.analyticsvidhya.com/blog/2018/11/tutorial-text-classification-ulmfit-fastai-library/
* Look up: what exactly is ConvoKit?