toxic_comments

For the Kaggle contest: Jigsaw Toxic Comment Classification Challenge

What's here so far?

There is an exploratory analysis kernel in Jupyter notebook format in 'Summary.ipynb'. It focuses on correlations between the comment label types, missing or odd data, and the most common terms in toxic comments.
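
As a rough illustration of the label-correlation part of that analysis, the sketch below computes the pairwise Pearson correlation between binary label columns using only the standard library. It is not taken from 'Summary.ipynb'; the function names and the toy label values are made up for the example, and the real notebook may compute this differently.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a label never varies
    return cov / sqrt(var_x * var_y)


def label_correlations(labels):
    """Return {(label_a, label_b): correlation} for every pair of label columns."""
    return {
        (a, b): pearson(labels[a], labels[b])
        for a, b in combinations(labels, 2)
    }


# Toy, made-up label columns (1 = the comment carries the label).
toy_labels = {
    "toxic":   [1, 1, 0, 0, 1, 0],
    "obscene": [1, 1, 0, 0, 0, 0],
    "threat":  [0, 0, 0, 1, 0, 0],
}

for pair, r in label_correlations(toy_labels).items():
    print(pair, round(r, 3))
```

For 0/1 labels this is the phi coefficient, so a clearly nonzero value for a pair is direct evidence that the two labels do not occur independently.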

Classification: What is there and what to do?

Currently there is a binary relevance NB-SVM classifier for toxic comments, an LSTM neural network classifier, and an "ensemble" script that averages the estimated probabilities from both. There are some obvious areas for improvement. Because the NB-SVM uses binary relevance, it fits one classifier per label and so assumes the labels are independent. The summary shows this assumption does not hold, but recovering the correlation lost to binary relevance is not trivial. The LSTM is also far from optimal, though it currently performs quite well.
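
The averaging step can be sketched as follows. This is a minimal illustration, not the contents of 'ensemble.py': the function name, the dictionary layout (label name mapped to one probability per comment), and the numbers are all assumptions made for the example.

```python
def average_probabilities(*model_outputs):
    """Average per-label probabilities across models.

    Each argument maps a label name to a list of probabilities, one per
    comment. All models must cover the same labels and the same comments.
    """
    if not model_outputs:
        raise ValueError("need at least one model's predictions")
    averaged = {}
    for label in model_outputs[0]:
        # One probability column per model for this label.
        columns = [output[label] for output in model_outputs]
        if len({len(col) for col in columns}) != 1:
            raise ValueError(f"prediction counts differ for label {label!r}")
        # Unweighted mean across models, comment by comment.
        averaged[label] = [sum(values) / len(values) for values in zip(*columns)]
    return averaged


# Made-up probabilities for two comments and two labels.
nb_svm_probs = {"toxic": [0.9, 0.2], "insult": [0.6, 0.1]}
lstm_probs = {"toxic": [0.7, 0.4], "insult": [0.8, 0.3]}

ensemble = average_probabilities(nb_svm_probs, lstm_probs)
print(ensemble)
```

Each label is averaged on its own, so this step inherits the independence assumption of binary relevance rather than correcting it.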

Who am I?

I am a graduate student and Data Science Initiative affiliate at UC Davis. I have played with some Kaggle datasets before and used one competition as a project for a statistics course, but this is my first real, structured attempt to compete. Ultimately, I hope to learn a lot and not finish in last place :P.

dilloncarlos/toxic_comments
