Deep-Survey-on-Text-Classification

This is a survey on deep learning models for text classification and will be updated frequently with testing and evaluation on different datasets.

Natural language processing tasks (part-of-speech tagging, chunking, named entity recognition, text classification, etc.) have gone through a tremendous amount of research over the decades. Text classification has been the most contested NLP task on Kaggle and in other similar competitions. Count-based models are being phased out as new deep learning models emerge almost every month. This project is an attempt to survey most of the neural models for the text classification task. The selected models, based on CNNs and RNNs, are explained with code (Keras and TensorFlow) and block diagrams. The models are evaluated on the medical dataset from one of the Kaggle competitions.
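To make the term "count-based" concrete, here is a minimal sketch of bag-of-words features in plain Python. It is not the code from this project: the helper names (build_vocabulary, count_vector) and the two toy sentences are made up for illustration, and tokenization is a naive lowercase whitespace split.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in first-seen order."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def count_vector(document, vocab):
    """Return one row of token counts; tokens outside the vocabulary are ignored."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = n
    return vector


# Toy sentences invented for this sketch (not taken from the competition data).
docs = [
    "gene mutation drives tumor growth",
    "tumor growth slows without the mutation",
]
vocab = build_vocabulary(docs)
vectors = [count_vector(d, vocab) for d in docs]
```

Each document becomes a fixed-length vector of word counts that ignores word order. Neural models such as CNNs and RNNs instead consume the token sequence itself, which is the contrast this survey is concerned with.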

Update: Non-stop training and power issues in my geographic location burned out my motherboard. By the time I had done 2 RMAs with ASRock and got the system up and running, the competition was over :( but I still learned a lot.

Project setup

Download and install anaconda3, say at ~/Programs/anaconda3.
Create a virtual environment: run cd ~/Programs/anaconda3 && mkdir envs, then cd envs && ../bin/conda create -p ~/Programs/anaconda3/envs/dsotc-c3 python=3.6 anaconda
Activate the environment: source /home/bicepjai/Programs/anaconda3/envs/dsotc-c3/bin/activate dsotc-c3
Install pip into the environment using conda install pip, then invoke it as ~/Programs/anaconda3/envs/dsotc-c3/bin/pip (anaconda has issues with using pip, so use the full path).
Run pip install -r requirements.txt to install all dependencies.
To enable the Jupyter nbextensions configurator: jupyter nbextensions_configurator enable --user
To install the contributed notebook extensions: jupyter contrib nbextension install --user
Some extensions worth enabling: Collapsible Headings, ExecuteTime, Table of Contents.

Now we should be ready to run this project and perform reproducible research. The details regarding the machine used for training can be found here

Version reference for some important packages used

Keras==2.0.8
tensorflow-gpu==1.3.0
tensorflow-tensorboard==0.1.8
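Since reproducibility depends on these pins, a quick check that the active environment matches them can be useful. The following is a small sketch, not part of the project: the names PINNED, find_mismatches, and installed_version are hypothetical. It assumes importlib.metadata is available (Python 3.8+) and falls back to pkg_resources on older interpreters such as the Python 3.6 environment created above.

```python
# Versions copied from the list above.
PINNED = {
    "Keras": "2.0.8",
    "tensorflow-gpu": "1.3.0",
    "tensorflow-tensorboard": "0.1.8",
}


def installed_version(name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Older interpreters: fall back to setuptools' pkg_resources.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, get_version=installed_version):
    """Return {package: (wanted, found)} for every package that is off its pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = get_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print("{}: expected {}, found {}".format(name, wanted, found))
```

Run inside the activated environment, the script prints nothing when all three packages match their pins. Passing a different get_version callable makes the comparison easy to test without installing anything.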

Data

Details regarding the data used can be found here

Content

This project is completed and the documentation can be found here. The papers explored in this project
