A Recurrent Neural Pipeline for Multi-Class | Multi-Label Text Classification

This repository walks you through an end-to-end flow for training Sequence Models (RNNs), along with the tips, tricks, and pointers a developer should be aware of.

Here, we explain how to frame and handle a Multi-Class | Multi-Label Text Classification problem, along with its data preparation pipeline.

Quick Colab Setup

Follow the steps below to set up a Google Colab workspace and run your experiments:

The GitHub repo is: https://github.com/amitbcp/icdmai_2020
The setup notebook in the repo is: 0_setup.ipynb
Use the link below and select the 0_setup.ipynb file to open it in your personal Google account.
Link: https://colab.research.google.com/github/amitbcp/icdmai_2020/blob/master/
Run the cells in the notebook after connecting to a runtime.
i. The first cell requests access to your Google Drive to create the appropriate folders/files.
ii. Authorize the notebook to access your Google account by copying the verification code that appears.
iii. The second cell installs the required packages. This should set up your workspace.
iv. Run the remaining cells to download the required data sources.
From here, follow the notebooks in numbered order.
Verify that your Drive has a folder ICDMAI_Tutorial/notebook. It should contain a couple of notebooks.
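
The final verification step can also be done from a notebook cell. The following is a minimal sketch, not part of the repository: `list_notebooks` is a hypothetical helper, and the mount point `/content/drive/My Drive` is an assumption about how Drive is mounted in your Colab session, so adjust the path if yours differs.

```python
from pathlib import Path


def list_notebooks(folder):
    """Return the sorted names of .ipynb files directly inside `folder`."""
    path = Path(folder)
    if not path.is_dir():
        # Missing folder: the setup cells probably have not run yet.
        return []
    return sorted(p.name for p in path.glob("*.ipynb"))


if __name__ == "__main__":
    # Assumed Drive mount point inside Colab; adjust if yours differs.
    notebook_dir = "/content/drive/My Drive/ICDMAI_Tutorial/notebook"
    found = list_notebooks(notebook_dir)
    print(f"{len(found)} notebook(s) found:", found)
```

An empty result suggests that either the Drive mount or the setup cells did not complete, in which case re-running 0_setup.ipynb is a reasonable first step.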

Quick Start

Want to play with these notebooks online without having to install anything?

Use any of the following services.

WARNING: Please be aware that these services provide temporary environments: anything you do will be deleted after a while, so make sure you download any data you care about.

Recommended: open this repository in Colaboratory:

Just want to quickly look at some notebooks, without executing any code?

Browse this repository using jupyter.org's notebook viewer:

Note: github.com's notebook viewer also works but it is slower and the math equations are not always displayed correctly.

Problem Statement

Given a post/question from Stack Overflow, predict its Technology Domain and associated Tags. We work with only 14 Technology Domains and 112 Tags. Less than 10% of the Stack Overflow question data is used, for demonstration purposes only.

The Technology Domains are:

Programming
MS-Development Environment
Server Side Development
Mobile App Development
Dev Environment
Front-end/Designing
Dynamic UI
MVC
Dev Ops
Big Data
QA
Project Management
Scripting
Business Analytics
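
To make the two target types concrete: the domain is a multi-class target (exactly one of the 14 domains per question), while the tags are a multi-label target (any subset of the 112 tags). The sketch below shows one common way to encode both, assuming one-hot and multi-hot vectors; the notebooks may encode them differently. The helper names and the tag names are hypothetical, and only three of the domains are used for brevity.

```python
def one_hot(label, classes):
    """Multi-class target: exactly one position is 1."""
    if label not in classes:
        raise ValueError(f"unknown class: {label!r}")
    return [1 if c == label else 0 for c in classes]


def multi_hot(labels, classes):
    """Multi-label target: any number of positions may be 1."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    wanted = set(labels)
    return [1 if c in wanted else 0 for c in classes]


# Three of the 14 domains listed above, for brevity.
DOMAINS = ["Programming", "Big Data", "QA"]
# Hypothetical tag names; the real 112 tags come from the dataset.
TAGS = ["python", "hadoop", "spark", "selenium"]

domain_target = one_hot("Big Data", DOMAINS)
tag_target = multi_hot(["spark", "hadoop"], TAGS)
print(domain_target)  # [0, 1, 0]
print(tag_target)     # [0, 1, 1, 0]
```

With targets in this shape, a multi-class output is typically paired with a softmax layer and categorical cross-entropy, and a multi-label output with per-tag sigmoids and binary cross-entropy.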

Notebooks

Setup Notebook : Notebook to set up & try out experiments.
Checklist & Flow for NLP Problems : Best practices for solving Deep Learning problems related to NLP.

Exploratory Data Analysis : Explore the relationships between the different groups, labels, and the text.
Classical ML Approach : Demonstrates Naive Bayes, SVM & Logistic Regression for baseline modeling.
Data Preparation : Creating datasets for RNNs
a. Standard : Creating a group-level dataset without handling bias.
b. Normalise : Creating a group-level dataset after clipping data to normalise the distribution.
Word Embedding : Training custom word embeddings for our corpus
Model Training : Prototyping & training LSTMs for Text Classification
Inference Pipeline : Ensembling models for prediction
Visualise Results : Plotting loss curves & performance metrics
Outline : Handling RNN pipelines end-to-end.
Proposal : Proposal to the ICDMAI 2020 committee.
Presentation : Presentation used for the session.
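
As an illustration of the "Normalise" data preparation step, the sketch below caps the number of examples kept per group so that dominant groups do not swamp rare ones. This is an assumption about what "clipping" means here, not the notebook's actual code; `clip_per_group`, the cap value, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def clip_per_group(rows, max_per_group, seed=0):
    """Keep at most `max_per_group` randomly chosen rows for each group.

    `rows` is a list of (text, group) pairs.
    """
    by_group = defaultdict(list)
    for text, group in rows:
        by_group[group].append((text, group))

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    clipped = []
    for group in sorted(by_group):
        items = by_group[group]
        if len(items) > max_per_group:
            items = rng.sample(items, max_per_group)
        clipped.extend(items)
    return clipped


# Toy data: "Programming" dominates, "QA" is rare.
rows = [(f"question {i}", "Programming") for i in range(8)]
rows += [(f"question {i}", "QA") for i in range(8, 10)]

balanced = clip_per_group(rows, max_per_group=3)
counts = defaultdict(int)
for _, group in balanced:
    counts[group] += 1
print(dict(counts))  # {'Programming': 3, 'QA': 2}
```

Clipping only removes data from over-represented groups; groups below the cap are left untouched, so some imbalance can remain.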


License
