Attention Mechanism based Text Classification on Customer Complaints Data

Business Context

The attention mechanism dynamically highlights the most relevant features of the input: it assigns higher weights to the words in a sentence that matter most for the prediction, which improves classification quality. This project applies the attention mechanism to text classification and evaluates how well it performs on customer complaint data.
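
As a rough illustration (a generic sketch, not the exact layer defined in model.py; all names are illustrative), attention pooling scores every token, turns the scores into weights with a softmax, and sums the token vectors using those weights:

```python
# Generic attention-pooling sketch in PyTorch; names are illustrative, not from model.py.
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)   # one relevance score per token

    def forward(self, token_states, mask):
        # token_states: (batch, seq_len, hidden_dim); mask: (batch, seq_len), 0 = padding
        scores = self.scorer(token_states).squeeze(-1)
        scores = scores.masked_fill(mask == 0, float("-inf"))  # ignore padded positions
        weights = torch.softmax(scores, dim=-1)                # higher weight = more relevant word
        # Weighted sum gives one sentence vector per example.
        return torch.bmm(weights.unsqueeze(1), token_states).squeeze(1)
```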


Aim

To perform multiclass text classification using the attention mechanism on a dataset of customer complaints about consumer financial products.


Data Description

The dataset contains more than two million customer complaints about consumer financial products, with one column for the complaint text and one for the product category associated with each complaint. Pre-trained GloVe word vectors (glove.6B) are used to represent the complaint text.


Tech Stack

  • Language: Python
  • Libraries: pandas, torch, nltk, numpy, pickle, re, tqdm, sklearn

Approach

  1. Installing the necessary packages via pip
  2. Importing the required libraries
  3. Defining configuration file paths
  4. Processing GloVe embeddings (see the first sketch after this list):
    • Reading the text file
    • Converting embeddings to float arrays
    • Adding embeddings for padding and unknown items
    • Saving embeddings and vocabulary
  5. Processing text data (see the second sketch after this list):
    • Reading the CSV file and dropping null values
    • Replacing duplicate labels
    • Encoding the label column and saving the encoder and encoded labels
  6. Data preprocessing (see the third sketch after this list):
    • Conversion to lowercase
    • Punctuation removal
    • Digits removal
    • Removing consecutive instances of 'x'
    • Removing additional spaces
    • Tokenizing the text
    • Saving the tokens
  7. Model:
    • Creating the attention model
    • Defining a function for the PyTorch dataset
    • Functions for training and testing the model
  8. Training (see the fourth sketch after this list):
    • Loading the necessary files
    • Splitting data into train, test, and validation sets
    • Creating PyTorch datasets
    • Creating data loaders
    • Creating the model object
    • Moving the model to GPU if available
    • Defining the loss function and optimizer
    • Training the model
    • Testing the model
  9. Making predictions on new text data
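
First sketch, step 4. The GloVe processing can be pictured roughly as below; the file paths match the Input and Output folders listed later, but the choice of a zero vector for padding and a mean vector for unknown words is an assumption, not necessarily what the project's code does.

```python
# Rough sketch of step 4; the real logic lives in the Source files.
import pickle
import numpy as np

vocabulary, vectors = [], []
with open("Input/glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        word, *values = line.rstrip().split(" ")
        vocabulary.append(word)                               # build the vocabulary
        vectors.append(np.asarray(values, dtype=np.float32))  # convert to float arrays

dim = vectors[0].shape[0]
# Prepend entries for padding (all zeros) and unknown words (mean vector) -- an assumption.
vocabulary = ["<pad>", "<unk>"] + vocabulary
embeddings = np.vstack([np.zeros(dim, dtype=np.float32),
                        np.mean(vectors, axis=0),
                        np.stack(vectors)])

with open("Output/embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)
with open("Output/vocabulary.pkl", "wb") as f:
    pickle.dump(vocabulary, f)
```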
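
Second sketch, step 5. Reading the CSV, dropping nulls, and encoding labels amounts to something like the following; the column names and the example label mapping are hypothetical placeholders for whatever complaints.csv actually uses.

```python
# Rough sketch of step 5; column names and the replace() mapping are hypothetical.
import pickle
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("Input/complaints.csv")
df = df.dropna(subset=["complaint_text", "product"])           # drop null values

# Merge duplicate/near-duplicate category names into a single label (example mapping only).
df["product"] = df["product"].replace({"Credit card or prepaid card": "Credit card"})

encoder = LabelEncoder()
labels = encoder.fit_transform(df["product"])                  # encode the label column

with open("Output/label_encoder.pkl", "wb") as f:
    pickle.dump(encoder, f)
with open("Output/labels.pkl", "wb") as f:
    pickle.dump(labels, f)
```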
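
Third sketch, step 6. The cleaning steps could look like this; the regular expressions are illustrative and nltk's punkt tokenizer data must be downloaded beforehand.

```python
# Rough sketch of step 6; the exact patterns in the project may differ.
import re
import string
from nltk.tokenize import word_tokenize  # requires nltk.download("punkt")


def preprocess(text):
    text = text.lower()                                               # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = re.sub(r"\d+", "", text)                                   # remove digits
    text = re.sub(r"x{2,}", "", text)                                 # remove masked runs like "xxxx"
    text = re.sub(r"\s+", " ", text).strip()                          # remove extra spaces
    return word_tokenize(text)                                        # tokenize


print(preprocess("On XXXX I disputed the charge of $120.00 with XXXX Bank."))
# -> ['on', 'i', 'disputed', 'the', 'charge', 'of', 'with', 'bank']
```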
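
Fourth sketch, steps 7 and 8. The PyTorch dataset and training loop come together along these lines; the dataset assumes sequences are already padded to a fixed length, and the model can be any classifier, such as the attention layer sketched above followed by a linear output layer. Names and hyperparameters are assumptions.

```python
# Rough sketch of steps 7-8; names and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader


class ComplaintsDataset(Dataset):
    """Wraps pre-padded token-id sequences and their encoded labels."""

    def __init__(self, token_ids, labels):
        self.token_ids, self.labels = token_ids, labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return torch.tensor(self.token_ids[idx]), torch.tensor(self.labels[idx])


def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)   # multiclass cross-entropy
        loss.backward()
        optimizer.step()


# Typical wiring (model is whatever Engine.py builds from model.py):
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# loader = DataLoader(ComplaintsDataset(train_ids, train_labels), batch_size=64, shuffle=True)
# criterion, optimizer = nn.CrossEntropyLoss(), torch.optim.Adam(model.parameters())
# train_one_epoch(model, loader, criterion, optimizer, device)
```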

Modular Code Overview

  1. Input: Contains the data required for analysis, including:

    • complaints.csv
    • glove.6B.50d.txt (downloaded separately from the GloVe project page)
  2. Source: Contains modularized code for various project steps, including:

    • model.py
    • data.py
    • utils.py

    These Python files contain the helper functions used by Engine.py.

  3. Output: Contains the artifacts produced by preprocessing and training, including:

    • attention.pth
    • embeddings.pkl
    • label_encoder.pkl
    • labels.pkl
    • vocabulary.pkl
    • tokens.pkl
  4. config.py: Contains project configurations.

  5. Engine.py: The main file that runs the entire project; it trains the model and saves it in the Output folder.

