Swahili Sentiment Classifier

This project performs sentiment classification on Swahili text, that is, predicting the sentiment expressed in a given piece of text. The goal is to train a machine learning model that accurately classifies text as positive, negative, or neutral.

Project Overview

The sentiment classification project consists of the following major steps:

Data Preprocessing:
- Loading and inspecting the dataset
- Cleaning the text by removing special characters, punctuation, URLs, and HTML tags
- Tokenizing the text and converting it to sequences
- Padding the sequences to ensure uniform length
- Splitting the dataset into training and testing sets
- One-hot encoding the sentiment labels
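
The preprocessing steps above can be sketched in plain Python. This is a minimal illustration and not the contents of preprocessing.py: the function names, the label order, the 80/20 split, and the sample sentences are all assumptions, and the hand-written vocabulary and padding helpers stand in for whatever tokenizer and padding utilities the project actually uses.

```python
import html
import random
import re

TAG_RE = re.compile(r"<[^>]+>")               # HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # URLs
PUNCT_RE = re.compile(r"[^\w\s]")              # punctuation and special characters
LABELS = ["negative", "neutral", "positive"]   # assumed label names and order


def clean_text(text):
    """Strip HTML tags, URLs, and punctuation, then lowercase and collapse spaces."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = URL_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return " ".join(text.lower().split())


def build_vocab(texts):
    """Assign each distinct word an index starting at 1; 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def to_padded_sequence(text, vocab, max_len):
    """Map known words to indices, keep at most max_len, and pad the front with zeros."""
    seq = [vocab[w] for w in text.split() if w in vocab][:max_len]
    return [0] * (max_len - len(seq)) + seq


def one_hot(label):
    """Encode a label as a vector with a single 1 at the label's position."""
    vec = [0] * len(LABELS)
    vec[LABELS.index(label)] = 1
    return vec


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle reproducibly and split into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) - int(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical sample data, for illustration only.
raw = [
    ("<b>Nimefurahi sana!</b> https://example.com", "positive"),
    ("Huduma ni mbaya.", "negative"),
    ("Leo ni Jumatatu.", "neutral"),
]
texts = [clean_text(text) for text, _ in raw]
vocab = build_vocab(texts)
features = [to_padded_sequence(text, vocab, max_len=5) for text in texts]
targets = [one_hot(label) for _, label in raw]
train_rows, test_rows = train_test_split(list(zip(features, targets)))
```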
Model Creation:
- Designing and building a deep learning model using a sequential architecture
- Adding layers such as embedding, LSTM, and dense layers to the model
- Compiling the model with appropriate loss function, optimizer, and metrics
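
As a rough sketch of the architecture described above, the following assumes the model is built with Keras through TensorFlow, which the .h5 model file and the layer names suggest but the text does not state outright. The vocabulary size, sequence length, embedding dimension, and LSTM width are placeholder values chosen for illustration and are not taken from train.py.

```python
import tensorflow as tf

# All sizes below are illustrative assumptions, not the project's settings.
VOCAB_SIZE = 10_000  # number of token indices, including padding index 0
MAX_LEN = 50         # length of each padded input sequence
EMBED_DIM = 64       # size of each learned word vector
NUM_CLASSES = 3      # positive, negative, neutral

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    # Turns each token index into a dense vector.
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    # Reads the sequence and summarizes it in a single vector.
    tf.keras.layers.LSTM(64),
    # One probability per sentiment class.
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Categorical cross-entropy matches one-hot encoded labels.
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)
model.summary()
```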
Model Training:
- Training the model on the preprocessed training data
- Monitoring the training progress and optimizing hyperparameters
- Evaluating the model's performance on the validation set, if applicable
Model Evaluation:
- Evaluating the model's performance on the testing set
- Calculating relevant evaluation metrics such as accuracy, precision, recall, and F1-score
- Analyzing the results and gaining insights into the model's strengths and weaknesses
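
The evaluation metrics listed above can be computed directly from the true and predicted labels. The sketch below is a plain-Python illustration; the function name `per_class_metrics` is hypothetical, and the sample labels are made up to show the arithmetic. They are not results from this project.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels):
    """Return overall accuracy plus precision, recall, and F1 for each label."""
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    report = {"accuracy": correct / len(y_true)}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Made-up labels for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "neutral"]
report = per_class_metrics(y_true, y_pred, ["negative", "neutral", "positive"])
for name, value in report.items():
    print(name, value)
```

Per-class scores are worth inspecting alongside accuracy, because a model can reach a reasonable overall accuracy while performing poorly on one of the three classes.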

Usage

To use this project:

Clone the repository:

    git clone https://github.com/eddiegulay/Swahili-Sement-Classification.git

Install the required dependencies:

    pip install -r requirements.txt

Run the preprocessing script to clean and preprocess the text data:

    python preprocessing.py

Run the training script to train the sentiment classification model:

    python train.py

The current model, trained on the Neural Tech Swahili dataset, reaches a training accuracy of 74%. By comparison, other models trained on the IndabaX dataset reach about 48% to 50% accuracy.

Perform sentiment classification inference:

    python inference.py "Text for sentiment classification"

Replace "Text for sentiment classification" with the actual text you want to classify.

View the predicted sentiment:

The script will display the input text and the predicted sentiment label.

Saved Model and Tokenizer

Sentiment classification inference from the command-line interface (CLI) requires the trained model and tokenizer files to be saved in the locations below. Make sure the following files are present in their respective directories:

Model: model/hyper_sarufi_tunned_swahili_sentiment_rating.h5
Tokenizer: tokenizers/hyper_sarufi_tunned_swahili_sentiment_rating.json

Note: Adjust the file paths as per your directory structure.
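
Before running inference, it can help to confirm that both files are where they are expected. The following is a small sketch using only the standard library; the helper name `missing_artifacts` is hypothetical, and the paths are the ones listed above, relative to the directory the script is run from.

```python
from pathlib import Path

MODEL_PATH = Path("model/hyper_sarufi_tunned_swahili_sentiment_rating.h5")
TOKENIZER_PATH = Path("tokenizers/hyper_sarufi_tunned_swahili_sentiment_rating.json")


def missing_artifacts(paths=(MODEL_PATH, TOKENIZER_PATH)):
    """Return the expected paths that are not existing files."""
    return [p for p in paths if not p.is_file()]


missing = missing_artifacts()
if missing:
    print("Missing files:", ", ".join(str(p) for p in missing))
else:
    print("Model and tokenizer found.")
```

If either file is reported missing, train the model first or adjust the two path constants to match your directory structure.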

License

This project is licensed under the MIT License.

Acknowledgments

This sentiment classification project was inspired by the IndabaX sentiment classification challenge on Zindi.
Special thanks to Neural Tech for their Swahili sentiment dataset.

Contributors

Edgar Gulay
