
This project performs sentiment analysis using deep learning with the TensorFlow framework, written in Python. It is licensed under the MIT License.


eddiegulay/Swahili-Sement-Classification


Swahili Sentiment Classifier

This project focuses on sentiment classification, which involves predicting the sentiment or emotion associated with a given text. The goal is to train a machine learning model to accurately classify text into positive, negative, or neutral sentiments.

Image from: Devonyu Credit: Getty Images/iStockphoto

Project Overview


The sentiment classification project consists of the following major steps:

  1. Data Preprocessing:

    • Loading and inspecting the dataset
    • Cleaning the text by removing special characters, punctuation, URLs, and HTML tags
    • Tokenizing the text and converting it to sequences
    • Padding the sequences to ensure uniform length
    • Splitting the dataset into training and testing sets
    • One-hot encoding the sentiment labels
  2. Model Creation:

    • Designing and building a deep learning model using a sequential architecture
    • Adding layers such as embedding, LSTM, and dense layers to the model
    • Compiling the model with an appropriate loss function, optimizer, and metrics
  3. Model Training:

    • Training the model on the preprocessed training data
    • Monitoring the training progress and optimizing hyperparameters
    • Evaluating the model's performance on the validation set, if applicable
  4. Model Evaluation:

    • Evaluating the model's performance on the testing set
    • Calculating relevant evaluation metrics such as accuracy, precision, recall, and F1-score
    • Analyzing the results and gaining insights into the model's strengths and weaknesses
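
The data preprocessing step above can be sketched in plain Python. This is a minimal illustration only, not the repository's `preprocessing.py`: the function names, the label order, the sample sentences, and the choice to pad on the right are all assumptions made for the example, and the project itself may rely on TensorFlow/Keras utilities instead. The train/test split is omitted for brevity.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")                # HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # URLs
NON_WORD_RE = re.compile(r"[^\w\s]")           # punctuation and special characters

# Assumed label order; the real dataset may encode labels differently.
LABELS = ["negative", "neutral", "positive"]


def clean_text(text):
    """Lowercase the text and strip HTML tags, URLs, punctuation, and extra spaces."""
    text = TAG_RE.sub(" ", text.lower())
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def build_vocab(texts):
    """Map each word to an integer index, most frequent first. Index 0 is reserved for padding."""
    counts = Counter(word for t in texts for word in t.split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def to_padded_sequence(text, vocab, max_len):
    """Convert text to word indices, skip unknown words, truncate, then right-pad with 0."""
    seq = [vocab[w] for w in text.split() if w in vocab][:max_len]
    return seq + [0] * (max_len - len(seq))


def one_hot(label, labels=LABELS):
    """Encode a label as a vector with a single 1 at the label's position."""
    return [1 if label == candidate else 0 for candidate in labels]


# Hypothetical sample inputs, used only to exercise the functions.
raw = ["Huduma ni nzuri sana! https://example.com", "<p>Chakula kibaya</p>"]
cleaned = [clean_text(t) for t in raw]
vocab = build_vocab(cleaned)
padded = [to_padded_sequence(t, vocab, max_len=6) for t in cleaned]
targets = [one_hot("positive"), one_hot("negative")]
```

Every row in `padded` has the same length, which is what an embedding layer followed by an LSTM expects as input.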

Usage

To use this project:

  1. Clone the repository:
    git clone https://github.com/eddiegulay/Swahili-Sement-Classification.git
  2. Install the required dependencies:
    pip install -r requirements.txt
  3. Run the preprocessing script to clean and preprocess the text data:
    python preprocessing.py
  4. Run the training script to train the sentiment classification model:
    python train.py

The current model, trained on the Neural Tech Swahili dataset, reaches a training accuracy of 74%. By comparison, other models trained on the IndabaX dataset reach about 48% to 50% accuracy.

Figure: Model Accuracy

  5. Perform sentiment classification inference:
    python inference.py "Text for sentiment classification"

Replace "Text for sentiment classification" with the actual text you want to classify.

  6. View the predicted sentiment:

The script will display the input text and the predicted sentiment label.
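
As a rough illustration of that last step, the sketch below turns a vector of class probabilities into a label and formats it next to the input text. The contents of `inference.py` are not shown here, so everything in this block is an assumption: the function names, the label order, the output format, and the idea that the model returns one probability per class.

```python
# Hypothetical label order; the trained model may use a different one.
LABELS = ["negative", "neutral", "positive"]


def label_from_probabilities(probabilities, labels=LABELS):
    """Return the label with the highest probability (argmax)."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best_index = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best_index]


def format_result(text, probabilities):
    """Build the two-line message shown to the user."""
    return f"Text: {text}\nSentiment: {label_from_probabilities(probabilities)}"


# Placeholder scores standing in for a real model prediction.
print(format_result("Huduma ni nzuri sana", [0.1, 0.2, 0.7]))
```

With the placeholder scores above, the highest value sits at index 2, so the printed label is `positive`.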

Saved Model and Tokenizer

The command-line interface (CLI) loads the trained model and tokenizer from fixed locations when it runs inference. Make sure the following files are present in their respective directories before running it:

  • Model: model/hyper_sarufi_tunned_swahili_sentiment_rating.h5
  • Tokenizer: tokenizers/hyper_sarufi_tunned_swahili_sentiment_rating.json

Note: Adjust the file paths as per your directory structure.
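
A quick way to confirm both files are in place before running inference is a small path check. This is a sketch using only the standard library: the two paths come from the list above, while the helper name `missing_artifacts` and the `base_dir` parameter are made up for this example.

```python
from pathlib import Path

# Paths as listed above, relative to the repository root.
MODEL_PATH = Path("model/hyper_sarufi_tunned_swahili_sentiment_rating.h5")
TOKENIZER_PATH = Path("tokenizers/hyper_sarufi_tunned_swahili_sentiment_rating.json")


def missing_artifacts(base_dir="."):
    """Return the expected artifact paths that are not present under base_dir."""
    base = Path(base_dir)
    return [str(p) for p in (MODEL_PATH, TOKENIZER_PATH) if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Model and tokenizer found.")
```

If your directory layout differs, change the two path constants or pass a different `base_dir`.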

License

This project is licensed under the MIT License.

Acknowledgments

  • This project was inspired by the IndabaX sentiment classification challenge on Zindi.
  • Special thanks to Neural Tech for their Swahili sentiment dataset.

Contributors
