
Spambase dataset analysis comparing Naïve Bayes classifiers. Evaluated accuracy, confusion matrices on different splits. Explored alternatives for improved performance in ML course, uOttawa 2023.


RimTouny/Naive-Bayes-Classifiers-Comparison


Naive Bayes Classifiers Comparison

This repository contains Python code that analyzes the Spambase dataset by comparing Naïve Bayes classifiers. Accuracy and confusion matrices were evaluated on different splits of the dataset, and alternative classifiers were explored for improved performance. The work was a Machine Learning course project at the University of Ottawa in 2023.

  • Required libraries: scikit-learn, pandas, matplotlib.
  • Execute cells in a Jupyter Notebook environment.
  • The uploaded code has been executed and tested successfully within the Google Colab environment.

Binary-class classification problem

The task is to classify each email in the dataset into one of two classes: Spam / Not Spam.

Independent Variables:

  • 57 features covering word frequencies, character frequencies, and capital run lengths.

Target variable:

  • 'Target' indicating the classification into two classes.
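
As a rough illustration of the layout described above, the following sketch builds a small synthetic table with 57 feature columns and a 'Target' column, then separates the two. The column names, the 48/6/3 grouping of the features (taken from the public UCI description of Spambase), and the random values are assumptions for illustration; they are not taken from the notebook in this repository.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical column names: 48 word-frequency, 6 character-frequency,
# and 3 capital-run-length features (57 in total).
feature_names = (
    [f"word_freq_{i}" for i in range(48)]
    + [f"char_freq_{i}" for i in range(6)]
    + [
        "capital_run_length_average",
        "capital_run_length_longest",
        "capital_run_length_total",
    ]
)

# Synthetic stand-in rows; a real run would read the Spambase file instead.
n_rows = 10
df = pd.DataFrame(rng.random((n_rows, len(feature_names))), columns=feature_names)
df["Target"] = rng.integers(0, 2, size=n_rows)  # 1 = spam, 0 = not spam (assumed encoding)

# Independent variables and target variable.
X = df.drop(columns="Target")
y = df["Target"]

print(X.shape, y.shape)
```

Keeping the features and the target in separate objects matches what the scikit-learn estimators expect when they are fitted.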

Key Tasks Undertaken

  1. Dataset Splitting:

    • Divided the dataset into 80% training and 20% test samples, preserving the split for later analysis.
  2. Classifier Evaluation (80/20 Split):

    • Computed confusion matrices and accuracy scores for the Gaussian and Multinomial Naïve Bayes classifiers on the test data.

      • Found that neither classifier predicted any spam instances, because the test data was unbalanced.
  3. Further Evaluation:

    • Re-split the data with the train-test split function, which shuffles the dataset so that the test data no longer contains zero 'spam' instances.

  4. Alternate Classifier Assessment:

    • Explored Bernoulli and Complement Naïve Bayes classifiers, comparing their performance metrics with Gaussian and Multinomial models.

  5. Subset Evaluation:

    • Analyzed the accuracies on four subsets; performance varied because training was biased toward specific class labels.
  6. Visualization:

    • Presented subset accuracies via a bar chart, highlighting classifier performance variations.
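
Steps 1 to 4 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is a synthetic stand-in for Spambase, the rows are assumed to be sorted by class (spam first), and the variable names are hypothetical. It shows why an unshuffled 80/20 split can leave no spam in the test set, and how the four scikit-learn Naïve Bayes classifiers can be compared on a shuffled split.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)

# Synthetic stand-in: 57 non-negative, mostly sparse features.
# Assumption: rows are sorted by class, spam (1) first, then not spam (0).
n_spam, n_ham, n_features = 400, 600, 57
X_spam = rng.exponential(2.0, (n_spam, n_features)) * (rng.random((n_spam, n_features)) < 0.6)
X_ham = rng.exponential(1.0, (n_ham, n_features)) * (rng.random((n_ham, n_features)) < 0.3)
X = np.vstack([X_spam, X_ham])
y = np.concatenate([np.ones(n_spam, dtype=int), np.zeros(n_ham, dtype=int)])

# Sequential 80/20 split: on class-sorted rows the last 20% holds one class only.
cut = int(0.8 * len(y))
spam_in_sequential_test = int(y[cut:].sum())

# Shuffled 80/20 split (train_test_split shuffles by default).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0
)
spam_in_shuffled_test = int(y_test.sum())

# Fit each classifier and record accuracy and confusion matrix on the test set.
models = {
    "Gaussian": GaussianNB(),
    "Multinomial": MultinomialNB(),
    "Bernoulli": BernoulliNB(),
    "Complement": ComplementNB(),
}
results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
    }
    print(name, round(results[name]["accuracy"], 3))

print("spam in sequential test set:", spam_in_sequential_test)
print("spam in shuffled test set:", spam_in_shuffled_test)
```

The Multinomial and Complement variants require non-negative feature values, which is why the stand-in data is generated that way. The accuracies printed here describe the synthetic data only and say nothing about results on the real dataset.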
