Skip to content

thealexsamexe/Iris-dataset-Classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python Type Type Status

Install

This project requires at least Python 3.1 and the following Python libraries installed:

Data

The dataset used in this project is included as iris.csv. This dataset is a freely available on the UCI Machine Learning Repository. This dataset has the following attributes:

Features

Features: sepal-length, sepal-width, petal-length, petal-width

Target Variable

Target: class

Data Visualization

For univariate plot, a box and whisker plot and a histogram was plotted.

The preliminary results were obtained via plotting the dataset on Box and Whisker plot.

alt text

To obtain the frequency of the range of different features based on numerical data, a histogram was plotted and results were visualized in it.

alt text

Furthermore, to understand correlate the histogram with the data more, a scatter matrix was plotted.

alt text

Violin plot was used for checking the comparison of variable distribution between features

alt text

alt text

alt text

alt_text

For multivariate plot, a pair plot was obtained to understand the best set of features to explain a relationship between two or more features so that choosing a Machine Learning Algorithm can become easier and data analysis can be satisfactory.

alt text

However, the problem faced was the visualization of the correlation numerics which was solved by plotting a heatmap.

alt text

Algorithmic Evaluation

Before going on to this step, the train_test_split() was applied, resulting in the data being splitted into 70% for training and 30% for testing. So, in this project, the rule of thumb was 70-30.

This step was approached by testing the six types of Machine Learning Algorithms such as Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Gaussian Naive Bayes (NB) and Support Vector Machines (SVM).

The obtained evaluation data for the testing data were as follows by the 6 Machine Learning algorithms: LR: 0.934545 (0.071789) LDA: 0.971818 (0.043112) KNN: 0.952727 (0.062430) CART: 0.953636 (0.046435) NB: 0.935455 (0.058698) SVM: 0.980909 (0.038236)

To make things easier to understand, a box and whisker plot was constructed to get the big picture between performance of the six different algorithms.

alt text

Evaluation of Predictions

After testing the model and measuring the accuracy score, the accuracy score was measured as 97.7% or 98% if rounded-off. However, the classification report measured the macro and micro average of precision, f1-score and recall to be 98%.

About

This project is based on Iris Classification using Support Vector Machine (SVM) algorithm

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published