This repository has been archived by the owner on Dec 15, 2019. It is now read-only.

A Python script that classifies iris flower species based on their various dimensions.


hernanrazo/UCI-iris-classification


UCI Iris Classification

Description

A Python script that predicts iris species from the lengths and widths of the sepals and petals. The three species in the dataset are Iris-setosa, Iris-versicolor, and Iris-virginica. The dataset comes from the University of California, Irvine (UCI) Machine Learning Repository.

Libraries used in this example include pandas, seaborn, matplotlib, and scikit-learn. The classifier is the k-nearest neighbors (KNN) algorithm.
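The README does not show how the DataFrame `df` is loaded. A minimal sketch, assuming the column names the script selects later (`sepal-length`, `sepal-width`, `petal-length`, `petal-width`, `species`). To avoid a download it uses scikit-learn's bundled copy of the iris data (`load_iris`), which may differ slightly from the UCI file, and rebuilds the UCI-style labels such as `Iris-setosa`. The helper name `load_iris_df` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Column names assumed to match the ones the script selects later on
FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]


def load_iris_df():
    """Hypothetical helper: return the iris data as a DataFrame with UCI-style labels."""
    iris = load_iris()
    # iris.data columns are ordered: sepal length, sepal width, petal length, petal width (cm)
    df = pd.DataFrame(iris.data, columns=FEATURES)
    # UCI labels look like "Iris-setosa"; scikit-learn's target_names are "setosa", ...
    df["species"] = ["Iris-" + str(iris.target_names[t]) for t in iris.target]
    return df


df = load_iris_df()
print(df.shape)                      # (150, 5)
print(df["species"].value_counts())  # 50 rows per species
```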

Analysis

First, make box-and-whisker plots to see the range of values for each petal and sepal dimension.

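Figures: box-and-whisker plots of petal length (petalLengthBW), petal width (petalWidthBW), sepal length (sepalLengthBW), and sepal width (sepalWidthBW).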
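The box-and-whisker plots can be sketched with plain matplotlib. This assumes each measurement is plotted with one box per species (the original images are not available to confirm this) and uses scikit-learn's bundled iris data with the README's column names; figure size and layout are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]

# Load the data (scikit-learn's bundled copy) with the README's column names
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df["species"] = ["Iris-" + str(iris.target_names[t]) for t in iris.target]
species = sorted(df["species"].unique())
positions = list(range(1, len(species) + 1))

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, col in zip(axes.flat, ["petal-length", "petal-width", "sepal-length", "sepal-width"]):
    # One box per species: median, quartiles, and whiskers for this measurement
    groups = [df.loc[df["species"] == s, col].to_numpy() for s in species]
    ax.boxplot(groups)
    ax.set_xticks(positions)
    ax.set_xticklabels(species)
    ax.set_title(col)
    ax.set_ylabel("cm")
fig.tight_layout()
```

Call `fig.savefig(...)` with a file name of your choice to write the figure to disk.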

Next, plot histograms of the same data.

Figures: histograms of petal length (petalLengthHist), petal width (petalWidthHist), sepal length (sepalLengthHist), and sepal width (sepalWidthHist).
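A minimal matplotlib sketch of the four histograms, assuming all 150 flowers are pooled in each plot (not split by species) and using scikit-learn's bundled iris data. The bin count of 15 is an arbitrary choice, not taken from the original script.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
df = pd.DataFrame(load_iris().data, columns=FEATURES)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, col in zip(axes.flat, ["petal-length", "petal-width", "sepal-length", "sepal-width"]):
    # bins=15 is an arbitrary choice; each bar counts flowers in that value range
    ax.hist(df[col].to_numpy(), bins=15, edgecolor="black")
    ax.set_title(col)
    ax.set_xlabel("cm")
    ax.set_ylabel("count")
fig.tight_layout()
```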

These plots give a good overview of the data. Violin plots, which combine a box plot with a density estimate, condense it into two graphs: one for petal length and one for sepal length.

Figures: violin plots of petal length (petalLengthViolin) and sepal length (sepalLengthViolin).
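The original lists seaborn among its libraries; this sketch uses matplotlib's `Axes.violinplot` instead to keep dependencies minimal. It assumes one violin per species in each of the two graphs and uses scikit-learn's bundled iris data with the README's column names.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df["species"] = ["Iris-" + str(iris.target_names[t]) for t in iris.target]
species = sorted(df["species"].unique())

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["petal-length", "sepal-length"]):
    # One violin per species: the outline is a density estimate, the bar marks the median
    groups = [df.loc[df["species"] == s, col].to_numpy() for s in species]
    ax.violinplot(groups, showmedians=True)
    ax.set_xticks(list(range(1, len(species) + 1)))
    ax.set_xticklabels(species)
    ax.set_title(col)
    ax.set_ylabel("cm")
fig.tight_layout()
```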

Since only one dataset is available, split it into a training set and a test set. With test_size = 0.3, 70% of the rows are used for training and the remaining 30% for testing.

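```python
from sklearn.model_selection import train_test_split

# df holds the iris data: sepal-length, sepal-width, petal-length, petal-width, species
train, test = train_test_split(df, test_size = 0.3)

# take data features and output for training and testing
train_x = train[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
train_y = train['species']

# test features and labels must come from the held-out test set, not the training set
test_x = test[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
test_y = test['species']
```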
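A quick sanity check of the split, assuming scikit-learn's bundled iris data and a fixed `random_state = 0` (the README's first split does not set one). With 150 rows and test_size = 0.3, 45 rows go to the test set and 105 to the training set, and no row appears in both. This is why test features should be taken from `test`: scoring the model on rows it was fitted on would overstate its accuracy.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df["species"] = ["Iris-" + str(iris.target_names[t]) for t in iris.target]

# 70/30 split; random_state fixes the shuffle so sizes and rows are reproducible
train, test = train_test_split(df, test_size=0.3, random_state=0)
print(len(train), len(test))  # 105 45

# train_test_split keeps the original index labels, so overlap is easy to check
shared_rows = train.index.intersection(test.index)
print(len(shared_rows))  # 0: no flower appears in both sets
```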

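This example uses the k-nearest neighbors algorithm: each test flower is assigned the species held by the majority of its k closest training flowers (here k = 3). The following code fits the model and prints its accuracy on the test set: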

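```python
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

# classify each flower by a majority vote of its 3 nearest training neighbors
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_x, train_y)
prediction = model.predict(test_x)
# accuracy_score takes (y_true, y_pred) and returns the fraction of matching labels
print(metrics.accuracy_score(test_y, prediction))
print(' ')
```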
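To make what `KNeighborsClassifier(n_neighbors = 3)` does more concrete, here is a from-scratch sketch of the same idea: for each test flower, find the 3 training flowers with the smallest Euclidean distance and take a majority vote over their species. It is an illustration only. The function `knn_predict` is hypothetical, it uses scikit-learn's bundled iris data with a fixed `random_state = 0`, and its tie-breaking may differ from scikit-learn's, so individual predictions can occasionally disagree.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def knn_predict(train_x, train_y, query_x, k=3):
    """Hypothetical helper: label each query row by majority vote of its k nearest training rows."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    predictions = []
    for q in np.asarray(query_x, dtype=float):
        # Euclidean distance from the query flower to every training flower
        distances = np.sqrt(((train_x - q) ** 2).sum(axis=1))
        nearest = np.argsort(distances, kind="stable")[:k]
        # Majority vote; on a tie, Counter keeps the label seen first, i.e. the closest one
        votes = Counter(train_y[nearest])
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


iris = load_iris()
train_x, test_x, train_y, test_y = train_test_split(
    iris.data, iris.target_names[iris.target], test_size=0.3, random_state=0
)
prediction = knn_predict(train_x, train_y, test_x, k=3)
accuracy = (prediction == test_y).mean()  # same quantity metrics.accuracy_score reports
print(accuracy)
```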

This gives good results, but what happens if the petal and sepal measurements are used separately? To find out, split the data into a training set and a test set again, this time once for the petal features (length and width) and once for the sepal features.

```python
# split the dataset into petal and sepal feature sets
petal = df[['petal-length', 'petal-width', 'species']]
sepal = df[['sepal-length', 'sepal-width', 'species']]

# split each feature set into a training and testing section again

# petals
train_petal, test_petal = train_test_split(petal, test_size = 0.3, random_state = 0)
train_petal_x = train_petal[['petal-length', 'petal-width']]
train_petal_y = train_petal['species']

test_petal_x = test_petal[['petal-length', 'petal-width']]
test_petal_y = test_petal['species']

# sepals
train_sepal, test_sepal = train_test_split(sepal, test_size = 0.3, random_state = 0)
train_sepal_x = train_sepal[['sepal-length', 'sepal-width']]
train_sepal_y = train_sepal['species']

test_sepal_x = test_sepal[['sepal-length', 'sepal-width']]
test_sepal_y = test_sepal['species']
```

Retrain the model for this new scenario:

```python
print('New training session:')
# petals
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_petal_x, train_petal_y)
prediction = model.predict(test_petal_x)
print('Petal prediction: ')
print(metrics.accuracy_score(test_petal_y, prediction))
print(' ')

# sepals
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_sepal_x, train_sepal_y)
prediction = model.predict(test_sepal_x)
print('Sepal prediction: ')
print(metrics.accuracy_score(test_sepal_y, prediction))
```

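Using only the petal measurements (length and width) gives better predictions than using only the sepal measurements or all four features together. Note that the first split does not set random_state, so its accuracy varies from run to run.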
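The three comparisons can also be written as one loop. This sketch assumes the README's column names, `n_neighbors = 3`, and a single fixed split (`random_state = 0`) shared by all feature subsets, so that any difference in accuracy comes from the features rather than from different random splits. The data are scikit-learn's bundled copy of the iris set, and the subset names are illustrative.

```python
import pandas as pd
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df["species"] = ["Iris-" + str(iris.target_names[t]) for t in iris.target]

# Feature subsets compared in the README
subsets = {
    "all": FEATURES,
    "petal": ["petal-length", "petal-width"],
    "sepal": ["sepal-length", "sepal-width"],
}

# One split shared by every subset, so only the feature set changes between runs
train, test = train_test_split(df, test_size=0.3, random_state=0)
results = {}
for name, cols in subsets.items():
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(train[cols], train["species"])
    prediction = model.predict(test[cols])
    results[name] = metrics.accuracy_score(test["species"], prediction)
    print(f"{name}: {results[name]:.3f}")
```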

Acknowledgements

This project was made with guidance from various Kaggle kernels and other tutorials. These include a tutorial on machinelearningmastery.com and an IPython Notebook by I,Coder.

Sources and Helpful Links

https://archive.ics.uci.edu/ml/datasets/iris
https://www.kaggle.com/adityabhat24/iris-data-analysis-and-machine-learning-python
https://www.kaggle.com/uciml/iris/home
https://www.kaggle.com/ash316/ml-from-scratch-with-iris
