UCI Iris Classification

Description

A Python script that predicts iris species from sepal and petal measurements (length and width). The species in this dataset are iris-setosa, iris-versicolor, and iris-virginica. The dataset comes from the University of California, Irvine (UCI) Machine Learning Repository.

Libraries used in this example include pandas, seaborn, matplotlib, and scikit-learn. The algorithm used is the k-nearest neighbors algorithm.

Analysis

First, we make box and whisker plots to see the range of values for petal and sepal dimensions.

Next, plot histograms of the same data.

These plots give a good visual overview of the data. Next, use violin plots to condense this information into two graphs: one shows petal length and the other shows sepal length.
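The three plot types described above could be sketched roughly as follows. This is only an illustration, not the code in iris.py: it assumes the data can be loaded from scikit-learn's bundled copy of the iris dataset instead of data.csv, it grouping the violin plots by species is an assumption, and the helper names `load_iris_frame` and `make_plots` are hypothetical. Note that the bundled copy names the species setosa, versicolor, and virginica, without the iris- prefix.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

# Column names as used elsewhere in this README.
FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]


def load_iris_frame():
    """Hypothetical helper: build a DataFrame from scikit-learn's iris copy."""
    iris = load_iris()
    frame = pd.DataFrame(iris.data, columns=FEATURES)
    # Map the integer targets (0, 1, 2) to species names.
    frame["species"] = iris.target_names[iris.target]
    return frame


def make_plots(frame):
    """Hypothetical helper: return box, histogram, and violin figures."""
    # Box-and-whisker plot: one box per measurement.
    box_fig, box_ax = plt.subplots(figsize=(8, 5))
    frame[FEATURES].plot(kind="box", ax=box_ax)

    # Histograms: one panel per measurement.
    hist_fig, hist_axes = plt.subplots(2, 2, figsize=(8, 6))
    for ax, column in zip(hist_axes.flat, FEATURES):
        ax.hist(frame[column], bins=15)
        ax.set_title(column)
    hist_fig.tight_layout()

    # Violin plots: petal length and sepal length, grouped by species.
    violin_fig, violin_axes = plt.subplots(1, 2, figsize=(10, 5))
    sns.violinplot(data=frame, x="species", y="petal-length", ax=violin_axes[0])
    sns.violinplot(data=frame, x="species", y="sepal-length", ax=violin_axes[1])
    violin_fig.tight_layout()

    return {"box": box_fig, "hist": hist_fig, "violin": violin_fig}


df = load_iris_frame()
figures = make_plots(df)
```

Each returned figure can be written to disk with its `savefig` method, for example `figures["violin"].savefig("violin.png")`.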

Because we were only given one dataset, we have to split it into a training section and a testing section. Most of the data (70%) goes into the training section.

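from sklearn.model_selection import train_test_split

#split the DataFrame: 70% for training, 30% for testing
train, test = train_test_split(df, test_size = 0.3)

#take data features and output for training and testing
train_x = train[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
train_y = train['species']

#the test features and labels must come from the test split, not the training split
test_x = test[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
test_y = test['species']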
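As a variation on the split above, the sketch below fixes `random_state` so the split is reproducible and passes `stratify` so each species keeps the same proportion in both sections. Neither option is used in the original script for this first split; they are shown here only as an option. The sketch also assumes the data is loaded from scikit-learn's bundled iris copy rather than data.csv.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: use scikit-learn's bundled iris data in place of data.csv.
iris = load_iris()
df = pd.DataFrame(
    iris.data,
    columns=["sepal-length", "sepal-width", "petal-length", "petal-width"],
)
df["species"] = iris.target_names[iris.target]

# random_state makes the split repeatable; stratify keeps class proportions equal.
train, test = train_test_split(
    df, test_size=0.3, random_state=0, stratify=df["species"]
)

print(len(train), len(test))           # 105 45
print(test["species"].value_counts())  # 15 rows per species
```

With 150 rows and `test_size=0.3`, the test section holds 45 rows, and stratifying gives 15 rows for each of the three species.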

This example uses the k-nearest neighbors algorithm. The following code fits the model on the training data and scores it on the testing data:

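from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

#fit a 3-nearest-neighbors classifier, then score its predictions on the test data
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_x, train_y)
prediction = model.predict(test_x)
print(metrics.accuracy_score(test_y, prediction))
print(' ')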
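To make the idea behind k-nearest neighbors more concrete, here is a minimal pure-Python sketch: measure the distance from a new point to every training point, take the k closest, and return the most common label among them. This is not how scikit-learn implements `KNeighborsClassifier` internally. The function name `knn_predict` is hypothetical, and the toy points are made-up illustrative values, not rows from data.csv.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Hypothetical helper: classify `query` by majority vote of its k nearest points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up (petal-length, petal-width) values for illustration only.
toy_points = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.5, 1.5), (4.7, 1.4), (4.4, 1.3)]
toy_labels = ["iris-setosa"] * 3 + ["iris-versicolor"] * 3

print(knn_predict(toy_points, toy_labels, (1.6, 0.25)))  # iris-setosa
print(knn_predict(toy_points, toy_labels, (4.6, 1.45)))  # iris-versicolor
```

With `n_neighbors = 3`, as in the script above, each prediction is decided by the three closest training rows.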

This returns pretty good results, but what would happen if we separated the petal and sepal measurements? To find out, split the data into a training section and a testing section again. The only difference this time is that you will do it separately for the petal features and the sepal features.

#split the dataset
petal = df[['petal-length', 'petal-width', 'species']]
sepal = df[['sepal-length', 'sepal-width', 'species']]

#split the data into a training and testing section again

#petals
train_petal, test_petal = train_test_split(petal, test_size = 0.3, random_state = 0)
train_petal_x = train_petal[['petal-length', 'petal-width']]
train_petal_y = train_petal['species']

test_petal_x = test_petal[['petal-length', 'petal-width']]
test_petal_y = test_petal['species']

#sepals
train_sepal, test_sepal = train_test_split(sepal, test_size = 0.3, random_state = 0)
train_sepal_x = train_sepal[['sepal-length', 'sepal-width']]
train_sepal_y = train_sepal['species']

test_sepal_x = test_sepal[['sepal-length', 'sepal-width']]
test_sepal_y = test_sepal['species']

Retrain the model for this new scenario:

print('New training session:')
#petals
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_petal_x, train_petal_y)
prediction = model.predict(test_petal_x)
print('Petal prediction: ')
print(metrics.accuracy_score(test_petal_y, prediction))
print(' ')

#sepals
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_sepal_x, train_sepal_y)
prediction = model.predict(test_sepal_x)
print('Sepal prediction: ')
print(metrics.accuracy_score(test_sepal_y, prediction))

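In the run reported here, restricting the features to the petal measurements gave a better prediction than using the sepal measurements or all four features. Because the first split does not set random_state, the accuracy for all four features varies from run to run.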
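One way to compare the three feature sets without depending on a single split is 5-fold cross-validation. The sketch below is not part of the original script: it assumes the data is loaded from scikit-learn's bundled iris copy rather than data.csv, and the `feature_sets` dictionary is a hypothetical name introduced for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: use scikit-learn's bundled iris data in place of data.csv.
iris = load_iris()
df = pd.DataFrame(
    iris.data,
    columns=["sepal-length", "sepal-width", "petal-length", "petal-width"],
)
df["species"] = iris.target_names[iris.target]

# Hypothetical grouping of the columns into the three scenarios compared above.
feature_sets = {
    "all": ["sepal-length", "sepal-width", "petal-length", "petal-width"],
    "petal": ["petal-length", "petal-width"],
    "sepal": ["sepal-length", "sepal-width"],
}

scores = {}
for name, columns in feature_sets.items():
    model = KNeighborsClassifier(n_neighbors=3)
    # Mean accuracy over 5 stratified folds for this feature set.
    fold_scores = cross_val_score(model, df[columns], df["species"], cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.3f}")
```

Averaging over several folds gives each feature set the same evaluation data, so the comparison is less sensitive to which rows happen to land in the testing section.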

Acknowledgements

This project was made with guidance from various Kaggle kernels and other tutorials. These include this tutorial on machinelearningmastery.com and this IPython Notebook by I,Coder.

Sources and Helpful Links

https://archive.ics.uci.edu/ml/datasets/iris
https://www.kaggle.com/adityabhat24/iris-data-analysis-and-machine-learning-python
https://www.kaggle.com/uciml/iris/home
https://www.kaggle.com/ash316/ml-from-scratch-with-iris
