
This repo compares the accuracy of a CNN model on three datasets: CIFAR-10, SeaTurtleIDHeads with a random split, and SeaTurtleIDHeads with a time-aware split.


Kane-Kesler/Sea-Turtle-Identification-Neural-Nets


Sea Turtle Identification Neural Nets


Research Goals

Our goal is to build a CNN model that can identify future instances of a given sea turtle from the images it was trained on. The model identifies individual sea turtles by the unique segmented patterns on their heads. To train and test our model, we use the SeaTurtleIDHeads dataset.

sample images of sea turtle heads

Note: Each row represents photos of a unique turtle.

One application is investigating the effects of tourism and social media on wildlife: CNN models can identify individual animals in images on social media platforms and track how often they are posted (Papafitsoros, Adam, and Schofield 2023).

Methodology

We first train our model on the well-established CIFAR-10 dataset and measure its performance; we call this the CIFAR-10 CNN model. We then use the same network architecture to train two models on the SeaTurtleIDHeads dataset. The first splits the data randomly into training and testing sets (Random-Split model). The second splits the data at a chosen date-time, so that images after the cutoff are held out for testing (Time-cutoff Split model). This lets us compare how well the Time-cutoff Split model identifies future instances of a given sea turtle relative to the Random-Split model.
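The difference between the two splits can be sketched as follows. This is a minimal illustration in plain Python, not the code used in this repo: the record fields (`path`, `identity`, `date`), the 20% test fraction, and the cutoff date are all assumptions made for the example.

```python
import random
from datetime import datetime


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a fraction for testing, ignoring dates."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def time_cutoff_split(records, cutoff):
    """Train on photos taken before the cutoff, test on photos taken on or after it."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical toy records: one photo per year from 2010 to 2019, two turtles.
records = [
    {"path": f"img_{i}.jpg", "identity": f"t{i % 2}", "date": datetime(2010 + i, 6, 1)}
    for i in range(10)
]

cutoff = datetime(2017, 1, 1)  # assumed cutoff, chosen only for illustration
train_r, test_r = random_split(records)
train_t, test_t = time_cutoff_split(records, cutoff)
```

With a random split, photos of the same turtle taken around the same time can land in both sets. With a time cutoff, every test photo is later than every training photo, which is closer to the task of recognising a turtle in future sightings.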

Data

CIFAR-10

The CIFAR-10 dataset consists of 60,000 32 × 32 colour images in 10 categories: dog, cat, deer, frog, horse, bird, plane, car, truck and ship. Since it is a standard dataset for trying out CNN models, we build our model on it first to gauge performance.

SeaTurtleIDHeads

From the dataset of ~8,000 images, we keep only the images of the 10 turtles with the most images. We also downsize the images to 32 × 32 to reduce the computational burden.
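The subset selection can be sketched as below. This is an illustrative example under assumptions, not the repo's code: the `identity` field and the toy records are hypothetical, and the resizing step is left out.

```python
from collections import Counter


def keep_top_identities(records, k=10):
    """Keep only the records of the k identities that have the most images."""
    counts = Counter(r["identity"] for r in records)
    top = {identity for identity, _ in counts.most_common(k)}
    return [r for r in records if r["identity"] in top]


# Hypothetical toy records: turtle "a" has 3 images, "b" has 2, "c" has 1.
toy_records = [{"identity": name} for name in ["a", "a", "a", "b", "b", "c"]]

# Keep the 2 most photographed turtles; the project itself uses k=10.
subset = keep_top_identities(toy_records, k=2)
```

Restricting the data to 10 identities also gives the same number of classes as CIFAR-10.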

Results

CIFAR-10 model: 82.25%

SeaTurtleIDHeads Random-Split model: 62.01%

SeaTurtleIDHeads Time-cutoff Split model: 37.85%

Further analysis of the models is described in the research paper.

Instructions

conda install pytorch pandas numpy seaborn

To get the sea turtle dataframe, go to the SeaTurtleIDHeads dataset page and click 'Edit my Copy' to copy the page.

Acknowledgements

Patrick Loeber provided the general structure of the model class as well as the training and testing loops.

GeekAlexis has a repo with code for plotting the loss curves and printing the average training and validation loss each epoch. Our use of a learning rate scheduler, batch normalisation and dropout was inspired by this repo.

Konrad Szafer provided the initial code needed to create a custom dataset.

Sahar Millis provided code that plots a confusion matrix.

License

Apache-2.0 license
