Showcasing manifold learning with Isomap, and comparing it to other transformations such as PCA and MDS.

majdjamal/manifold_learning


Manifold Learning

Overview

High-dimensional data often lies on or near a low-dimensional manifold. This project implements and compares three dimensionality reduction algorithms: PCA, MDS, and Isomap. The algorithms are evaluated in two experiments. The first demonstrates how well each method recovers the manifold of a Swiss roll. The second applies Isomap to a real-world dataset of animals.

Swiss Roll

This experiment uses a Swiss roll to investigate how well each dimensionality reduction algorithm captures a manifold. The Swiss roll is shown in Figure 1 and stored in "swiss_roll.npy".

(Figure 1. A Swiss roll visualized in 3D. This data is used when comparing the dimensionality reduction algorithms.)
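
The layout of "swiss_roll.npy" is not described here, so the sketch below generates a comparable Swiss roll with NumPy instead of loading that file. The function name `make_swiss_roll` and its constants are illustrative assumptions, not the project's own code.

```python
import numpy as np

def make_swiss_roll(n_points=1000, seed=0):
    """Sample points from a Swiss roll: a flat 2D sheet rolled up in 3D."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1 + 2 * rng.random(n_points))  # angle along the spiral
    height = 21 * rng.random(n_points)                # position along the roll's axis
    # The spiral lives in the x-z plane; y is the axis of the roll.
    X = np.column_stack((t * np.cos(t), height, t * np.sin(t)))
    return X, t

X, t = make_swiss_roll()
print(X.shape)  # (1000, 3)
```

The returned `t` is the position of each point along the rolled-up sheet, which is handy for coloring a scatter plot so that an "unrolled" embedding is easy to recognize.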

PCA

Principal Component Analysis (PCA) is a powerful dimensionality reduction algorithm. However, it is a linear projection, so it cannot unroll a curved manifold. Figure 2a shows the result of applying PCA to the Swiss roll.
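
As a rough illustration of why PCA is limited to linear structure, here is a minimal sketch of PCA via the singular value decomposition. The function name `pca` and the toy data are assumptions for this example and may differ from what experiment.py does.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Toy data: a 3D point cloud that is almost flat along its third axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.1])
Y = pca(X)
print(Y.shape)  # (200, 2)
```

The embedding is a single matrix product with fixed directions, which is why points that are far apart along the roll but close in 3D end up on top of each other.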

MDS

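Multidimensional Scaling (MDS) takes us one step closer to our goal. It places the points in the low-dimensional space so that their pairwise distances are preserved as well as possible, and it captures part of the manifold's pattern, as seen in Figure 2b.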
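
Which MDS variant the project uses is not stated here. The sketch below shows classical (Torgerson) MDS as one common choice: double-center the squared distance matrix and take the top eigenvectors. The name `classical_mds` is an assumption for this example.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points given an (n, n) matrix D of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Scale each eigenvector by the square root of its (non-negative) eigenvalue.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Sanity check: distances between 2D points should survive a 2D embedding.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 2))
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
Y = classical_mds(D)
D_embedded = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_embedded, atol=1e-6))
```

The embedding is only determined up to rotation and reflection, so the check compares distances rather than coordinates.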

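Isomap

Of the three methods, Isomap is the best suited to capturing manifolds. It replaces straight-line distances with geodesic distances, approximated as shortest paths through a nearest-neighbor graph, and then applies MDS to them. Figure 2c shows the result of applying Isomap to the Swiss roll.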
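
The three Isomap steps (neighbor graph, shortest paths, MDS) can be sketched with NumPy and SciPy as follows. This is a minimal illustration under assumptions: the function name `isomap`, the k-nearest-neighbor graph, and the value of `n_neighbors` are choices made for this example and are not taken from experiment.py.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def isomap(X, n_neighbors=5, n_components=2):
    """Embed X (n_samples, n_features) using geodesic distances on a k-NN graph."""
    n = X.shape[0]
    D = cdist(X, X)  # Euclidean distances between all pairs

    # Step 1: keep only the edges from each point to its nearest neighbors.
    # In a dense graph passed to scipy.sparse.csgraph, a zero entry means "no edge".
    graph = np.zeros((n, n))
    neighbors = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    graph[rows, cols] = D[rows, cols]

    # Step 2: approximate geodesic distances by shortest paths through the graph.
    G = shortest_path(graph, directed=False)
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; try a larger n_neighbors")

    # Step 3: classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# A 2D spiral (the cross-section of a Swiss roll) unrolled onto a line.
t = np.linspace(0, 3 * np.pi, 200)
spiral = np.column_stack((t * np.cos(t), t * np.sin(t)))
Y = isomap(spiral, n_neighbors=5, n_components=1)
print(Y.shape)  # (200, 1)
```

The choice of `n_neighbors` matters: too small and the graph falls apart, too large and edges start to jump between layers of the roll, which short-circuits the geodesic distances.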

(Figure 2. The Swiss roll embedded with (a) PCA, (b) MDS, and (c) Isomap.)

Animals

Data

The animal data is the Zoo dataset, retrieved from the UCI Machine Learning Repository [1]. It consists of 101 instances and 17 attributes.

Preprocessing

The 14th attribute indicates the number of legs. It takes integer values from the set {0, 2, 4, 5, 6, 8}, unlike the other attributes, which are boolean. Attribute 14 is therefore converted to a boolean by storing True for 2 and 4, and False for all other values. The intuition is that most land animals have either 2 or 4 legs, so these values make a convenient split.
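
The conversion described above amounts to a set-membership test. A minimal sketch, assuming the leg counts have already been extracted into a NumPy array (the name `binarize_legs` and the sample values are illustrative):

```python
import numpy as np

def binarize_legs(legs):
    """Return True where an animal has 2 or 4 legs, False otherwise."""
    return np.isin(legs, (2, 4))

legs = np.array([0, 2, 4, 5, 6, 8])
print(binarize_legs(legs))  # [False  True  True False False False]
```

After this step every attribute is boolean, so all features contribute on the same 0/1 scale when distances between animals are computed.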

Result

Isomap is applied to the Zoo dataset, and the result is shown in Figure 3.

(Figure 3. Isomap applied to the Zoo dataset. Images of animals are added to make the plot easier to interpret.)

Discussion & Conclusion

Land animals such as gorillas and lions were placed to the left in the 2D plane. Animals that live in or close to water were placed in the center. For example, frogs appear at the origin. Moving upwards from the origin, we start to see penguins and flamingos. Moving downwards, we start to see aquatic animals such as tuna and dolphins. Insects were placed to the right in the 2D plane.

Isomap is a powerful dimensionality reduction algorithm that is good at capturing manifolds, and both experiments support this: it unrolls the Swiss roll and groups similar animals together.

Requirements

This project requires the following packages: NumPy, SciPy, and Matplotlib.

Testing

To test the model, install the required packages, navigate to the repository in your terminal, and type:

python experiment.py