Unsupervised-ML-Myopia-Clusters

The purpose of this assignment was to process the raw myopia data so it could be used with machine learning models. Several clustering algorithms were used to explore whether the patients can be placed into distinct groups. This would help us analyze the groups separately and find better ways to predict myopia, or nearsightedness.

Part 1: Prepare the Data

Used Pandas to read myopia.csv into a DataFrame.
Removed the "MYOPIC" column from the dataset.
Verified whether the data contains any null values or duplicate rows.
Standardized the dataset with StandardScaler so that columns containing larger values do not influence the outcome more than columns with smaller values.
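The preparation steps above can be sketched with pandas and scikit-learn. This is a minimal sketch, not the notebook's actual code: the helper name `prepare_data` is hypothetical, and it assumes the CSV has a `MYOPIC` label column and that every other column is numeric.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_data(df: pd.DataFrame, label_col: str = "MYOPIC"):
    """Drop the label column, report nulls/duplicates, and standardize features.

    Hypothetical helper; assumes all non-label columns are numeric.
    """
    # Remove the target label so the clustering step stays unsupervised.
    features = df.drop(columns=[label_col])

    # Count data-quality issues so they can be inspected before modeling.
    n_nulls = int(features.isnull().sum().sum())
    n_duplicates = int(features.duplicated().sum())
    print(f"null values: {n_nulls}, duplicate rows: {n_duplicates}")

    # Rescale each column to zero mean and unit variance so columns with
    # larger raw values do not dominate distance-based methods like K-means.
    scaled = StandardScaler().fit_transform(features)
    return features, scaled
```

A typical call would be `features, X_scaled = prepare_data(pd.read_csv("Resources/myopia.csv"))`, where the file path is an assumption about the repository layout.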

Part 2: Apply Dimensionality Reduction

Performed dimensionality reduction with PCA, which reduced the number of features from 14 to 10.
- Preserved 90% of the explained variance.
Further reduced the dataset dimensions with t-SNE.
Created a scatter plot of the t-SNE output, which appears to show 5 distinct clusters.
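The two reduction steps above can be sketched as follows. This is a hedged illustration, not the notebook's code: `reduce_dimensions` is a hypothetical helper, and the perplexity of 30 and the random seed are assumed values. Passing a float between 0 and 1 as `n_components` tells scikit-learn's `PCA` to keep just enough components to exceed that fraction of explained variance.

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def reduce_dimensions(X_scaled, variance=0.90, perplexity=30.0, seed=42):
    """PCA keeping `variance` of the explained variance, then 2-D t-SNE.

    Hypothetical helper; perplexity and seed are assumed, not from the project.
    """
    # A float n_components selects the smallest number of components whose
    # cumulative explained variance exceeds that fraction.
    pca = PCA(n_components=variance, svd_solver="full")
    X_pca = pca.fit_transform(X_scaled)
    kept = pca.explained_variance_ratio_.sum()
    print(f"PCA kept {pca.n_components_} components ({kept:.1%} of variance)")

    # t-SNE embeds the PCA output in 2-D for visual inspection of clusters.
    # Perplexity should be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    X_tsne = tsne.fit_transform(X_pca)
    return X_pca, X_tsne
```

To draw the scatter plot, pass `X_tsne[:, 0]` and `X_tsne[:, 1]` to `matplotlib.pyplot.scatter`. Because t-SNE layouts change with the seed and perplexity, the number of visible clusters is a rough guide rather than a measurement.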

Part 3: Perform a Cluster Analysis with K-means

Created an elbow plot to identify the best number of clusters.

Used a for loop to compute the inertia for each k from 1 through 10.
Identified the elbow of the plot and the value of k at which it appears.
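The elbow loop described above might look like this minimal sketch, assuming `X` is the PCA output. The helper name `elbow_inertias` and the seed are hypothetical; `inertia_` is scikit-learn's sum of squared distances from each point to its nearest centroid.

```python
from sklearn.cluster import KMeans


def elbow_inertias(X, k_values=range(1, 11), seed=42):
    """Fit K-means for each k and return {k: inertia} for an elbow plot.

    Hypothetical helper; the seed is an assumed value.
    """
    inertias = {}
    for k in k_values:
        # n_init is set explicitly so results do not depend on the
        # scikit-learn version's default.
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        model.fit(X)
        # Sum of squared distances from samples to their closest centroid.
        inertias[k] = model.inertia_
    return inertias
```

Plotting `list(inertias)` against `list(inertias.values())` gives the elbow curve; the elbow is the k where the curve bends and further increases in k reduce inertia only slightly.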

Used the principal components data with the K-means algorithm, with K values of 5 and 6.

With a K value of 5:

With a K value of 6:
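Running K-means on the principal components for a given k and attaching each patient's cluster label could be sketched as below. The helper `cluster_pca_data`, the `PC1`, `PC2`, ... column names, and the `class` label column are hypothetical choices, not taken from the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_pca_data(X_pca, k, seed=42):
    """Fit K-means with k clusters; return the PCs plus a 'class' column.

    Hypothetical helper; column names and seed are assumptions.
    """
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(X_pca)

    # Keep the principal components alongside each patient's cluster label.
    df = pd.DataFrame(X_pca)
    df.columns = [f"PC{i + 1}" for i in range(df.shape[1])]
    df["class"] = labels
    return df
```

Calling it once with `k=5` and once with `k=6` and checking `df["class"].value_counts()` shows how patients are distributed across clusters. For a 3D view, one option is `plotly.express.scatter_3d(df, x="PC1", y="PC2", z="PC3", color="class")`.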

Part 4: Make a Recommendation

The elbow curve and the 3D scatter plots show that the patients can be grouped into 5 or 6 clusters. I recommend grouping the patients into 5 clusters: the elbow curve flattens after k = 5, so a sixth cluster adds little and risks overfitting, splitting the data into more groups than it supports.
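One way to make "the curve flattens after 5" concrete is to compute the percentage drop in inertia from each k to the next; once the drops become small, extra clusters buy little. This is a hedged sketch: `inertia_drops` is a hypothetical helper that takes a `{k: inertia}` mapping such as the one produced by the elbow loop.

```python
def inertia_drops(inertias):
    """Return {k: percent decrease in inertia from the previous k}.

    Hypothetical helper; expects a dict mapping k to a positive inertia.
    """
    ks = sorted(inertias)
    drops = {}
    for prev_k, k in zip(ks, ks[1:]):
        prev, curr = inertias[prev_k], inertias[k]
        # Relative improvement gained by adding clusters from prev_k to k.
        drops[k] = 100.0 * (prev - curr) / prev
    return drops
```

A sharp fall in these percentages right after a given k is the numeric counterpart of the elbow seen in the plot.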

IndraNandagopal/Unsupervised-ML-Myopia-Clusters

Folders and files

Latest commit

History

Repository files navigation

Unsupervised-ML-Myopia-Clusters

Part 1: Prepare the Data

Part 2: Apply Dimensionality Reduction

Part 3: Perform a Cluster Analysis with K-means

Used the principal components data with the K-means algorithm with a K value of 5 & 6

With K value of 5:

With K value of 6:

Part 4: Make a Recommendation

The elbow curve and the 3D scatter plots show that the patients can be grouped into 5 or 6 clusters. I would recommend to group the patients into 5 clusters because the elbow curve is more flat after 5 and we may be over fitting the data if we group the patients into 6 clusters.

About

Topics

Resources

Stars

Watchers

Forks

Languages

Used the principal components data with the K-means algorithm with a `K value of 5 & 6`

`With K value of 5:`

`With K value of 6:`