a-paxton/clustering-tutorial

A tutorial on k-means clustering (with more R lessons along the way)

This repository presents a tutorial on implementing k-means clustering in R. It is intended to complement the first half of the introduction to machine learning and k-means clustering offered by UConn's Social-Ecological-Environmental (SEE) Lab and led by Shu Jiang (graduate student, Department of Psychological Sciences, University of Connecticut).

I'll also be using this as an opportunity to share a few R programming tips along the way. These tips may be especially helpful for those who are not familiar with programming with the Tidyverse, a useful cluster of libraries in R.

With many, many thanks to Bradley Boehmke for the K-means Cluster Analysis tutorial on the University of Cincinnati's Business Analytics R Programming Guide, on which this tutorial is modeled.

Requirements

For this tutorial, you will need:

  • R
  • tidyverse library
  • ggplot2 library
  • viridis library
  • cluster library
  • factoextra library
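
If it helps, the libraries above can be installed and loaded in one pass. The following is only a minimal sketch, not part of the tutorial itself: it assumes a CRAN mirror is reachable from your R session, and the variable names (`required_packages`, `is_installed`, `missing_packages`) are made up for illustration.

```r
# Packages named in the requirements list above.
# (ggplot2 is also attached by library(tidyverse); listing it separately is harmless.)
required_packages <- c("tidyverse", "ggplot2", "viridis", "cluster", "factoextra")

# Check which packages are already available in this R installation.
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

# Install only what is missing (assumption: a CRAN mirror is reachable).
if (length(missing_packages) > 0) {
  install.packages(missing_packages)
}

# Attach every package for the current session.
invisible(lapply(required_packages, library, character.only = TRUE))
```

Running this once at the top of a session should leave all five libraries attached; on later runs the installation step is skipped because nothing is missing.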

Optional add-ons

If you haven't already, I would strongly recommend installing RStudio, a useful IDE (integrated development environment) for R. It has lots of helpful capabilities that can make your programming experience a bit smoother.

If you're looking for more datasets to start exploring k-means clustering in more depth, check out the datasets available from the Open-Source Psychometrics Project.
