Identifying Fundraising Project Profiles through Clustering

Note: An improved iteration of this project is in progress, set to conclude in June 2024 . The below summary is for the first iteration.

Identifying Fundraising Project Profiles through Clustering

In this descriptive analytics project, I've leveraged kprototypes - an algorithm that incorporates both numerical and cateogrical data - to form clusters representing different profiles of projects.

Note: This project can be run with just the Clustering-Model.py file and the Kickstarter.xlsx file specified via the path therein.

Context

Kickstarter is a crowdfunding platform with a focus on bringing creative projects to life.

As part of a data mining course, I was tasked with defining meaningful profiles from a dataset of over 15,000 projects from 2009 and 2016.

Process

At a high level, these are the steps I took in developing this model. This repo currently has the code for my final model. I hope to upload some of the behind-the-scenes code while developing that model once I've gotten that cleaned up.

Model Selection

Noting that much of my dataset was comprised of categorical features as well as the fact that many of the characteristics that would be useful in a product strategy context were categorical - I took this as an opportunity to try the versatile kprototypes algorithm

Preprocessing

Manually selected a subset of variables based on my assessment of their potential in defining meaningful clusters
Excluded records which had outliers in terms of their fundraising goals
Binned categorical variables with a high number of unique values based on domain knowledge and/or frequencies observed during EDA
Standardized numerical variables

Hyperparameter Tuning

Fit the model with various 'k' values and determined via elbow plot that the optimal number of clusters was 7
Tested varying gamma values but stuck to default value of 0.5 for balanced consideration of numerical and categorical data for insights, despite heavier weighting on numerical values yielding a lower cost

Profiles Discovered

The Homegrown All-stars (Cluster 3)

Cluster 3, comprising only 0.2% of the dataset, contained exclusively successful, spotlighted projects with high backers and funds raised, representing the platform's all-stars.

The Global Stars (Cluster 5)

Cluster 5, also small in size, mirrored Cluster 3 in success and spotlight but differed with greater regional diversity, including about 15% non-US projects

The Dreamers (Cluster 0)

Cluster 0, about 10% of the dataset, was characterized by high fundraising goals but average success rates and backers, indicating projects with ambitions perhaps too high for the platform.

The Marathoners (Cluster 1)

Cluster 1 was notable for longer campaign durations, managing near-average success rates despite lower staff-pick and spotlight rates, suggesting that extended campaigns might have aided in these projects achieving goals.

The Microprojects (Cluster 6)

Cluster 6, about 15% of the dataset, was marked by shorter descriptive blurbs and lower goals but near-average success rates, indicating a potential focus on microprojects.

The Everymen (Cluster 2)

Cluster 2, representing about 26% of the data, was balanced across categories with a high share of Arts projects. It showed near-average success, staff-pick, and spotlight rates, reflecting a well-rounded “every-man” cluster.

The Everymen+ (Cluster 4)

Cluster 4, holding about 30% of the projects, was similar to Cluster 2 but with higher funding targets and more spotlighted and successful projects, positioning it as a kind of “Every Man+” cluster

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
Images		Images
Product Strategy Objectives_V2		Product Strategy Objectives_V2
clustering-model-V1		clustering-model-V1
data		data
utils		utils
.DS_Store		.DS_Store
.gitattributes		.gitattributes
.gitignore		.gitignore
Kickstarter_DataDictionary.xlsx		Kickstarter_DataDictionary.xlsx
LICENSE		LICENSE
README.md		README.md
data-cleaning.ipynb		data-cleaning.ipynb

License

aoluwolerotimi/Kickstarter-Project-Clustering

Folders and files

Latest commit

History

Repository files navigation

Identifying Fundraising Project Profiles through Clustering

Context

Process

Model Selection

Preprocessing

Hyperparameter Tuning

Profiles Discovered

The Homegrown All-stars (Cluster 3)

The Global Stars (Cluster 5)

The Dreamers (Cluster 0)

The Marathoners (Cluster 1)

The Microprojects (Cluster 6)

The Everymen (Cluster 2)

The Everymen+ (Cluster 4)

Appendix

Elbow Plot

Numeric Centroids

Categorical Frequency Counts

About

Topics

Resources

License

Stars

Watchers

Forks

Languages