mkupisie/Clustering-geodemographic_classification_of_NYC_using_K-means_geopandas_sklearn

Clustering: geodemographic classification of NYC using K-means algorithm

Many questions about spatial observations concern complex phenomena that involve several dimensions, which makes them hard to summarize in a single variable. This is especially apparent when mapping the distribution of people by attributes such as nationality, education level, or age. It is uncommon for a geographic region to be populated exclusively by people of identical heritage, particularly in urban areas.

Clustering can be used to reduce the dimensionality (the number of variables the analyst needs to look at) by converting those variables into a more intuitive set of classes. The fundamental idea behind statistical clustering is to condense the information from multiple variables into a relatively small number of categories. Each entry in the dataset is then assigned to exactly one category, based on its values for the variables considered.

K-means is one of the most popular clustering algorithms. It can be run in Python using the sklearn.cluster module of scikit-learn, a popular machine learning library.
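As a rough illustration of the sklearn.cluster API mentioned above, the sketch below runs K-means on synthetic data rather than the NYC dataset. The choice of n_clusters=10 mirrors the ten classes (0 to 9) reported in the results; the values of n_init and random_state are assumptions made only for reproducibility and are not taken from the repository.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: 200 observations with 7 variables each.
# This is NOT the NYC dataset, only a placeholder with the same width.
rng = np.random.default_rng(0)
X = rng.random((200, 7))

# n_clusters=10 mirrors the ten classes in the results section;
# n_init and random_state are assumptions for reproducibility.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)

# fit_predict learns the cluster centers and returns one label per row.
labels = kmeans.fit_predict(X)

print(labels.shape)                   # one label per observation
print(kmeans.cluster_centers_.shape)  # one center per cluster, per variable
```

Each observation receives exactly one integer label, which corresponds to the exclusive class assignment described earlier.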

Data

The classification uses data from the example datasets package of the PySAL library (https://pysal.org/notebooks/lib/libpysal/Example_Datasets.html).

The NYC Socio-Demographics dataset contains the total population of the following groups:

  • european: Total Population White
  • asian: Total Population Asian American
  • american: Total Population American Indian
  • african: Total Population African American
  • hispanic: Total Population Hispanic
  • mixed: Total Population Mixed race
  • pacific: Total Population Pacific Islander
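A minimal sketch of how the seven population columns listed above could feed the classifier is shown here. The table is filled with synthetic counts, since loading the real dataset is outside the scope of this example; the DataFrame name df, the "class" column name, and the K-means settings are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Column names as listed above.
GROUPS = ["european", "asian", "american", "african",
          "hispanic", "mixed", "pacific"]

# Synthetic counts standing in for the real table (one row per areal unit).
rng = np.random.default_rng(42)
df = pd.DataFrame(
    rng.integers(0, 5000, size=(300, len(GROUPS))),
    columns=GROUPS,
)

# Fit K-means on the seven population columns only and store the
# resulting class on each row. The settings here are assumptions.
model = KMeans(n_clusters=10, n_init=10, random_state=42)
df["class"] = model.fit_predict(df[GROUPS])

print(df.head())
```

Keeping the class as a column on the same table makes it straightforward to map the classes or to summarize them afterwards.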

Results

1.1. Classification on the map

[Image: NYC_results]

1.2. Mean total population of each ethnic group within each class

[Image: table_results]
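A table of per-class means like the one referenced above can be produced with a pandas group-by. The sketch below uses a tiny hypothetical table with only two of the population columns and made-up values; it assumes the class labels are already stored in a column named "class".

```python
import pandas as pd

# Tiny hypothetical table: population counts plus an assigned class.
df = pd.DataFrame({
    "european": [900, 800, 50, 30],
    "african":  [100, 200, 700, 900],
    "class":    [1, 1, 2, 2],
})

# Mean population of each group within each class.
class_means = df.groupby("class").mean()

print(class_means)
```

Each row of class_means describes one class, and each column gives the average population of one group among the areas assigned to that class.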

The results above show that each class is dominated by one, two, or sometimes three ethnic groups:

  • class 0: hispanic and african
  • class 1: european
  • class 2: african
  • class 3: hispanic and european
  • class 4: european and asian
  • class 5: asian
  • class 6: african, european and hispanic
  • class 7: no population
  • class 8: european
  • class 9: african and hispanic
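The source does not say how the dominant groups in the list above were identified. One possible way to derive such a summary from a table of class means is sketched below: compute each group's share of the class total and keep the groups above a cutoff. The helper name dominant_groups, the 25% threshold, and the sample values are all hypothetical.

```python
import pandas as pd

def dominant_groups(class_means, threshold=0.25):
    """Return, per class, the groups whose share of the class total
    is at least `threshold`, ordered from largest to smallest share.
    The threshold is an illustrative assumption."""
    result = {}
    for cls, row in class_means.iterrows():
        total = row.sum()
        if total == 0:
            # A class with no population has no dominant group.
            result[cls] = []
            continue
        shares = (row / total).sort_values(ascending=False)
        result[cls] = list(shares[shares >= threshold].index)
    return result

# Hypothetical class means (made-up values, not the repository's results).
class_means = pd.DataFrame(
    {
        "european": [850, 40, 0],
        "african":  [150, 800, 0],
        "hispanic": [400, 700, 0],
    },
    index=[0, 1, 2],
)

summary = dominant_groups(class_means)
print(summary)
```

A different cutoff would change which groups are listed, so the threshold should be treated as a tuning choice rather than a fixed rule.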
