Clustering: geodemographic classification of NYC using the K-means algorithm

Many questions about spatial observations involve complex phenomena with several dimensions, which makes them hard to summarize in a single variable. This is especially visible when mapping the distribution of people while taking into account, e.g., their nationality, education level, or age. It is uncommon for a geographic region to be populated exclusively by individuals of identical heritage, particularly in urban areas.

Clustering can be used to reduce the dimensionality (the number of variables the analyst needs to look at) by converting it into a more intuitive set of classes. The fundamental concept behind statistical clustering is to condense the information from multiple variables into a relatively small number of categories. Each entry in the dataset is then assigned to exactly one category, based on its values for the initially considered variables.
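To make the assignment step concrete, here is a minimal sketch using only the Python standard library. The function name `assign_to_nearest`, the two-variable observations, and the category centers are all hypothetical values chosen for illustration, not taken from the NYC data. Euclidean distance is assumed as the measure of similarity.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def assign_to_nearest(observations, centers):
    """Return, for each observation, the index of its closest center."""
    labels = []
    for obs in observations:
        distances = [dist(obs, center) for center in centers]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical counts of two population groups in four areas (not real data).
observations = [(900, 50), (850, 100), (60, 700), (40, 950)]

# Hypothetical category centers, one per class.
centers = [(875, 75), (50, 825)]

labels = assign_to_nearest(observations, centers)
print(labels)  # [0, 0, 1, 1]
```

Each area receives exactly one label, so the two original variables are replaced by a single categorical one.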

K-means is one of the most popular clustering algorithms, and it can be run in Python using the sklearn.cluster module of scikit-learn (a popular machine learning library for Python).
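A minimal sketch of running K-means with `sklearn.cluster.KMeans` might look as follows. The array `X` is hypothetical toy data, and the parameter values (`n_clusters=2`, `n_init=10`, `random_state=0`) are assumptions made for the example. The classification described here reports ten classes (0 to 9), which suggests `n_clusters=10` for the real data, but this README does not state the exact settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: rows are areas, columns are population counts
# for two groups. These are NOT values from the NYC dataset.
X = np.array(
    [
        [900, 50], [850, 100], [880, 70],   # areas dominated by group 1
        [60, 700], [40, 950], [55, 800],    # areas dominated by group 2
    ],
    dtype=float,
)

# n_init and random_state are fixed so that repeated runs give the same result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict estimates the cluster centers and returns one label per row.
labels = kmeans.fit_predict(X)

print(labels)
print(kmeans.cluster_centers_)
```

The label numbers themselves are arbitrary: which cluster is called 0 and which 1 depends on the initialization, so only the grouping is meaningful.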

Data

The classification uses data available in the examples package of the pysal library (https://pysal.org/notebooks/lib/libpysal/Example_Datasets.html).

The NYC Socio-Demographics data contains the total population of the following groups:

  • european: Total Population White
  • asian: Total Population Asian American
  • american: Total Population American Indian
  • african: Total Population African American
  • hispanic: Total Population Hispanic
  • mixed: Total Population Mixed race
  • pacific: Total Population Pacific Islander

Results

1.1. Classification on the map

[Image: NYC_results]

1.2. The mean of total population for each ethnic group within the class

[Image: table_results]

Based on the results above, each class is strongly dominated by one, two, or sometimes three ethnic groups:

  • class 0: hispanic and african
  • class 1: european
  • class 2: african
  • class 3: hispanic and european
  • class 4: european and asian
  • class 5: asian
  • class 6: african, european and hispanic
  • class 7: no population
  • class 8: european
  • class 9: african and hispanic
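A summary like the one in section 1.2 could be produced along these lines with pandas. This is a sketch under assumptions: the DataFrame `df`, its values, and the `label` column name are hypothetical, and only two of the seven population columns are shown.

```python
import pandas as pd

# Hypothetical table: one row per area, with population counts and the
# class label assigned by the clustering step (not real NYC values).
df = pd.DataFrame(
    {
        "european": [900, 850, 60, 40],
        "asian": [50, 100, 700, 950],
        "label": [0, 0, 1, 1],
    }
)

# Mean population of each group within each class.
class_means = df.groupby("label")[["european", "asian"]].mean()
print(class_means)

# Group with the highest mean in each class.
dominant = class_means.idxmax(axis=1)
print(dominant)
```

Taking only the single largest mean hides classes with two or three strong groups, so the full table of means is still needed to read off mixed classes such as those listed above.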