Use MRT hourly data to run Kmeans analysis based on entrance and exit people for each station

ShihWen/MRT_Kmeans

Grouping MRT Stations using K-means Clustering

Use MRT hourly data to run a K-means analysis based on the number of people entering and exiting each station.
You can also read the article on Medium for reference.

This project continues from MRT_Cleaning_Visualizing. Here the data is reshaped and divided into entrance and exit counts for each station, and K-means clustering is run to explore the relationships between stations in terms of passenger flow.

Below is the process flow:

Steps

  1. Refine the data using the functions in MRT_cleaning_visualizing.ipynb. The output serves as the raw data for K-means:
    the data for each station is converted from a 7*21 table to a list and normalized.
  2. Run the K-means analysis in MRT_K-means_Analysis.ipynb, using the hourly numbers of people as variables. This generates:
    • k_pack_IN/OUT csv files containing the raw data for entrance and exit separately
    • cluster_group_IN/OUT csv files showing which stations belong to which cluster group
    • line graphs and heat maps grouped by cluster
    • df_cluster.csv showing the entrance and exit group for each station
  3. MRT_K-means_Analysis.ipynb visualizes the results by cluster as heat maps and line graphs:
    [Heat maps: Entrance Cluster 0, 1, 2, 3; Exit Cluster 0, 1, 2, 3]
    [Line graphs: entrance; exit]
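
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration and not the notebooks' actual code: it assumes scikit-learn's KMeans with 4 clusters (matching the four clusters shown here), random placeholder counts instead of real MRT data, and a share-of-total normalization. The README does not say which normalization the notebooks use, and the names `station_tables` and `to_feature_vector` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder input (assumption): one 7x21 table of entrance counts per station,
# filled with random numbers instead of real MRT data.
station_tables = {
    f"station_{i}": rng.integers(1, 5000, size=(7, 21)) for i in range(12)
}


def to_feature_vector(table):
    """Flatten a 7x21 table into 147 values and scale them to sum to 1.

    Share-of-total scaling is an assumption; it makes stations comparable
    by the shape of their flow rather than by their overall volume.
    """
    flat = np.asarray(table, dtype=float).ravel()
    total = flat.sum()
    return flat / total if total > 0 else flat


names = list(station_tables)
X = np.vstack([to_feature_vector(station_tables[name]) for name in names])

# Four clusters, as in the cluster figures of this README.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Map each station to its cluster label, similar in spirit to cluster_group_IN/OUT.
cluster_of = dict(zip(names, labels.tolist()))
```

The same procedure would be run twice, once on the entrance tables and once on the exit tables, so that every station ends up with one entrance label and one exit label.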

  4. Visualize the results for individual stations with MRT_K-means_StationDashboard.ipynb, which generates a dashboard for every station in the folder created by MRT_K-means_Analysis.ipynb.

  5. Geo-visualize the results using QGIS and label the clusters by their patterns:

    1. Peak in the morning (cluster 0)
    2. Peak in both morning and afternoon, and weekends (cluster 1)
    3. Peak in the afternoon (cluster 2)
    4. Peak in both morning and afternoon (cluster 3)
    [Figure: Entrance Cluster 0, peak in the morning]
    [Figure: Entrance Cluster 1, peak in both morning and afternoon, and weekends]
    [Figure: Entrance Cluster 2, peak in the afternoon]
    [Figure: Entrance Cluster 3, peak in both morning and afternoon]
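
The cluster labels above were assigned by reading the figures. One possible way to cross-check such a label numerically is to reshape a cluster's 147-value profile back to 7*21 and compare average levels in a few time windows. The sketch below is only an illustration: the function `summarize_profile`, the row order (rows 0 to 4 as weekdays, rows 5 and 6 as the weekend), and the column positions chosen for "morning" and "afternoon" are all assumptions, since the README does not state how the table is laid out.

```python
import numpy as np


def summarize_profile(profile, morning_cols, afternoon_cols, weekend_rows=(5, 6)):
    """Average level of a 147-value profile in three time windows.

    Assumes the profile is a flattened 7x21 table (days x hours).
    Which columns count as morning or afternoon is supplied by the caller.
    """
    grid = np.asarray(profile, dtype=float).reshape(7, 21)
    weekday_rows = [r for r in range(7) if r not in weekend_rows]
    return {
        "weekday_morning": grid[np.ix_(weekday_rows, morning_cols)].mean(),
        "weekday_afternoon": grid[np.ix_(weekday_rows, afternoon_cols)].mean(),
        "weekend": grid[list(weekend_rows), :].mean(),
    }


# Synthetic profile with a weekday-morning peak (made-up values, not real data).
profile = np.zeros((7, 21))
profile[:5, 2:4] = 1.0

summary = summarize_profile(profile.ravel(), morning_cols=[2, 3], afternoon_cols=[12, 13])
dominant = max(summary, key=summary.get)
```

For the synthetic profile, `dominant` is the weekday-morning window. Applied to a real cluster centroid, the three averages would only support the visual reading of the heat maps, not replace it.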
  6. For each station, merge its entrance and exit clusters into a final category. There are 16 possible combinations (4 entrance clusters * 4 exit clusters), but only 9 occur in practice.

The distribution of stations in groups A, B and D is shown below.
Notably, the stations regarded as residential are spread around Taipei City, while the workplace and leisure types are located in the city center.

    • Group A: Entrance: Peak at a.m. Exit: Peak at p.m. Type: Residential
    • Group B: Entrance: Peak at p.m. Exit: Peak at a.m. Type: Work Place
    • Group D: Entrance: Peak at both + weekend. Exit: Peak at both + weekend. Type: Residential, Work and Leisure
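
Merging the entrance and exit clusters into a combined category can be sketched as below. The station names and labels are made up for illustration; in the project they would come from df_cluster.csv, whose exact column layout is not described here.

```python
from collections import Counter

# Made-up labels for four stations (assumption: not real project output).
entrance_cluster = {"station_a": 0, "station_b": 2, "station_c": 1, "station_d": 0}
exit_cluster = {"station_a": 2, "station_b": 0, "station_c": 1, "station_d": 2}

# Combine the two labels of each station into one (entrance, exit) pair.
combined = {
    name: (entrance_cluster[name], exit_cluster[name]) for name in entrance_cluster
}

# Count how many stations fall into each pair.
combo_counts = Counter(combined.values())

possible = 4 * 4              # 4 entrance clusters x 4 exit clusters
observed = len(combo_counts)  # pairs that actually occur in the data
```

With four clusters on each side there are 16 possible pairs, and `observed` counts how many of them appear. In this toy data only 3 pairs occur; in the project the corresponding number is 9.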
