alynnr/Unsupervised-Learning-Midterm-Project

Mid-term Project

Overview

This is a summary of our group's workflow throughout the week and of the contents of each directory for the mid-term project, NYC neighborhood clustering.

Please see the content of submission.csv for the final result.

Content

Alex

  • Dataset that contains information on features and infrastructure counts for each NYC neighborhood
  • Jupyter notebooks that outline the data exploration and analysis process
  • Various graphs within the Jupyter notebooks for presentation purposes

Harpreet

  • Datasets that contain information on restaurant distribution and clustering
  • Geospatial graphs for presentation purposes
  • Jupyter notebooks that outline the data exploration and analysis process

Jesse

  • Various datasets that contain information on NYC real estate, taxation and zoning
  • Functionized .py files that can be called for various plotting and modelling purposes
  • Graphs for presentation purposes

Presentation

  • Scripts for presentation
  • PDF of the presentation slides

Outline of the presentation

  • Stating the project scope and business case
  • Alex - outlining modelling techniques, datasets and data science techniques used for inquiry
  • Alex - summary of infrastructure and amenities in NYC, and its resultant clustering
  • Harpreet - distribution of restaurants and their corresponding types in the Five Boroughs, and its resultant clustering
  • Jesse - PLUTO dataset and the resulting clustering of skyscrapers within NYC, and its correlation with Alex's findings
  • Jesse - specific examples for business clients and a summary of each example
  • Harpreet - presentation conclusion

Action plan for project, NYC neighborhood clustering

Weekend - Data gathering, filtering and preprocessing

Some of the APIs and websites we can use for data gathering:

  • Foursquare
  • Yelp
  • Google Places API
  • Google BigQuery
  • NYC public datasets
  • NYC datasets on Kaggle
  • Various data dashboards on NYC real estate

Monday - Modelling and script writing

Modelling Techniques:

  • Unsupervised learning: KMeans, Agglomerative clustering, DBSCAN
  • Supervised learning: Random Forest, Naive Bayes, XGBoost
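
As an illustration of the first unsupervised technique listed above, here is a minimal, dependency-free sketch of KMeans (Lloyd's algorithm) applied to standardized neighborhood features. It is not the code from our notebooks: the feature columns (restaurants, subway stations, parks), the six rows of numbers, and the function names `standardize` and `kmeans` are all made up for this example.

```python
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance (constant columns are left at 0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        (sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-neighborhood counts: [restaurants, subway stations, parks].
features = [
    [120, 8, 3],
    [110, 7, 4],
    [130, 9, 2],
    [15, 1, 6],
    [20, 2, 7],
    [10, 1, 5],
]

labels, centroids = kmeans(standardize(features), k=2)
print(labels)
```

Standardizing first matters because KMeans relies on Euclidean distance, so an unscaled column with large counts would otherwise dominate the result. With this toy data the first three rows end up in one cluster and the last three in the other; which cluster gets label 0 depends on the random initialization.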

Tuesday - formatting GitHub and project directories

  • Pulling and pushing from GitHub
  • Directory hierarchy, testing and function .py scripts
  • Queried data storage in CSV format: different versions of raw vs processed data
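
One possible way to keep raw and processed versions of queried data apart is sketched below, using only the Python standard library. The `raw/` and `processed/` folder names, the `<name>_v<version>.csv` naming scheme, the helper names (`save_csv`, `load_csv`, `clean`) and the sample rows are assumptions made for illustration, not the layout actually used in this repository.

```python
import csv
import tempfile
from pathlib import Path


def save_csv(rows, fieldnames, base_dir, stage, name, version):
    """Write rows to <base_dir>/<stage>/<name>_v<version>.csv and return the path."""
    if stage not in ("raw", "processed"):
        raise ValueError("stage must be 'raw' or 'processed'")
    out_dir = Path(base_dir) / stage
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}_v{version}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_csv(path):
    """Read a CSV file into a list of dicts (all values come back as strings)."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def clean(rows):
    """Drop rows without a neighborhood name, trim whitespace, cast counts to int."""
    cleaned = []
    for row in rows:
        name = row["neighborhood"].strip()
        if not name:
            continue
        cleaned.append(
            {"neighborhood": name, "restaurant_count": int(row["restaurant_count"])}
        )
    return cleaned


# Hypothetical query result; a temporary directory stands in for the project's data folder.
fields = ["neighborhood", "restaurant_count"]
queried = [
    {"neighborhood": " Neighborhood A ", "restaurant_count": "42"},
    {"neighborhood": "", "restaurant_count": "7"},
]
data_dir = tempfile.mkdtemp()

raw_path = save_csv(queried, fields, data_dir, "raw", "restaurants", 1)
processed_path = save_csv(
    clean(load_csv(raw_path)), fields, data_dir, "processed", "restaurants", 1
)
print(raw_path, processed_path)
```

Writing the raw query result to disk before any cleaning means the processed file can always be regenerated, and bumping the version number keeps earlier outputs available for comparison.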

Wednesday - Data Visualization

Thursday - Making the presentation

Constructing the narrative within the presentation. Possible candidates:

  • Foreign client/investor looking to purchase real estate within NYC
  • Consultation service for local small retail business

Friday - Presentation day

About

Unsupervised Learning Midterm Project for Lighthouse Labs
