This is a summary of our group's workflow throughout the week and the content within each directory for the Mid-term project, NYC neighborhood clustering.
Please see the content of submission.csv for the final result.
Alex
- Dataset that contains information on features and infrastructure counts for each NYC neighborhood
- Jupyter notebooks that outline the data exploration and analysis process
- Various graphs within the Jupyter notebooks for presentation purposes
Harpreet
- Datasets that contain information on restaurant distribution and clustering
- Geospatial graphs for presentation purposes
- Jupyter notebooks that outline the data exploration and analysis process
Jesse
- Various datasets that contain information for NYC real estate, taxation and zoning
- Functionalized .py files that can be called for various plotting and modelling purposes
- Graphs for presentation purposes
Presentation
- Scripts for presentation
- PDF of the presentation slides
- Stating the project scope and business case
- Alex - outlining modelling techniques, datasets and data science techniques used for inquiry
- Alex - summary of infrastructure and amenities in NYC, and its resultant clustering
- Harpreet - distribution of restaurants and their corresponding types in the Five Boroughs, and its resultant clustering
- Jesse - PLUTO dataset and the consequent clustering for skyscrapers within NYC, and its correlation with Alex's findings
- Jesse - specific examples for business clients and a summary of each example
- Harpreet - presentation conclusion
Some of the APIs and websites we can use for data gathering:
- Foursquare
- Yelp
- Google Places API
- Google BigQuery
- NYC public datasets
- NYC datasets on Kaggle
- Various data dashboards on NYC real estate
Modelling Techniques:
- Unsupervised learning: KMeans, Agglomerative modelling, DBSCAN
- Supervised learning: Random Forest, Naive Bayes, XGBoost
- Pulling from and pushing to GitHub
- Directory hierarchy, testing and function .py scripts
- Queried data storage in CSV format: different versions of raw vs processed data
- Visualization within Python with Matplotlib
- Visualization with Tableau
- Visualization with Polymer https://flixgem.com/?sort_by=IMDb%20Score%3Adesc
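
As a rough illustration of the unsupervised clustering listed under Modelling Techniques, here is a minimal KMeans sketch in plain Python. It is not the code from the notebooks: the function name `kmeans`, the feature columns and the sample values are all hypothetical, and the notebooks may well rely on a library implementation instead.

```python
import random
from math import dist


def kmeans(points, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input rows.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Hypothetical per-neighborhood rows: (restaurant_count, park_count).
features = [(120, 3), (115, 4), (130, 2), (10, 12), (12, 15), (8, 11)]
centroids, labels = kmeans(features, k=2)
print(labels)
```

In practice the feature columns would usually be scaled before clustering, since KMeans relies on Euclidean distance and a large-valued column such as a raw count can dominate the result.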
Constructing the narrative within the presentation. Possible candidates:
- Foreign client/investor looking to purchase real estate within NYC
- Consultation service for local small retail business