Mid-Term Project

This repository contains all the information you need to work on the Mid-Term project.

Hello and Welcome!!!

The idea of this project is to segment the neighborhoods of New York City into separate clusters and examine the information about them. The goal is to build the neighborhood clusters from whatever information we are able to find, for example:

location
restaurants and different venue types
Uber rides
meetups
anything interesting you find in NYC Open Data
any other data that comes to mind

Do not include any economic or demographic indicators in the input data. However, further examination might reveal whether the data above has any relationship with the diversity of a neighborhood and its economics.

Files

data_preparation.ipynb - contains further information about how to proceed with data preparation
modeling.ipynb - contains important information about the modeling part of the project

Data

The NYC neighborhoods can be found here.
Average housing prices for Manhattan and Brooklyn from July 2020 by City Realty.
Median housing prices for Manhattan and Brooklyn from Feb 2021 by Zumber.

Presentation Guidelines

The main goal of this presentation is to prepare you for your Demo Day at the end of the bootcamp, where your time will be capped. Therefore, it's important to keep the presentation to a maximum of 5 minutes (the number of slides doesn't necessarily determine the duration). Focus on explaining what you did, how you approached the problem, what you achieved, and, if appropriate, what else could be done. Don't speak to every single task and step; focus on the highlights and interesting findings instead. If you struggled with something, feel free to mention it, but do not undermine your work by dwelling on that part.

Spend 1 min on the project flow. Which steps does your project have?
Spend 1 min showing the APIs and data sources you took your data from.
- were there any interesting findings you came up with during EDA?
Results (1 min):
- what clustering techniques did you use?
- evaluation metrics
- how did you come up with the number of clusters?
1 min on profiling of clusters
- what are the features that show the biggest difference across the clusters?
- what features are showing the biggest correlation with economic indicators?
Explain the biggest challenges in 1 min.
- what would you do if you had a bit more time?

Submission Guidelines

Share the link to your project repository through Compass
The file submission.csv that contains two columns, name of the neighborhood and cluster_id, should be included in the repository.
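
A minimal sketch of writing such a file with Python's standard `csv` module. The header `neighborhood` for the name column and the example assignments are assumptions; only `cluster_id` is named in the guidelines above.

```python
import csv
import io


def write_submission(assignments, file_obj):
    """Write (neighborhood name, cluster id) pairs as a two-column CSV."""
    writer = csv.writer(file_obj, lineterminator="\n")
    writer.writerow(["neighborhood", "cluster_id"])  # first header is an assumption
    for name, cluster_id in assignments:
        writer.writerow([name, cluster_id])


# Hypothetical cluster assignments, for illustration only.
example_assignments = [("Wakefield", 0), ("Chinatown", 2)]

# Writing to an in-memory buffer here; for the real file use
# open("submission.csv", "w", newline="") instead.
buffer = io.StringIO()
write_submission(example_assignments, buffer)
print(buffer.getvalue())
```

Keeping one row per neighborhood, with no missing cluster ids, makes the file easy to check against the original neighborhood list.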

How to Start

As the first step, we need to parse the original JSON file with the neighborhoods into a pandas DataFrame. Afterward, we can start to gather data from various sources. Spend some time researching the different APIs that could be used. Information from all data sources needs to be joined with the neighborhoods before we can proceed with feature engineering and the clustering itself. Do not forget about visualizations of clusters and neighborhoods as well, because there are a lot of things that can be done in this area.
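
The parsing step could be sketched as follows. The file layout is an assumption: the sketch assumes a GeoJSON-style document with a `features` list whose entries carry `properties.name`, `properties.borough`, and point `geometry.coordinates` in `[longitude, latitude]` order. The sample data and the `flatten_neighborhoods` helper are hypothetical, so adjust the keys to match the actual file.

```python
import json

# Hypothetical sample mimicking a GeoJSON-style neighborhoods file.
# The real file's keys may differ; inspect it before relying on these names.
SAMPLE_JSON = """
{
  "features": [
    {"properties": {"name": "Wakefield", "borough": "Bronx"},
     "geometry": {"type": "Point", "coordinates": [-73.847, 40.894]}},
    {"properties": {"name": "Chinatown", "borough": "Manhattan"},
     "geometry": {"type": "Point", "coordinates": [-73.994, 40.715]}}
  ]
}
"""


def flatten_neighborhoods(raw):
    """Turn the nested feature list into one flat record per neighborhood."""
    records = []
    for feature in raw["features"]:
        props = feature["properties"]
        lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order: lon, lat
        records.append({
            "neighborhood": props["name"],
            "borough": props["borough"],
            "latitude": lat,
            "longitude": lon,
        })
    return records


records = flatten_neighborhoods(json.loads(SAMPLE_JSON))
for row in records:
    print(row)
```

With pandas installed, `pd.DataFrame(records)` turns these records into a DataFrame, and data from other sources can then be merged onto it by neighborhood name.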
