Unit 4 - Sprint 13 - Natural Language Processing (NLP)

Assignment 1

The goal of this assignment is to identify the attributes of the best and worst coffee shops in the dataset. The text is fairly raw: the reviews contain dates, the star_rating column contains extra words, and so on. We therefore clean the data first to support a better analysis.
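The cleanup described above could look roughly like the sketch below. It assumes that each review begins with a date of the form MM/DD/YYYY and that the star_rating value holds a number surrounded by extra words; both formats are assumptions for illustration, and the function names are hypothetical rather than taken from the notebooks.

```python
import re

# Assumed format: the review text begins with a date such as "11/25/2016".
DATE_PREFIX = re.compile(r"^\s*\d{1,2}/\d{1,2}/\d{4}\s*")
# Assumed format: the rating cell holds a number plus extra words.
RATING_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def strip_leading_date(review: str) -> str:
    """Remove a leading MM/DD/YYYY date and surrounding whitespace."""
    return DATE_PREFIX.sub("", review).strip()


def parse_star_rating(raw: str) -> float:
    """Return the first number found in a raw star_rating value."""
    match = RATING_NUMBER.search(raw)
    if match is None:
        raise ValueError(f"no numeric rating found in {raw!r}")
    return float(match.group())
```

If the data is loaded into a pandas DataFrame, these helpers could be applied column-wise, for example with `Series.apply`.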

We will start by analyzing the corpus with visualizations of token frequency, and clean the data with techniques such as lemmatization and stopword removal.
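As a minimal sketch of the token-frequency step with stopword removal, the example below uses only the Python standard library. The stopword set is a tiny illustrative sample, not a complete list, and the tokenizer is a simple regular expression. Lemmatization is not shown here; it would normally come from an NLP library such as spaCy or NLTK.

```python
import re
from collections import Counter

# Illustrative stopword sample only; a real analysis would use a fuller list.
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "is", "was", "it", "i", "to", "of", "this", "in"}
)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def token_frequencies(reviews, stopwords=STOPWORDS) -> Counter:
    """Count non-stopword tokens across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(tok for tok in tokenize(review) if tok not in stopwords)
    return counts
```

Calling `token_frequencies(reviews).most_common(20)` returns the most frequent remaining tokens, which can then be passed to a bar chart or similar visualization.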

Based on the analysis, we will answer the question: what makes the best the best, and the worst the worst? Graphs and numbers from the analysis should support the conclusions.
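One possible way to put numbers behind that comparison is to measure, for each token, the share of reviews it appears in for well-rated shops versus poorly rated ones. The sketch below assumes the reviews have already been split into two groups; how to split them (for example, by a star-rating threshold) is left as a choice for the analysis, and the function names are hypothetical.

```python
import re
from collections import Counter


def doc_share(reviews) -> dict[str, float]:
    """Fraction of reviews in which each token appears at least once."""
    counts = Counter()
    for review in reviews:
        # A set ensures each token is counted once per review.
        counts.update(set(re.findall(r"[a-z']+", review.lower())))
    total = len(reviews)
    return {tok: n / total for tok, n in counts.items()} if total else {}


def share_gap(good_reviews, bad_reviews) -> dict[str, float]:
    """Per-token difference: share in good reviews minus share in bad reviews."""
    good, bad = doc_share(good_reviews), doc_share(bad_reviews)
    tokens = set(good) | set(bad)
    return {tok: good.get(tok, 0.0) - bad.get(tok, 0.0) for tok in tokens}
```

Tokens with a large positive gap are more typical of the good group, and tokens with a large negative gap are more typical of the bad group. Sorting by this gap gives a short list of candidate attributes to check against the graphs.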

Authors

@jianninapinto

jianninapinto/Coffee-Shops-Review-Analysis-using-NLP
