Yelp is a popular platform for user reviews of brands, especially restaurants and hotels. This application gathers the reviews for a specific brand in a specific City, applies Natural Language Processing (NLP) models to them, and presents the resulting Sentiment Analysis graphically.
The main data sources for this Sentiment Analysis program are the Yelp APIs. The program accesses two of them:
- Yelp Businesses API
- Yelp Reviews API
- Prompt for the City for which the Yelp reviews need to be pulled. For example, Atlanta.
- Prompt for the Brand Name for which the reviews need to be analyzed. For example, McDonald's.
- Using the Yelp Businesses API, extract the location details of the brand in that city. For example, all McDonald's locations in Atlanta.
- Then, using the Yelp Reviews API, extract the reviews for each of the locations found in the previous step.
- Using Hugging Face and other NLP libraries, perform Sentiment Analysis on the review text.
- Present a graph of the top-ranking Sentiments.
- Save data in CSV, SQL Server or MongoDB
- Create Visualizations in Power BI (or Tableau)
- Prompt for and read the City and Brand to be analyzed
- Use the Yelp Businesses API to extract all brand locations in the city
- Add these to a pandas DataFrame
- Use the Yelp Reviews API to extract and display the Star Rating and Review Text for the brand locations
- Add these to a pandas DataFrame
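The two API steps above can be sketched roughly as follows. This is a minimal illustration rather than the program's actual code: it assumes the Yelp Fusion v3 endpoints (`/businesses/search` and `/businesses/{id}/reviews`) with a Bearer API key, and the helper names (`search_url`, `reviews_url`, `get_json`, `location_rows`, `review_rows`) and the selected columns are hypothetical. Note that, as far as the public documentation indicates, the Reviews endpoint returns only a small number of review excerpts per business.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Yelp Fusion API (v3).
API_ROOT = "https://api.yelp.com/v3"


def search_url(brand, city, limit=50):
    """Build the Businesses search URL for one brand in one city."""
    query = urllib.parse.urlencode({"term": brand, "location": city, "limit": limit})
    return f"{API_ROOT}/businesses/search?{query}"


def reviews_url(business_id):
    """Build the Reviews URL for a single business id."""
    return f"{API_ROOT}/businesses/{urllib.parse.quote(business_id, safe='')}/reviews"


def get_json(url, api_key):
    """Send an authenticated GET request and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def location_rows(search_payload):
    """Flatten a Businesses search response into one row per location."""
    rows = []
    for business in search_payload.get("businesses", []):
        location = business.get("location", {})
        rows.append({
            "business_id": business["id"],
            "name": business["name"],
            "address": location.get("address1"),
            "city": location.get("city"),
        })
    return rows


def review_rows(business_id, reviews_payload):
    """Flatten a Reviews response into one row per review."""
    return [
        {"business_id": business_id, "rating": review["rating"], "text": review["text"]}
        for review in reviews_payload.get("reviews", [])
    ]
```

Each list of rows can then be passed to `pandas.DataFrame(rows)` to build the two DataFrames, for example by calling `get_json(search_url(brand, city), api_key)` once and `get_json(reviews_url(row["business_id"]), api_key)` for every location row.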
- The SentimentIntensityAnalyzer class from the NLTK library is used to assign a score based on the wording of the Review text.
- Depending on whether the Score is > 0, equal to 0 or < 0, the Score Category is set to Positive, Neutral or Negative.
- The Score and the Score Category are added as columns to the DataFrame.
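A minimal sketch of the scoring and categorization described above. It assumes that the score in question is VADER's `compound` value from `polarity_scores`, which the text does not state explicitly; the function names and the `score` / `score_category` column names are hypothetical. NLTK is imported inside `vader_scores` so the categorization logic can be read and run on its own.

```python
def score_category(score):
    """Map a numeric sentiment score to Positive, Neutral or Negative."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def vader_scores(texts):
    """Score each text with NLTK's VADER analyzer.

    Assumption: the 'compound' value is the score being categorized.
    """
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # lexicon needed by VADER
    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text)["compound"] for text in texts]


def add_score_columns(rows, scores):
    """Return copies of the rows with score and score_category added."""
    return [
        dict(row, score=score, score_category=score_category(score))
        for row, score in zip(rows, scores)
    ]
```

With a DataFrame, the same mapping could be applied column-wise, for example `df["score_category"] = df["score"].apply(score_category)`.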
- Using the Cardiff NLP model, analyze the Review text and assign Positive, Negative and Neutral scores to it.
- This is a BERT-family model (BERT stands for Bidirectional Encoder Representations from Transformers) trained on over 50M Tweets.
- Use another BERT-family model, arpanghoshal/EmoRoBERTa, to get the emotion/sentiment in the Review text.
- This model has been trained on Reddit comments.
- It helps identify emotions such as admiration, amusement, disapproval, disgust and relief, as well as a Neutral emotion.
- This emotion is added as another column to the DataFrame.
- To avoid cluttering the Sentiment Analysis with Neutral Reviews, we filter those out.
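The emotion labeling and Neutral filtering above might look like the sketch below. It assumes the classifier behaves like a Hugging Face `pipeline` for text classification, that is, a callable that takes a list of strings and returns one `{"label": ..., "score": ...}` dict per string. The function names and the `emotion` column name are hypothetical, and whether the model loads as shown depends on the installed transformers version and backend.

```python
def load_emotion_classifier():
    """Load the emotion model through the Hugging Face pipeline API.

    Assumption: the model id from the text loads with the default
    text-classification pipeline in the installed environment.
    """
    from transformers import pipeline

    return pipeline("text-classification", model="arpanghoshal/EmoRoBERTa")


def add_emotions(rows, classifier):
    """Return copies of the rows with the top emotion label added."""
    texts = [row["text"] for row in rows]
    results = classifier(texts)  # one {"label", "score"} dict per text
    return [dict(row, emotion=result["label"]) for row, result in zip(rows, results)]


def drop_neutral(rows):
    """Keep only the rows whose emotion is not Neutral."""
    return [row for row in rows if row["emotion"].lower() != "neutral"]
```

A typical call would be `drop_neutral(add_emotions(rows, load_emotion_classifier()))`. Passing the classifier in as an argument also makes it easy to substitute a stub when the model is not available.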
- Plot a bar chart showing the Sentiments by Review Count
- The bar chart is sorted in descending order of the Number of Reviews
- The Sentiment with the highest number of Reviews is at the top.
- We can also generate a pie chart showing the distribution of the emotions across the reviews
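One way to produce the two charts described above, as a sketch only: it assumes matplotlib is the plotting library (the text does not name one) and saves the figures to files with hypothetical names. A horizontal bar chart with an inverted y-axis is used so that the emotion with the most reviews ends up at the top.

```python
from collections import Counter


def emotion_counts(emotions):
    """Return (emotion, count) pairs sorted by descending count."""
    return Counter(emotions).most_common()


def plot_emotions(emotions, bar_path="emotions_bar.png", pie_path="emotions_pie.png"):
    """Save a sorted bar chart and a pie chart of the emotion counts.

    Assumption: matplotlib is installed; the file names are placeholders.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to files without needing a display
    import matplotlib.pyplot as plt

    pairs = emotion_counts(emotions)
    labels = [label for label, _ in pairs]
    counts = [count for _, count in pairs]

    fig, ax = plt.subplots()
    ax.barh(labels, counts)
    ax.invert_yaxis()  # barh draws the first item at the bottom by default
    ax.set_xlabel("Number of Reviews")
    ax.set_title("Sentiments by Review Count")
    fig.savefig(bar_path, bbox_inches="tight")
    plt.close(fig)

    fig, ax = plt.subplots()
    ax.pie(counts, labels=labels, autopct="%1.0f%%")
    ax.set_title("Distribution of Emotions")
    fig.savefig(pie_path, bbox_inches="tight")
    plt.close(fig)
```

With a DataFrame, `plot_emotions(df["emotion"])` would work on the (already filtered) emotion column.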
- Save the DataFrame with the Sentiment Analysis to a CSV file and a SQL Server Table for future analytics
- The program first checks whether the CSV file exists. If it does, the data is appended; otherwise the file is created.
- Next, using SQLAlchemy and an ODBC connection, the program connects to the MS SQL Server database, SentimentAnalysis.
- The program then checks whether there is a Table called Sentiment_Analysis in the DB.
- If the table does not exist, the program creates it; otherwise it appends the rows to the existing table.
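The append-or-create logic above can be sketched as follows. The CSV part uses the standard `csv` module on plain row dicts so that it runs without extra dependencies; with a DataFrame, `df.to_csv(path, mode="a", header=not os.path.exists(path), index=False)` achieves the same effect. The SQL part assumes pandas and SQLAlchemy, and relies on `to_sql(..., if_exists="append")`, which creates the table when it is missing and appends otherwise. The function names are hypothetical.

```python
import csv
import os


def append_rows_to_csv(rows, path):
    """Append rows to path, writing the header only when the file is new."""
    if not rows:
        return
    file_exists = os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        if not file_exists:
            writer.writeheader()
        writer.writerows(rows)


def append_frame_to_sql(frame, connection_url, table="Sentiment_Analysis"):
    """Append a DataFrame to a SQL table, creating the table if needed.

    Assumption: connection_url is a SQLAlchemy URL for the target database.
    """
    from sqlalchemy import create_engine

    engine = create_engine(connection_url)
    frame.to_sql(table, engine, if_exists="append", index=False)
```

For SQL Server over ODBC, the connection URL would typically take the form `mssql+pyodbc://<user>:<password>@<server>/SentimentAnalysis?driver=<ODBC driver name>`; the placeholders depend on the local setup.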
- Save the DataFrame with the Sentiment Analysis to a MongoDB Database Collection
- For that, the pandas DataFrame is converted to a Dictionary.
- A MongoDB database connection is created.
- A Collection within that MongoDB database is created.
- The Dictionary is then written to the Collection.
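The MongoDB steps above might be sketched like this, assuming the pymongo driver and a DataFrame converted with `to_dict("records")` (one dictionary per row). The function names, and any URI, database or collection names passed to them, are hypothetical.

```python
def frame_to_records(frame):
    """Convert a DataFrame into a list of per-row dictionaries."""
    return frame.to_dict("records")


def open_collection(uri, database, collection):
    """Connect to MongoDB and return a collection handle.

    Assumption: pymongo is installed. MongoDB creates the database and
    collection lazily, on the first write.
    """
    from pymongo import MongoClient

    client = MongoClient(uri)
    return client[database][collection]


def write_records(collection, records):
    """Insert the records and return how many documents were written."""
    if not records:
        return 0  # insert_many rejects an empty list
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

A call such as `write_records(open_collection(uri, db_name, coll_name), frame_to_records(df))` would then cover all four steps.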