trendstogo plots food trends at a City and Food Topic level. The data comes from the menus and reviews of 30,000 restaurants on grubhub.com. Check it out at https://trendstogo.herokuapp.com/.
This project grew out of a desire to find food trends in the US and an empathy for restaurants, which are difficult businesses to make succeed. As demand for food delivery has grown rapidly over the past years, so have platforms such as Grubhub, Ubereats and Doordash. But while the market grows, restaurants are not necessarily better off: they absorb high delivery costs and lose information about their customers and what those customers value.
From public information (Grubhub's website), I was able to obtain data about restaurants, their menu profiles and their reviews over time.
There are three main parts to this project: clustering restaurants into profiles, finding market gaps between cities, and tracking food trends over time.
There were four types of information collected, which helped cluster the restaurants:
- Cuisine Names
- Dish Names
- Price of Top items
- Variability of menu
LDA topic modeling was used to reduce the dimensionality of Cuisine Names and Dish Names so they could fit into our final clustering model. K-means was selected as the final clustering model, with an input of 62 features, all standardized because K-means is sensitive to scaling.
The results let us differentiate restaurants by high-level cuisine type. See below for an example:
Related Blog Post: So You Want to Open a Ghost Kitchen
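To illustrate why the features were standardized first, here is a minimal, standard-library-only sketch. It is not the notebook's code: the three columns (two topic shares and a top-item price), the six toy restaurants, and the farthest-first initialization are all assumptions made for the example. On the raw values the price column dominates the distance, so K-means groups by price; after z-scoring, it groups by cuisine.

```python
import math


def standardize(rows):
    """Z-score every column so each feature has mean 0 and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-first initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centers[j]))
                      for row in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical features: [pizza topic share, sushi topic share, top-item price].
restaurants = [
    [0.90, 0.10, 12.0],
    [0.80, 0.20, 14.0],
    [0.85, 0.15, 30.0],
    [0.10, 0.90, 13.0],
    [0.20, 0.80, 29.0],
    [0.15, 0.85, 31.0],
]

print(kmeans(restaurants, k=2))               # [0, 0, 1, 0, 1, 1]: split by price
print(kmeans(standardize(restaurants), k=2))  # [0, 0, 0, 1, 1, 1]: split by cuisine
```

The real model works on 62 features rather than three, but the same effect applies: any column measured on a larger scale would otherwise outweigh the topic shares.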
The related notebook can be found here in these
Cities are inherently different. Houston is different from New York, so don't expect Italian places to do as well in Houston as they do in New York. So how exactly do we compare cities? We can compare a city with cities similar to itself (sister cities), based on similarities in Food Importance.
Better yet, we can create a clone.
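As a rough illustration of the sister-city idea, the sketch below ranks cities by cosine similarity between their Food Importance vectors. Everything specific here is an assumption for the example: the README does not define how Food Importance is measured, so it is treated as a per-city share for each food topic, and the four topics and all numbers are made up. The clone-city notebook may well use a different similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sister_cities(target, importance, top_n=2):
    """Return the top_n cities most similar to `target`, most similar first."""
    scores = {
        city: cosine_similarity(importance[target], vec)
        for city, vec in importance.items()
        if city != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up importance shares for the topics [Italian, Thai, BBQ, Tex-Mex].
importance = {
    "Houston": [0.15, 0.10, 0.35, 0.40],
    "Dallas": [0.20, 0.10, 0.30, 0.40],
    "New York": [0.50, 0.30, 0.10, 0.10],
    "Boston": [0.55, 0.25, 0.10, 0.10],
}

print(sister_cities("Houston", importance, top_n=1))  # ['Dallas']
```

Cosine similarity compares the mix of topics rather than their absolute size, which is convenient when cities have very different restaurant counts.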
Related Blog Post: Cities Aren't Alike, So Create a Clone
While we can find market gaps between cities, trends change over time. What if Thai food is a more recent, 8-month trend in Houston? One way to capture that is through time series. Here, I took the posted date of each restaurant review and rolled the reviews up into a weekly time series (weeks starting on Sunday).
The final results, with their Topic and City clusters, are visualized through a Streamlit app: https://trendstogo.herokuapp.com/.
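The weekly roll-up can be sketched with the standard library alone. This is a minimal example, assuming each review is represented only by its posted date; the function names and sample dates are made up, and the project itself may do this step in pandas or SQL instead.

```python
from collections import Counter
from datetime import date, timedelta


def week_start_sunday(d):
    """Return the Sunday that starts the week containing `d`."""
    # date.weekday() gives Monday=0 ... Sunday=6, so a Sunday shifts by 0 days.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_counts(review_dates):
    """Count reviews per week, keyed by the week's starting Sunday."""
    counts = Counter(week_start_sunday(d) for d in review_dates)
    return dict(sorted(counts.items()))


# Hypothetical review dates; 2020-03-01 and 2020-03-08 are Sundays.
reviews = [
    date(2020, 3, 1),
    date(2020, 3, 4),
    date(2020, 3, 7),
    date(2020, 3, 8),
    date(2020, 3, 9),
]

for week, n in weekly_counts(reviews).items():
    print(week.isoformat(), n)
```

Grouping the same counts by city and food topic as well would give one weekly series per City and Food Topic pair.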
Related Blog Post: Being Bullish about ... Bulls?
The data was obtained by making direct API calls to Grubhub. The first step was to get the longitude, latitude and radius (in miles) of major city areas in the US and scrape all the restaurant ids within them. By iterating through each restaurant id, I was able to obtain restaurant, menu and review data in three separate API calls.
Most of the data was parsed and stored in a Postgres database. Only the review data is refreshed weekly to keep the Streamlit app up to date.
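The shape of that collection loop is sketched below. It deliberately does not reproduce Grubhub's API, whose endpoints and parameters are not described in this README: `fetch` is a hypothetical callable you would supply yourself, and the three kind names are placeholders. The de-duplication step is also an assumption, based on the idea that overlapping city radii could return the same restaurant twice.

```python
# Placeholder names for the three per-restaurant calls (not real endpoint names).
KINDS = ("restaurant", "menu", "reviews")


def collect_restaurant_data(restaurant_ids, fetch):
    """Call fetch(kind, restaurant_id) three times per unique restaurant id."""
    records = {}
    # dict.fromkeys drops duplicate ids while keeping their first-seen order.
    for rid in dict.fromkeys(restaurant_ids):
        records[rid] = {kind: fetch(kind, rid) for kind in KINDS}
    return records


def fake_fetch(kind, restaurant_id):
    """Stand-in for a real HTTP call; returns a dummy payload."""
    return {"kind": kind, "id": restaurant_id}


data = collect_restaurant_data(["r1", "r2", "r1"], fake_fetch)
print(list(data))          # ['r1', 'r2']
print(data["r1"]["menu"])  # {'kind': 'menu', 'id': 'r1'}
```

Passing the fetcher in as an argument keeps the loop testable without network access; a real fetcher would also need rate limiting and error handling.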
model - Clustering model and finding market gaps
- cluster-model.ipynb - NLP and clustering model to create 30 restaurant profiles
- clone-city.ipynb - find market gaps by creating sister cities; a concrete example with Houston
- topic-naming.ipynb - curate important information we fed into our clustering model, to inform the Topic naming
visualization - python scripts accessing db to visualize on streamlit
- st_3.py - functions and modules creating visualization
- data.py - functions to access db tables on postgres
- Procfile - declares the command Heroku runs to start the Streamlit app
numpy 1.16.1
scipy 1.2.2
pandas 0.24.2
matplotlib 3.1.1
psycopg2 2.6.1
gensim 3.8.1
streamlit 0.49.0
In the future, I'd like to incorporate the following improvements to the project:
- Forecasting at a City and Food Topic level
- Menu generation
- Price recommendation
Follow this repo for updates 😊