Starbucks Capstone Project.

Motivation

The purpose of this project is to analyze a Starbucks dataset containing simulated data that mimics customer behavior on the Starbucks rewards mobile app, and to address the following questions:

  1. Do offers really play a significant role in the company's cash inflows?
  2. Which kinds of offers really excite people and bring in more revenue?
  3. What characterizes people who tend to ignore offers compared to responsive people?
  4. How can the customer experience with promotional offers be improved by personalizing offer distribution with a collaborative filtering technique (FunkSVD)?

Install

This project requires Python 3.x and the following Python libraries:

  1. NumPy
  2. pandas
  3. Matplotlib
  4. Plotly
  5. tqdm
  6. seaborn

It also uses the json and math modules from the Python standard library.

You will also need to have software installed to run and execute an iPython Notebook.

Code

The code is provided in the Starbucks_Capstone_notebook.ipynb file.

Data

The data is contained in three files:

  • portfolio.json - containing offer ids and metadata about each offer (duration, type, etc.)
  • profile.json - demographic data for each customer
  • transcript.json - records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

portfolio.json

  • id (string) - offer id
  • offer_type (string) - type of offer, i.e. BOGO, discount, informational
  • difficulty (int) - minimum required spend to complete an offer
  • reward (int) - reward given for completing an offer
  • duration (int) - time for offer to be open, in days
  • channels (list of strings) - channels through which the offer is sent

profile.json

  • age (int) - age of the customer
  • became_member_on (int) - date when customer created an app account
  • gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
  • id (str) - customer id
  • income (float) - customer's income

transcript.json

  • event (str) - record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) - customer id
  • time (int) - time in hours since the start of the test; the data begins at time t=0
  • value (dict of strings) - either an offer id or a transaction amount, depending on the record

Note: Since this dataset was provided by Starbucks through Udacity, it cannot be shared publicly, so the original data files have been removed from GitHub.
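If you obtain these files yourself (e.g. from the Udacity workspace), they can be loaded with pandas. The snippet below is a minimal sketch only: the data/ paths and the line-delimited JSON layout are assumptions, not something this repository guarantees.

```python
import pandas as pd

# Assumed file locations; adjust the paths to wherever the JSON files live.
# orient='records', lines=True matches a line-delimited JSON layout.
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

print(portfolio.head())   # id, offer_type, difficulty, reward, duration, channels
print(profile.head())     # age, became_member_on, gender, id, income
print(transcript.head())  # event, person, time, value
```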

Run

In a terminal, navigate to the top-level project directory funksvd-starbucks/ (the one that contains this README) and run one of the following commands:

ipython notebook Starbucks_Capstone_notebook.ipynb

or

jupyter notebook Starbucks_Capstone_notebook.ipynb

This will open the iPython Notebook software and the project file in your browser.

Summary

Throughout the notebook I analyzed the data provided by Starbucks and applied data cleaning, transformation, and visualization.

65.53% of purchase inflows are generated by transactions not related to any offer. Discount and BOGO offers produce similar figures: 14.06% and 13.33% respectively. Informational offers account for only 7.08%.

The roughly 35% of cash inflows influenced by offers are generated by almost 72% of the customers who received at least one offer. About 28% of the consumer population were not affected at all, even though they received at least one offer during the experiment.

Discount offers with difficulty 7 and 10 and duration 7 and 10 days respectively turned out to be the offers that really excite people. Among informational offers, the one with a 3-day duration also had a high influence rate on consumers. These three offers lead in terms of the total amount spent by customers.

As for general observations concerning 'non-responsive' people: men ignore offers more frequently than women, and customers who do not use offers are generally older and have lower incomes than responsive ones. The majority of 'non-responsive' customers registered in 2018. Positively responsive people tend to make larger purchases, which is probably explained by their desire to complete offers. Moreover, I identified a special group of people that is heavily represented among consumers who ignore offers.

Based on the transformed transactional information, I built a user-item matrix that reflects each customer's positive or negative (ignore) reaction to the offers they received.
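As a rough illustration of that step, the sketch below pivots a tidy table of reactions into such a matrix. The DataFrame name reactions and its columns person, offer_id, and reaction (1 = responded, 0 = ignored) are assumptions for the example and do not necessarily match the notebook's variables.

```python
import pandas as pd

def build_user_item_matrix(reactions: pd.DataFrame) -> pd.DataFrame:
    """Pivot per-offer reactions into a user-item matrix.

    Rows are customers, columns are offers; cells hold 1 (positive reaction),
    0 (offer ignored), or NaN (the customer never received that offer).
    """
    return reactions.pivot_table(index='person',
                                 columns='offer_id',
                                 values='reaction',
                                 aggfunc='max')  # a customer may receive the same offer more than once
```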

A basic form of FunkSVD without regularization was chosen to fill in the missing values (ratings) in the user-item matrix, since not all customers received every possible offer. To assess how well the model performs, I split the data into train and test sets. As expected, the model does better on the test set than a naive prediction (sending offers to all customers as if every customer were happy to receive and use them). Keep in mind that for 45 customers we could not make predictions due to the cold start problem, as they were not present in both sets simultaneously.
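The core of an unregularized FunkSVD is stochastic gradient descent over the known cells only. The following is a minimal sketch of that idea; the latent dimension, learning rate, and iteration count are illustrative defaults, not the values tuned in the notebook.

```python
import numpy as np

def funk_svd(ratings, latent_features=10, learning_rate=0.005, iters=100):
    """Basic FunkSVD without regularization.

    ratings : 2-D array with 1/0 for known reactions and np.nan elsewhere.
    Returns U (n_users x k) and V (k x n_items) latent-factor matrices.
    """
    n_users, n_items = ratings.shape
    U = np.random.rand(n_users, latent_features)
    V = np.random.rand(latent_features, n_items)

    for _ in range(iters):
        for i in range(n_users):
            for j in range(n_items):
                if np.isnan(ratings[i, j]):
                    continue  # only observed reactions contribute to the loss
                error = ratings[i, j] - U[i, :] @ V[:, j]
                u_row = U[i, :].copy()  # keep old user factors for the item update
                U[i, :] += learning_rate * 2 * error * V[:, j]
                V[:, j] += learning_rate * 2 * error * u_row
    return U, V
```

The missing cells are then estimated as the product U @ V, which yields a predicted reaction for every customer/offer pair in the training matrix.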

I could not achieve an accuracy above 0.7093: the longer the model is trained on the train set, the more it overfits. There is therefore a trade-off between fitting the train set and maintaining predictive power on the test set. Although 0.7093 is not bad, it is worth considering possible further steps.
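One way such an accuracy figure can be computed is to threshold the reconstructed cells into a binary respond/ignore decision and compare them with the held-out reactions. The sketch below assumes the test matrix shares the same row and column ordering as the training matrix; the function and variable names are illustrative, not the notebook's exact code.

```python
import numpy as np

def prediction_accuracy(test_matrix, U, V):
    """Share of held-out reactions predicted correctly.

    test_matrix holds 1/0 for known test reactions and np.nan elsewhere;
    U and V come from fitting FunkSVD on the training matrix.
    """
    preds = U @ V
    mask = ~np.isnan(test_matrix)
    predicted_labels = (preds[mask] >= 0.5).astype(float)  # threshold the reconstruction
    return float((predicted_labels == test_matrix[mask]).mean())

# Naive baseline: assume every customer responds positively to every offer.
# naive_accuracy = np.nanmean(test_matrix)
```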

What else can be done? Possible further analysis and improvement

  1. Performance can be compared with supervised learning algorithms that take customer data as input and predict whether a consumer responds positively to an offer.
  2. The special group of people can be removed from the dataset and the model's performance compared to the previously achieved result. The special group probably adds variance to the data, which prevents the model from generalizing better.
  3. As an alternative to the offline approach used here, we could take an online approach and run an experiment to determine the impact of implementing one or more recommendation systems for our user base (for example, one based on FunkSVD and another based on a supervised learning algorithm). A simple experiment might be to randomly assign users to a control group that receives additional offers they have never seen, capture their reactions, compare them with the predictions of the selected algorithms, and measure performance.

Blog post

Link to blog post: https://medium.com/@t.kussainov/funk-svd-hands-on-experience-on-starbucks-data-set-f3e0946da014

License

This project was completed as part of the Udacity Data Scientist Nanodegree. The data was provided by Udacity and originally sourced from Starbucks.
