Starbucks Capstone Project.

Motivation

The purpose of this project is to analyze a Starbucks dataset containing simulated data that mimics customer behavior on the Starbucks rewards mobile app, and to address the following questions:

  1. Do offers really play a significant role in the company's cash inflows?
  2. Which kinds of offers really excite people and bring in more revenue?
  3. What characterizes people who tend to ignore offers compared to responsive people?
  4. How can the customer experience with promotional offers be improved by personalizing offer distribution with a collaborative filtering technique (FunkSVD)?

Install

This project requires Python 3.x and the following Python libraries:

  1. NumPy
  2. pandas
  3. Matplotlib
  4. Plotly
  5. tqdm
  6. seaborn

It also uses the json and math modules from the Python standard library.

You will also need to have software installed to run and execute an iPython Notebook.

Code

The code is provided in the Starbucks_Capstone_notebook.ipynb file.

Data

The data is contained in three files:

  • portfolio.json - containing offer ids and metadata about each offer (duration, type, etc.)
  • profile.json - demographic data for each customer
  • transcript.json - records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

portfolio.json

  • id (string) - offer id
  • offer_type (string) - type of offer, i.e. BOGO, discount, informational
  • difficulty (int) - minimum required spend to complete an offer
  • reward (int) - reward given for completing an offer
  • duration (int) - time for offer to be open, in days
  • channels (list of strings) - channels through which the offer is sent

profile.json

  • age (int) - age of the customer
  • became_member_on (int) - date when customer created an app account
  • gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
  • id (str) - customer id
  • income (float) - customer's income

transcript.json

  • event (str) - record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) - customer id
  • time (int) - time in hours since the start of the test; the data begins at time t=0
  • value (dict of strings) - either an offer id or a transaction amount, depending on the record

Note: Since this dataset was provided by Starbucks through Udacity, it cannot be shared publicly, so the original data files have been removed from GitHub.
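If you obtain these files yourself (e.g. from the Udacity workspace), they can be loaded with pandas. The snippet below is a minimal sketch only: the data/ paths and the line-delimited JSON layout are assumptions, not something this repository guarantees.

```python
import pandas as pd

# Assumed file locations; adjust the paths to wherever the JSON files live.
# orient='records', lines=True matches a line-delimited JSON layout.
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

print(portfolio.head())   # id, offer_type, difficulty, reward, duration, channels
print(profile.head())     # age, became_member_on, gender, id, income
print(transcript.head())  # event, person, time, value
```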

Run

In a terminal, navigate to the top-level project directory funksvd-starbucks/ (the one that contains this README) and run one of the following commands:

ipython notebook Starbucks_Capstone_notebook.ipynb

or

jupyter notebook Starbucks_Capstone_notebook.ipynb

This will open the iPython Notebook software and the project file in your browser.

Summary

Throughout the notebook I analyzed the data provided by Starbucks and applied data cleaning, transformation, and visualization.

65.53% of purchase inflows are generated by transactions not related to any offer. Discount and BOGO offers produce similar figures: 14.06% and 13.33% respectively. Informational offers account for only 7.08%.

The roughly 35% of cash inflows influenced by offers are generated by almost 72% of the customers who received at least one offer. About 28% of the consumer population were not affected at all, even though they received at least one offer during the experiment.

Discount offers with difficulty 7 and 10 and duration 7 and 10 days respectively turned out to be the offers that really excite people. Among informational offers, the one with a 3-day duration also had a high influence rate on consumers. These three offers lead in terms of the total amount spent by customers.

As for general observations concerning 'non-responsive' people: men ignore offers more frequently than women, and customers who do not use offers are generally older and have lower incomes than responsive ones. The majority of 'non-responsive' customers registered in 2018. Positively responsive people tend to make larger purchases, which is probably explained by their desire to complete offers. Moreover, I identified a special group of people that is heavily represented among consumers who ignore offers.

Based on the transformed transactional information, I built a user-item matrix that reflects each customer's positive or negative (ignore) reaction to the offers they received.
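As a rough illustration of that step, the sketch below pivots a tidy table of reactions into such a matrix. The DataFrame name reactions and its columns person, offer_id, and reaction (1 = responded, 0 = ignored) are assumptions for the example and do not necessarily match the notebook's variables.

```python
import pandas as pd

def build_user_item_matrix(reactions: pd.DataFrame) -> pd.DataFrame:
    """Pivot per-offer reactions into a user-item matrix.

    Rows are customers, columns are offers; cells hold 1 (positive reaction),
    0 (offer ignored), or NaN (the customer never received that offer).
    """
    return reactions.pivot_table(index='person',
                                 columns='offer_id',
                                 values='reaction',
                                 aggfunc='max')  # a customer may receive the same offer more than once
```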

A basic form of FunkSVD without regularization was chosen to fill in the missing values (ratings) in the user-item matrix, since not all customers received every possible offer. To assess how well the model performs, I split the data into train and test sets. As expected, the model does better on the test set than a naive prediction (sending offers to all customers as if every customer were happy to receive and use them). Keep in mind that for 45 customers we could not make predictions due to the cold start problem, as they were not present in both sets simultaneously.
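The core of an unregularized FunkSVD is stochastic gradient descent over the known cells only. The following is a minimal sketch of that idea; the latent dimension, learning rate, and iteration count are illustrative defaults, not the values tuned in the notebook.

```python
import numpy as np

def funk_svd(ratings, latent_features=10, learning_rate=0.005, iters=100):
    """Basic FunkSVD without regularization.

    ratings : 2-D array with 1/0 for known reactions and np.nan elsewhere.
    Returns U (n_users x k) and V (k x n_items) latent-factor matrices.
    """
    n_users, n_items = ratings.shape
    U = np.random.rand(n_users, latent_features)
    V = np.random.rand(latent_features, n_items)

    for _ in range(iters):
        for i in range(n_users):
            for j in range(n_items):
                if np.isnan(ratings[i, j]):
                    continue  # only observed reactions contribute to the loss
                error = ratings[i, j] - U[i, :] @ V[:, j]
                u_row = U[i, :].copy()  # keep old user factors for the item update
                U[i, :] += learning_rate * 2 * error * V[:, j]
                V[:, j] += learning_rate * 2 * error * u_row
    return U, V
```

The missing cells are then estimated as the product U @ V, which yields a predicted reaction for every customer/offer pair in the training matrix.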

I could not achieve an accuracy above 0.7093: the longer the model is trained on the train set, the more it overfits. There is therefore a trade-off between fitting the train set and maintaining predictive power on the test set. Although 0.7093 is not bad, it is worth considering possible further steps.
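One way such an accuracy figure can be computed is to threshold the reconstructed cells into a binary respond/ignore decision and compare them with the held-out reactions. The sketch below assumes the test matrix shares the same row and column ordering as the training matrix; the function and variable names are illustrative, not the notebook's exact code.

```python
import numpy as np

def prediction_accuracy(test_matrix, U, V):
    """Share of held-out reactions predicted correctly.

    test_matrix holds 1/0 for known test reactions and np.nan elsewhere;
    U and V come from fitting FunkSVD on the training matrix.
    """
    preds = U @ V
    mask = ~np.isnan(test_matrix)
    predicted_labels = (preds[mask] >= 0.5).astype(float)  # threshold the reconstruction
    return float((predicted_labels == test_matrix[mask]).mean())

# Naive baseline: assume every customer responds positively to every offer.
# naive_accuracy = np.nanmean(test_matrix)
```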

What else can be done? Possible further analysis and improvement

  1. Performance can be compared with supervised learning algorithms that take customer data as input and predict whether a consumer responds positively to an offer.
  2. The special group of people can be removed from the dataset and the model's performance compared to the previously achieved result. The special group probably adds variance to the data, which prevents the model from generalizing better.
  3. As an alternative to the offline approach used here, we could take an online approach and run an experiment to determine the impact of implementing one or more recommendation systems for our user base (for example, one based on FunkSVD and another based on a supervised learning algorithm). A simple experiment might be to randomly assign users to a control group that receives additional offers they have never seen, capture their reactions, compare them with the predictions of the selected algorithms, and measure performance.

Blog post

Link to blog post: https://medium.com/@t.kussainov/funk-svd-hands-on-experience-on-starbucks-data-set-f3e0946da014

License

This project was completed as part of the Udacity Data Scientist Nanodegree. The data was provided by Udacity and originally sourced from Starbucks.
