rudderlabs/rudderstack-profiles-classifier

ABOUT

This repo contains binary classification models that can be built on top of any feature table generated by RudderStack profiles. It can build predictive features such as:

  1. Churn prediction: Whether a user will churn or not in the next 30 days (or any other time period)
  2. Conversion prediction: Whether a user will convert or not in the next 30 days (or any other time period)
  3. Any other problem that can be framed as a yes/no question (for example, whether a customer is going to make a purchase in the next n days)
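
To illustrate what "framed as a yes/no question" means, here is a minimal sketch of turning event dates into a binary label over a prediction horizon. It is not this repo's implementation; the function name label_within_horizon and its parameters are hypothetical and chosen only for the example.

```python
from datetime import date, timedelta


def label_within_horizon(event_dates, cutoff, horizon_days=30):
    """Return 1 if any event falls in (cutoff, cutoff + horizon_days], else 0.

    Hypothetical helper for illustration only; not part of this repo.
    """
    window_end = cutoff + timedelta(days=horizon_days)
    return int(any(cutoff < d <= window_end for d in event_dates))


# Example: did the customer purchase within 30 days after the cutoff date?
purchases = [date(2024, 1, 5), date(2024, 2, 10)]
print(label_within_horizon(purchases, cutoff=date(2024, 1, 31)))
print(label_within_horizon(purchases, cutoff=date(2024, 3, 1)))
```

With these sample dates, the first call finds a purchase inside the window and the second does not. A churn label could be framed the same way by checking for the absence of activity events in the window.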

Setup

The expected way to run this repo is through a RudderStack profiles project, by linking this GitHub repo URL in a python model. One such project can be found here.

Once this repo is linked in a python_model inside a profiles project, you can run that project just like any other project, using the command pb run. Before that, you need to complete two steps (you can skip them if you want to run the models directly through the RudderStack web app rather than locally):

1. Building a virtual environment

You can create a virtual environment either through Conda or through the venv module that ships with Python. Both approaches are outlined below.

1.1 Building the conda environment

conda create -n pysnowpark --override-channels -c https://repo.anaconda.com/pkgs/snowflake python=3.8

NOTE - There is a known issue with running Snowpark Python on Apple silicon chips due to memory handling in pyOpenSSL. The error message displayed is, “Cannot allocate write+execute memory for ffi.callback()”.

As a workaround, set up a virtual environment that uses x86 Python using these commands:

CONDA_SUBDIR=osx-64 conda create -n pysnowpark python=3.8 --override-channels -c https://repo.anaconda.com/pkgs/snowflake
conda activate pysnowpark
conda config --env --set subdir osx-64

After creating the environment, you need to install the requirements inside the environment using "pip install -r requirements.txt".

NOTE - If you are running the code on an M1/M2 Mac, you need to install xgboost separately using the commands below:

brew install libomp
conda install -c conda-forge py-xgboost==1.5.0

1.2 Virtual Environment using venv

On macOS, install the Python 3.8 runtime

brew install python@3.8

Run the following command to create the environment

python3.8 -m venv pysnowpark 

Activate the environment and install the dependencies

source pysnowpark/bin/activate
pip install -r requirements.txt

NOTE - You might also need to install another dependency, libomp, separately using the command below:

brew install libomp
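
After installing the requirements with either approach, a quick check like the following sketch can confirm that key packages are importable from the activated environment. The module names passed in (xgboost and snowflake.snowpark) are assumptions based on the packages mentioned in this README; adjust the list to match requirements.txt. The helper missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ModuleNotFoundError, ValueError):
            # find_spec raises if a parent package (e.g. "snowflake") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed module names; edit to match the contents of requirements.txt.
not_found = missing_modules(["xgboost", "snowflake.snowpark"])
if not_found:
    print("Missing modules:", ", ".join(not_found))
else:
    print("All checked modules are importable.")
```

Run it with the environment's own interpreter (for example, after conda activate pysnowpark or source pysnowpark/bin/activate) so that the check reflects the right environment.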

2. Enabling python models in RudderStack profiles

Python models are disabled by default in profiles. You can enable them by adding the following lines to the site_config file:

py_models:
    enabled: true
    python_path: <path_to_env> # You can get this by running `which python` in the terminal after you activate your virtual env
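
As an alternative to which python, the short sketch below prints the interpreter path from inside Python itself, along with a version check. It assumes you run it with the interpreter from your activated virtual environment; the helper name interpreter_info is hypothetical, and the 3.8 check simply mirrors the version used in the setup steps above.

```python
import sys


def interpreter_info():
    """Return the path and major.minor version of the running interpreter."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return {"python_path": sys.executable, "version": version}


info = interpreter_info()
# Paste this value into the python_path field of the py_models config.
print(f"python_path: {info['python_path']}")
if info["version"] != "3.8":
    # The setup steps in this README create a Python 3.8 environment.
    print(f"Warning: running Python {info['version']}, expected 3.8")
```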

3. Building the model

Refer to our docs page for how to set up a python model in your project. There are various advanced config options that you can find in the model_configs.yaml file. You can add these options in the python model of your profiles project to override the defaults.