Cloud ML Ecosystem (CMLE)

This repository documents the various steps and pipelines needed to run Machine Learning (ML) models leveraging Adobe Experience Platform data. There are two parts to this repository:

  • Notebooks to illustrate the end-to-end workflow on a variety of cloud ML platforms.
  • Pipeline and library code that can be extended to add your own ML model and data.

Contents

The git repo currently contains five notebooks showcasing the full lifecycle of an ML application leveraging the Adobe Experience Platform.

| Notebook Link | Environment | Summary | Scope |
| --- | --- | --- | --- |
| Week1Notebook.ipynb | Local | Data Exploration | Create synthetic data and perform exploratory data analysis. |
| Week2Notebook.ipynb | Local | Featurization | Generate a training set and export it to cloud storage. |
| Week3Notebook.ipynb | Cloud Platform | Training | Plug the features into a machine learning model. |
| Week4Notebook.ipynb | Cloud Platform | Scoring | Generate propensity scores for all profiles and import them back into the Experience Platform. |
| Week5Notebook.ipynb | Cloud Platform | Targeting | Target profiles based on propensity interval. |

Before you can use any of the code in this repository, there is some setup required. Please refer to the Environment column of the table above to determine the setup needed to run each notebook, as described in the next section.

Note: If the Environment says Cloud Platform, you can use your cloud platform of choice to run the notebook and refer to the appropriate section of the setup below.

Configuration File

There is a common configuration file used by all the notebooks. The setup itself is covered in the following sections; here are the different configuration options you should use:

| Config Property | Section | Description | Value | Needed for |
| --- | --- | --- | --- | --- |
| ims_org_id | Platform | The organization ID | See section below on Org-level Information | Weeks 1 through 5 |
| sandbox_name | Platform | The ID of the sandbox to use | See section below on Org-level Information | Weeks 1 through 5 |
| dataset_id | Platform | The ID of the dataset containing the synthetic data | Dataset ID created as part of Week1Notebook.ipynb | Week 2 |
| featurized_dataset_id | Platform | The ID of the dataset containing the featurized data | Dataset ID created as part of Week2Notebook.ipynb | Weeks 3 & 4 |
| scoring_dataset_id | Platform | The ID of the dataset containing the scoring output | Dataset ID created as part of Week4Notebook.ipynb | Week 5 |
| environment | Platform | The type of environment this organization is running under | prod if running in production, stage otherwise | Weeks 1 through 5 |
| client_id | Authentication | The client ID used for API calls | See section below on Authentication Information | Weeks 1 through 5 |
| client_secret | Authentication | The client secret used for API calls | See section below on Authentication Information | Weeks 1 through 5 |
| scopes | Authentication | The scopes used for API calls | See section below on Authentication Information | Weeks 1 through 5 |
| export_path | Cloud | The path in your cloud storage account where featurized data will be exported | Defaults to cmle/egress | Weeks 2, 3 & 4 |
| import_path | Cloud | The path in your cloud storage account where scoring results will be written | Defaults to cmle/ingress | Weeks 4 & 5 |
| data_format | Cloud | The format of the files for the featurized data | Defaults to parquet | Weeks 2, 3 & 4 |
| compression_type | Cloud | The type of compression for the featurized data | Defaults to gzip | Weeks 2, 3 & 4 |
| model_name | Cloud | The name of the model | Defaults to cmle_propensity_model | Weeks 3 & 4 |
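
For reference, here is a minimal sketch of how a notebook might read these properties using Python's standard-library configparser. The conf/config.ini path and the [Platform], [Authentication], and [Cloud] section names are taken from this table and the Databricks setup section below; adjust them if your checkout differs.

```python
import configparser

# Read the shared configuration file (path assumed; see the Databricks
# setup section, which copies conf/config.ini to the workspace).
config = configparser.ConfigParser()
config.read("conf/config.ini")

# Section and property names come from the table above.
ims_org_id = config.get("Platform", "ims_org_id")
sandbox_name = config.get("Platform", "sandbox_name")
client_id = config.get("Authentication", "client_id")

# The Cloud properties have documented defaults.
export_path = config.get("Cloud", "export_path", fallback="cmle/egress")
data_format = config.get("Cloud", "data_format", fallback="parquet")

print(f"org={ims_org_id} sandbox={sandbox_name} export_path={export_path}")
```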

We assume some familiarity with connecting to Adobe Experience Platform APIs, but the key setup points are summarized below.

Org-level Information

To connect programmatically to your Adobe Experience Platform instance, we need to know a couple of pieces of information:

  • The IMS organization ID: this represents your entire organization and is the same across all your sandboxes.
  • The sandbox name: You may have multiple sandboxes for different environments or business units. The default one is prod.

You may already have this information, in which case you can skip to the next section. If you do not, you can log into your Adobe Experience Platform instance and press CTRL + i (even on Mac) to bring up the data debugger shown below, which contains your organization ID.

UI Data Debugger

For the sandbox name, you can get it in one of two ways:

  • If you do not have admin access to your Adobe Experience Platform instance, you can find it in the URL, in the section that starts with sname. In the example below, the sandbox name is cmle. Be aware that it should be all lowercase and contain no spaces, so if you are unsure you can ask your instance admin to verify; a small parsing sketch also follows below. URL_Sandbox_Name
  • If you have admin access, you can navigate to the Sandboxes panel to the left, click on Browse and note the name value of the instance you want to use.

UI Sandboxes
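
As a rough illustration of the first option, the sandbox name can also be pulled out of such a URL programmatically. This is a hypothetical sketch: the URL below is made up, and only the sname: convention comes from the example above.

```python
from urllib.parse import urlparse

# Hypothetical URL; the sname:cmle segment mirrors the example above.
url = "https://experience.adobe.com/#/sname:cmle/platform/home"

# The sandbox name is the fragment segment that starts with "sname:".
fragment = urlparse(url).fragment
sandbox_name = next(
    part.split(":", 1)[1]
    for part in fragment.split("/")
    if part.startswith("sname:")
)
print(sandbox_name)  # cmle
```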

Authentication Information

Authentication with the Adobe Experience Platform can be performed using OAuth, which will allow you to interact with most Adobe Experience Platform APIs and handle automated processes such as:

  • Creating schemas and/or datasets
  • Ingesting data
  • Querying data
  • and more

The process to set up an OAuth connection is described below, and you will need to capture a few fields that will be used throughout the notebooks and configurations in this repository.

The first step is to go to the Adobe Developer Console, which comes with any Adobe Experience Platform organization. Make sure you are logged into the organization you would like to use. You should be presented with a screen like the one below:

Console Home Screen

Once on that page, there are two options:

  • You already have an existing developer project set up. If so, just click on that project; it should appear here.
  • You haven't used the developer console yet or do not yet have a developer project set up. If so, just click on Create new project, which will create it immediately and take you to its home page.

Once on the project page, several options are presented to you as shown below. Click on Add API.

Console Project Screen

In this new API page, you will be shown the different types of APIs that are available. In our case, we want to create an Experience Platform API. Select that option and press Next.

Console API Screen

This next page allows you to configure access to your API. We will use the OAuth Server-to-Server option: Console API Configuration Screen

Then click Next. This next page will ask you to select a product profile. It will depend on what profiles are configured on your org - they are used to scope access to only what is necessary. You can work with your organization administrator to find the right profile, and then select it as shown in the screen below.

Console API Profile Screen

Make sure your API credential is assigned a role that contains all the necessary permissions; in the Adobe Experience Platform UI, this is under Administration -> Permissions -> Roles -> API credentials. If you can't see that tab yourself, share your credential info with your org administrator. API Credential With Role

Now your setup is complete, and you will be taken to the summary page for your API connection. You can verify that everything looks correct and scroll down to see a few fields:

Console API Summary 1 Console API Summary 2

The main fields you'll want to note down for further reference in this repository are:

  • Client ID: This is used to identify yourself when you use programmatic API access.
  • Client secret: Click on Retrieve client secret to see the value, and do not share it with anyone.

In addition to those, you'll want to save your private key. Make sure to save it in a secure location on your disk, as your configuration file will need to point to it.
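
To make the credential flow concrete, here is a minimal sketch of the client-credentials exchange, assuming Adobe's standard IMS token endpoint for OAuth Server-to-Server credentials; the placeholder values stand in for the fields you noted down above.

```python
import requests

# Values from the Developer Console / your configuration file (placeholders).
client_id = "<client_id>"
client_secret = "<client_secret>"
scopes = "<scopes>"  # comma-separated, as listed on the credential page

# Exchange the client credentials for an access token (assumed IMS endpoint).
resp = requests.post(
    "https://ims-na1.adobelogin.com/ims/token/v3",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scopes,
    },
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# Platform API calls then typically carry the token plus identifying headers:
# {"Authorization": f"Bearer {access_token}", "x-api-key": client_id,
#  "x-gw-ims-org-id": "<ims_org_id>", "x-sandbox-name": "<sandbox_name>"}
```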

Setup

1. Local setup

Make sure to set the environment variable ADOBE_HOME to the root of this repository. For example, this can be accomplished with the following command:

$ export ADOBE_HOME=`pwd`

For simplicity you can set it in your .bashrc or .zshrc (or the corresponding profile file if you use a different shell) to ensure it gets loaded automatically.

After setting up the configuration file described above, you can start your Jupyter notebook server at the root of this repository. Please select a Python 3 kernel. It is important to start the server at the root because the notebooks look for images in parent folders, so images will not render properly if you start the server in a deeper directory.
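
Before starting the server, a quick check like the one below can confirm the variable is visible to Python. The $ADOBE_HOME/conf/config.ini location is an assumption based on the Databricks copy command in the next section; the same check works on Databricks, where ADOBE_HOME points at the DBFS path.

```python
import os

# Confirm ADOBE_HOME is exported and the configuration file is in place.
adobe_home = os.environ["ADOBE_HOME"]  # raises KeyError if not set
config_path = os.path.join(adobe_home, "conf", "config.ini")
assert os.path.isfile(config_path), f"Missing configuration file: {config_path}"
print(f"ADOBE_HOME={adobe_home}")
```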

2. Databricks setup

Here are the pre-requisites to run these notebooks on Databricks:

  • Databricks Runtime Version should be of type ML and not Standard.
  • Databricks Runtime Version should be 12.1 ML or above (the notebooks were tested on 12.1 ML).
  • Your compute environment can be any number of nodes on any instance type.
  • You will need to create a personal user token.
  • You will need to define $DBUSER to be your user on the Databricks cluster.
  • To begin, please set up a personal access token on your Databricks cluster by following these instructions: Access_Token Setup. The next few steps assume you have already installed and set up the Databricks CLI to point to your Databricks workspace.
  • Copy the updated configuration file to your Databricks workspace filesystem using:
$ databricks fs cp conf/config.ini dbfs:/FileStore/shared_uploads/$DBUSER/cmle/conf/config.ini
  • Import the notebooks in your workspace. Please refer to the Contents section of this README to determine which notebooks should run on Databricks. For example, to import the Week 3 notebook:
$ databricks workspace mkdirs /Users/$DBUSER/cmle
$ databricks workspace import notebooks/assignments/Week3Notebook.ipynb /Users/$DBUSER/cmle/Week3Notebook -l python -f jupyter
  • Create a compute environment following the pre-requisites mentioned above, and make sure to include in the Environment variables section the following:
ADOBE_HOME=/dbfs/FileStore/shared_uploads/$DBUSER/cmle

Environment Variables

  • After the compute environment is started successfully, attach the imported notebook to it. You are now ready to execute it.

3. Azure ML setup

Not yet supported.

4. SageMaker setup

Five custom notebooks have been built to show SageMaker support. Once you have completed the steps above, including Configuration File, Org-level Information, and Authentication Information, you can find the customized notebooks in notebooks/aws_sagemaker and follow the SageMaker-specific README.

5. DataRobot setup

The DataRobot implementation allows you to leverage DataRobot's development and MLOps capabilities to train models, deploy them, and write predictions into the AEP environment. A demo of the workflow can be found here: https://www.dropbox.com/scl/fi/apdzy8eoizrhm64n006hz/DR-DEMO.mp4?rlkey=bqf5vtx9n3m1cv6hebnwxe1uk&dl=0

To configure it, add the DataRobot token and API-endpoint fields under the DataRobot section of the configuration file. If you have not created a token yet, please refer to the following guide: https://docs.datarobot.com/en/docs/api/api-quickstart/index.html#create-a-datarobot-api-key

For any questions, please reach out to support@datarobot.com.
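
As a hedged sketch of that configuration, connecting with the official datarobot Python package looks roughly like this; the endpoint below is a placeholder, so use the one associated with your DataRobot account.

```python
import datarobot as dr

# Token and endpoint correspond to the DataRobot section of the config file.
# The endpoint shown is a placeholder; yours may differ.
dr.Client(token="<DataRobot API key>", endpoint="https://app.datarobot.com/api/v2")
```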
