aslotte/mldotnet-real-time-data-streaming-workshop

Introduction

Working with real-time data streams and deriving real-time insights using custom machine learning models has become increasingly important for many organizations. Numerous real-time data platforms are currently available (e.g. Kafka, Hadoop, Spark), but this workshop focuses on Azure Stream Analytics. In addition to diving into Azure Stream Analytics, we will explore the open-source, cross-platform library ML.NET, which we will use to build our custom machine learning models, and look at an alternative solution using Azure Machine Learning Service.

Getting Started

Follow these instructions to set up the prerequisites:

  1. Download the .NET Core SDK
    1. Go to the following page to download the SDK
    2. Select the correct tab for your operating system (e.g. Windows, Linux or Mac)
    3. Click on the Build Apps download option
    4. Open the installer once the download is complete and follow provided instructions

  2. Install VS Code
    1. Go to the following page to download VS Code
    2. Select the correct installation for your operating system (e.g. Windows, Linux or Mac)
    3. Open the installer once the download is complete and follow provided instructions
    4. Open VS Code once the installation is complete

  3. Install the C# Extension
    1. In VS Code, select View -> Extensions
    2. Search for C#
    3. Click Install

  4. Install the Azure Functions Extension
    1. In VS Code, select View -> Extensions
    2. Search for Azure Functions
    3. Click Install

  5. Install the ML.NET CLI
    1. In VS Code, select Terminal -> New Terminal to open a new terminal window
    2. In the terminal, enter dotnet tool install -g mlnet and hit enter

  6. Clone the repository
    1. In VS Code, select Terminal -> New Terminal to open a new terminal window
    2. In the terminal, enter cd C:\ and hit enter
    3. In the terminal, enter git clone https://github.com/aslotte/mldotnet-real-time-data-streaming-workshop.git and hit enter to clone the repository to the C: drive.
      Note: Feel free to clone the repository elsewhere; just make sure to adjust the path in the instructions that follow. The repository is also available on the provided USB memory sticks in case the internet bandwidth is insufficient.

  7. Download the data
    1. There are three (3) ways to get the data we will be working with. Please choose the one most convenient for you:
      1. Copy the data from the provided USB memory sticks (copy the .zip file and extract it on your local computer)
      2. Download the data from here
      3. Download the data from Kaggle (requires free account)

  8. Create a free Azure subscription
    1. Go to Azure to create a free trial account
    2. Enter your contact information and click Next
    3. Fill in your credit card information and click Next.
      Note that this is only used to verify your identity; you will not be charged.
    4. Check the checkbox to agree to terms and conditions and click Sign-up

  9. Create an Outlook e-mail account
    1. Go to Outlook to create a free Outlook account
    2. Follow the provided instructions

  10. Download Azure Storage Explorer (required for part 3)
    1. Download Azure Storage Explorer. Make sure to select the correct OS.
    2. Open the installer
    3. Follow the provided instructions

    Note to macOS users: If the web site downloads an .exe file even after you select the macOS option, please download the macOS version from here.
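
Once the steps above are complete, a quick check from the terminal can confirm that the command-line tools are reachable. The following is a minimal sketch, not part of the workshop materials: the function name `missing_tools` is hypothetical, and it assumes the VS Code installer added the `code` command to your PATH, which may not be the case on every system.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# `missing_tools` is a hypothetical helper name used only for this sketch.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tools installed in the steps above: the .NET Core SDK (dotnet), git,
# VS Code (code, assuming it was added to PATH) and the ML.NET CLI (mlnet).
missing=$(missing_tools dotnet git code mlnet)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All command-line prerequisites found."
fi
```

If a tool is reported as missing right after installation, opening a new terminal window is often enough for the updated PATH to take effect.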

Problem Outline

For a financial institution, detecting fraud is imperative to ensure safe and continuous operations for the bank and its customers.

In this workshop we will detect fraudulent transactions in real time. We will train our model on publicly available data from Kaggle and integrate this custom machine learning model into a real-time data pipeline supported by Azure Stream Analytics.

Outline of Learning Objectives

  • Part 1: Machine Learning in .NET
    • Introduction to Machine Learning and ML.NET
    • Explore the data with Jupyter Notebooks and Pandas
    • Train a machine learning model using ML.NET
    • Train a machine learning model using AutoML CLI
  • Part 2: Setting up a real-time data streaming pipeline
    • Introduction to Stream Processing and Azure Stream Analytics
    • Introduction to Azure Resource Management (ARM) Templates
  • Part 3: ML.NET + Azure DevOps = MLOps
    • Introduction to MLOps
    • Set up a CI/CD pipeline for model training
  • Part 4: ML.NET + Jupyter
    • Introduction to ML.NET in Jupyter Notebooks
    • Train a machine learning model using ML.NET and Jupyter Notebooks
  • Part 5: Machine Learning in Azure
    • Introduction to Azure Machine Learning Service
    • Train a machine learning model using Azure ML Visual Interface
    • Train a machine learning model using Azure AutoML
    • Train a machine learning model using Jupyter Notebooks and Scikit Learn
  • Part 6: Consume ONNX Model from Jupyter Notebook in ML.NET
    • Consume an exported ONNX model in ML.NET, which was trained with Scikit Learn (Python)

Reminder: Remember to delete your resource group once you have finished this workshop to avoid incurring additional costs.
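
The resource group can be deleted in the Azure portal. As an alternative, the sketch below uses the Azure CLI, which is an assumption: it is not among the prerequisites listed above and would need to be installed and signed in with `az login` first. The function name `delete_resource_group` and the group name `my-workshop-rg` are hypothetical placeholders; substitute the name of the resource group you created.

```shell
#!/bin/sh
# Delete an Azure resource group, or only print the command with --dry-run.
# Assumes the Azure CLI (`az`) is installed and `az login` has been run.
delete_resource_group() {
    group_name=${1:-}
    mode=${2:-}
    if [ -z "$group_name" ]; then
        echo "usage: delete_resource_group <resource-group-name> [--dry-run]" >&2
        return 1
    fi
    if [ "$mode" = "--dry-run" ]; then
        # Show what would be executed without touching any Azure resources.
        echo "az group delete --name $group_name --yes --no-wait"
    else
        # --yes skips the confirmation prompt; --no-wait returns immediately.
        az group delete --name "$group_name" --yes --no-wait
    fi
}

# Hypothetical group name; --dry-run prints the command instead of deleting.
delete_resource_group my-workshop-rg --dry-run
```

Deleting a resource group removes every resource inside it, so it is worth running with `--dry-run` first and checking the name before dropping the flag.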

Solution Architecture

A Real-Time Data Pipeline with ML.NET

[Diagram: Real-Time Data Pipeline with ML.NET]

A Real-Time Data Pipeline with Azure Machine Learning Studio

[Diagram: Real-Time Data Pipeline with Azure ML]

Assumptions

This workshop is currently valid for ML.NET v1.4.0

Additional Resources