SafeAILab/zkDL

zkDL: zero-knowledge proofs of deep learning on CUDA

zkDL is a specialized CUDA-powered backend that provides zero-knowledge proofs (ZKPs) for deep learning, achieving a 1000x to 10000x speedup on the standard zkML benchmark.

Latest News

  • [2023/10/1] v1.0.0 is released. New features:
    • The first-known zkML platform runnable on GPUs.
      • New CUDA backend for arithmetic operations in the finite field and elliptic curve, which might be of independent interest to the ZKP community more broadly.
    • Support for ZKPs of MLPs with customizable width and depth (other architectures are coming soon). Users can load their own NNs from PyTorch (.pt files) for ZKP.
    • A 1000x to 10000x speedup on the standard benchmark.

Introduction

zkDL represents a significant step in integrating zero-knowledge proofs with deep learning. It uniquely emphasizes the preservation of tensor structures and harnesses the parallel processing power of CUDA, resulting in efficient proof computations.

Benchmarking

zkDL's performance was measured against the benchmarks established by Modulus Labs. These benchmarks target verifiable inference in fully-connected neural networks of varying size, scaling up to 18M parameters.

zkDL was evaluated in two distinct scenarios:

  1. Single Example Inference (zkDL-1)
  2. Batch Inference of 256 Examples (zkDL-256)

Thanks to its compatibility with tensor structures, zkDL's total proof time does not scale linearly with batch size. Instead, it remains almost constant, so zkDL-256 consistently achieves a per-data-point proving time of under 0.1 seconds. This is a speedup of 1000x to 10000x over the baselines.

Even for verifiable inference on a single example, where its batch processing capabilities are not fully utilized, zkDL still achieves a speedup of 10x to 100x.

Technical Overview

  • Foundation: This project builds on a CUDA implementation of the BLS12-381 elliptic curve from the ec-gpu package developed by Filecoin.

  • Quantization: For the efficient application of ZKP tools, the floating-point numbers involved in deep learning computations are quantized.

  • Tensor Structures and GKR Protocol: We utilize a specialized version of the GKR protocol to maintain tensor structures, facilitating the parallelization of proofs. For operations like ReLU, which are inherently non-arithmetic and thus challenging for ZKP schemes, auxiliary inputs are employed to transition them into arithmetic operations.

  • Neural Network Modelling: We fit the neural network into the ZKP backend by modelling it as an arithmetic circuit. Our strategy breaks free from the conventional layer-wise precedence, especially when non-arithmetic operations come into play, allowing for a more efficient 'flattened' circuit representation.
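
To make the quantization and ReLU points above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily the scheme zkDL implements: the fixed-point scale (SCALE_BITS), the word size (WORD_BITS), and the helper names are all hypothetical, and the checks run over plain integers rather than a finite field. The auxiliary input is taken to be the two's-complement bit decomposition of the pre-activation, which is one common way to turn ReLU into polynomial relations.

```python
SCALE_BITS = 16  # hypothetical: fractional bits of the fixed-point encoding
WORD_BITS = 32   # hypothetical: signed bit width of a quantized value


def quantize(x, scale_bits=SCALE_BITS):
    """Map a float to a signed fixed-point integer."""
    return round(x * (1 << scale_bits))


def relu_witness(z, word_bits=WORD_BITS):
    """Prover side: little-endian two's-complement bits of z, plus ReLU(z)."""
    assert -(1 << (word_bits - 1)) <= z < (1 << (word_bits - 1))
    u = z % (1 << word_bits)  # two's-complement encoding of z
    bits = [(u >> i) & 1 for i in range(word_bits)]
    return bits, max(z, 0)


def relu_constraints_hold(z, a, bits):
    """Check relations that use only additions and multiplications."""
    sign = bits[-1]  # top bit is 1 exactly when z is negative
    magnitude = sum(b << i for i, b in enumerate(bits[:-1]))
    all_binary = all(b * (b - 1) == 0 for b in bits)
    recomposes = magnitude - sign * (1 << (len(bits) - 1)) == z
    gated = a == (1 - sign) * magnitude  # a = z if z >= 0, else 0
    return all_binary and recomposes and gated


z = quantize(-1.25)
bits, a = relu_witness(z)
print(z, a, relu_constraints_hold(z, a, bits))
```

Once the bits are supplied as auxiliary inputs, the comparison inside ReLU disappears: the verifier only checks that each bit is binary, that the bits recompose to z, and that the output equals (1 - sign) times the low-order part.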

Prerequisites

Before running the demo, ensure you have the following requirements:

  1. CUDA: Ensure you identify and use a compatible CUDA architecture. In this guide, we use the sm_70 architecture as a reference. Please refer to the NVIDIA documentation to choose the appropriate architecture for your GPU.

  2. Python 3.10: We recommend managing your Python environments using virtualenv. Download and install Python 3.10 from the official Python website.

  3. PyTorch: This can be installed within your virtual environment. Instructions are provided in the subsequent sections.

Setup & Installation

  1. Configure the GPU Architecture:

    • Set the desired GPU architecture in the Makefile by updating the NVCC_FLAGS:
    # NVCC compiler flags
    NVCC_FLAGS := -arch=sm_70
  2. Set up a Virtual Environment:

    • Create and activate the environment using the following commands:
    virtualenv --no-download --clear path/to/your/env
    source path/to/your/env/bin/activate
  3. Install Necessary Packages:

    • Install numpy and torch. Note that installing torch will also include LibTorch, the C++ interface for torch, in your virtual environment:
    pip install torch numpy
  4. Prepare Your Model and Data with PyTorch:

    • To generate example data and a model (traced_model.pt and sample_input.pt), execute:
    python model.py
    • Alternatively, you can use your own model and data. Ensure that they are serialized to be compatible with the C++ loader. For this demo, we only support multilayer perceptrons (MLPs) with ReLU activations. The expected input tensor shape is (batch_size, input_dim).
  5. Compile the Demonstration:

    • Run the following command:
    make demo
    • Compilation may take a few minutes. We are working on speeding it up.
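
If you are unsure which value to put in NVCC_FLAGS in step 1, the small sketch below turns a CUDA compute capability, given as a (major, minor) pair, into the corresponding flag. The helper name nvcc_arch_flag is hypothetical and not part of this repository.

```python
def nvcc_arch_flag(capability):
    """Build an nvcc -arch flag from a (major, minor) compute capability."""
    major, minor = capability
    return f"-arch=sm_{major}{minor}"


# (7, 0) reproduces the sm_70 reference architecture used in this guide.
print(nvcc_arch_flag((7, 0)))  # -arch=sm_70
```

Once PyTorch is installed in your environment, torch.cuda.get_device_capability() returns such a pair for the current GPU, so its result can be passed to the helper directly.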

Running the Demo

To start the demo, use the following command:

./demo traced_model.pt sample_input.pt

If you've generated traced_model.pt and sample_input.pt using our provided model.py:

  • This command will execute an inference on a fully-connected neural network with ReLU activations, characterized by:
    • Layers: 8
    • Parameters: ~18M
    • Input Dimension: 784
    • Output Dimension: 1000
    • Hidden Dimensions: Predominantly 1773, except for the first (1000) and the last (1124)
    • Batch Size: 256

This neural network matches the specifications of the largest benchmark.

  • Performance Expectations:
    • The entire process, including initialization, should complete within a few seconds.
    • Proving time on a modern GPU is expected to be between 0.075 and 0.15 seconds. Actual timings vary with hardware.
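
As a sanity check on the figures above, the following sketch recomputes the parameter count and the amortized proving time per example. It rests on three assumptions not stated explicitly in this README: the hidden layers appear in the order 1000, then five of width 1773, then 1124; every layer has a bias vector; and the quoted proving time covers the whole batch of 256.

```python
# Assumed layer widths: input, seven hidden layers, output (8 linear layers).
LAYER_DIMS = [784, 1000, 1773, 1773, 1773, 1773, 1773, 1124, 1000]
BATCH_SIZE = 256


def mlp_parameter_count(dims):
    """Weights plus biases of a fully-connected network with these widths."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


def amortized_seconds(total_seconds, batch_size=BATCH_SIZE):
    """Proving time per example when one proof covers the whole batch."""
    return total_seconds / batch_size


print(len(LAYER_DIMS) - 1)              # 8
print(mlp_parameter_count(LAYER_DIMS))  # 18259957
for total in (0.075, 0.15):
    per_example_ms = amortized_seconds(total) * 1e3
    print(f"{total} s per batch -> {per_example_ms:.3f} ms per example")
```

Under these assumptions the network has about 18.3M parameters, in line with the ~18M figure, and the amortized proving time is roughly 0.3 to 0.6 ms per example.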

For convenience, we've encapsulated the build and demo run processes in demo.sh. Execute it using:

sbatch demo.sh # Tested on our slurm-managed cluster

or simply:

./demo.sh

Future Development

  • Broaden the range of supported network structures and add support for backpropagation to increase adaptability.

  • Re-introduce zero-knowledge verifiable training alongside inference, as detailed in zkDL: Efficient Zero-Knowledge Proofs of Deep Learning Training.

  • Implement proof compression across deep learning layers and explore a multi-GPU version for enhanced performance.

License

The license for this zkML tool is BSD-2-Clause.

Contribution

As an open-source project in a dynamically evolving field, we wholeheartedly welcome contributions, whether they take the shape of novel features, enhanced infrastructure, or improved documentation.

To contribute to zkDL or request further information, please submit a pull request or contact Haochen Sun at haochen.sun@uwaterloo.ca. If you find this repository useful, please consider giving it a ⭐.