MIG Profiler


MIGProfiler is a toolkit for benchmark studies of NVIDIA Multi-Instance GPU (MIG) technology. It profiles a range of deep learning training and inference tasks on MIG GPUs.

MIGProfiler features:

  • 🎨 Support for many deep learning tasks and open-source models across a variety of benchmark types
  • 📈 Comprehensive benchmark results
  • 🐣 Easy setup with a configuration file (WIP)

The project is under rapid development! Please check our benchmark website and join us!

Benchmark Website 📈

Coming soon!

Install 📦️

Install by PyPI

pip install migperf

⚠️ The Deep Learning (DL) framework (PyTorch) and task-specific DL libraries such as Hugging Face Transformers and OpenCV need to be installed manually, since their dependencies vary from user to user.
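
After installing them, a quick sanity check can confirm that PyTorch sees the GPU and that the task-specific libraries import correctly (a minimal sketch; adjust it to the libraries you actually installed):

import torch
import transformers
import cv2

# PyTorch should report a CUDA-capable device (a MIG instance also shows up here)
print('CUDA available:', torch.cuda.is_available())
print('torch:', torch.__version__)
print('transformers:', transformers.__version__)
print('opencv:', cv2.__version__)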

Use Docker 🐋

docker pull mlsysops/migperf:latest

Then start profiling with:

docker run --gpus=all --network host --rm -ti mlsysops/migperf:latest

⚠️ Due to Docker's device-mounting mechanism, the MIG configuration cannot be adjusted via MIGController from inside the container. Please set up the MIG devices on the host machine before you start profiling.
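
For example, assuming migperf is also installed on the host, the MIG device can be prepared there with the same MIGController API used in the Quick Start below:

from migperf.controller import MIGController

# On the host: enable MIG and create a 1g.10gb instance before launching the container
mig_controller = MIGController()
mig_controller.enable_mig(gpu_id=0)
mig_controller.create_gpu_instance('1g.10gb', create_ci=True)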

Manual build

Clone the repo:

git clone https://github.com/MLSysOps/MIGProfiler.git

It is recommended to create a virtual environment for testing:

conda create -n mig-perf python=3.8
conda activate mig-perf

Manually install the required packages (make sure the versions match your CUDA setup):

conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c conda-forge opencv
pip install transformers

Finally, build and install the migperf package:

pip install .

Quick Start 🚚

You can easily profile on a MIG GPU. Below are some common deep learning tasks to play with.

1. MIG training benchmark

We first create a 1g.10gb MIG device:

from migperf.controller import MIGController
# enable MIG
mig_controller = MIGController()
mig_controller.enable_mig(gpu_id=0)
# Create GPU instance
gi_status = mig_controller.create_gpu_instance('1g.10gb', create_ci=True)
print(gi_status)

Start the DCGM metrics exporter

docker run -d --rm --gpus all --net mig_perf -p 9400:9400  \
    -v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
    --name dcgm_exporter --cap-add SYS_ADMIN   nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
    -c 500 -f /etc/dcgm-exporter/customized.csv -d f
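
Optionally, verify that the exporter is serving metrics on the mapped port (9400) before launching the workload; a minimal check, assuming it runs on the same machine:

import requests

# The DCGM exporter exposes Prometheus-format metrics at /metrics
resp = requests.get('http://localhost:9400/metrics', timeout=5)
resp.raise_for_status()
# Print the DCGM metric lines (all DCGM metrics are prefixed with DCGM_FI_)
for line in resp.text.splitlines():
    if line.startswith('DCGM_FI_'):
        print(line)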

Start profiling

cd mig_perf/profiler
export PYTHONPATH=$PWD
python train/train_cv.py --bs=32 --model=resnet50 --mig-device-id=0 --max_train_steps=10 
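
To sweep several batch sizes, a small driver script can shell out to the same command (a sketch; the batch sizes are just example values, and it assumes you are still in mig_perf/profiler with PYTHONPATH set as above):

import subprocess

# Re-run the training benchmark shown above for a few example batch sizes
for bs in (8, 16, 32, 64):
    subprocess.run(
        ['python', 'train/train_cv.py', f'--bs={bs}', '--model=resnet50',
         '--mig-device-id=0', '--max_train_steps=10'],
        check=True,
    )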

Clean up after benchmarking

from migperf.controller import MIGController
# disable MIG
mig_controller = MIGController()
mig_controller.destroy_compute_instance(gpu_id=0)
mig_controller.destroy_gpu_instance(gpu_id=0)
mig_controller.disable_mig(gpu_id=0)

2. MIG inference benchmark

Start the DCGM metrics exporter

docker run -d --rm --gpus all --net mig_perf -p 9400:9400  \
    -v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
    --name dcgm_exporter --cap-add SYS_ADMIN   nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
    -c 500 -f /etc/dcgm-exporter/customized.csv -d f

Start profiling

cd mig_perf/profiler
export PYTHONPATH=$PWD
python client/block_inference_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0

See more benchmark experiments in ./exp.

3. Visualize

  • in a notebook
  • in Prometheus (under improvement); see the sketch below
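
Below is a minimal sketch of the Prometheus route, assuming a Prometheus server at localhost:9090 is configured to scrape the DCGM exporter started above (the metric name and query window are just examples):

import time
import requests

# Query GPU utilization (DCGM_FI_DEV_GPU_UTIL) over the last 10 minutes
end = time.time()
params = {
    'query': 'DCGM_FI_DEV_GPU_UTIL',
    'start': end - 600,
    'end': end,
    'step': '5s',
}
resp = requests.get('http://localhost:9090/api/v1/query_range', params=params, timeout=10)
resp.raise_for_status()
for series in resp.json()['data']['result']:
    print(series['metric'], series['values'][:5])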

Cite Us 🌱

@article{zhang2022migperf,
  title={MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs},
  author={Zhang, Huaizheng and Li, Yuanming and Xiao, Wencong and Huang, Yizheng and Di, Xing and Yin, Jianxiong and See, Simon and Luo, Yong and Lau, Chiew Tong and You, Yang},
  journal={arXiv preprint arXiv:2301.00407},
  year={2023}
}

Contributors 👥

  • Yuanming Li
  • Huaizheng Zhang
  • Yizheng Huang
  • Xing Di

Acknowledgement

Special thanks to Aliyun and the NVIDIA AI Tech Center for providing the MIG GPU server used for benchmarking.

License

This repository is open-sourced under the MIT License.