
Mamba Server Setup

William Guimont-Martin edited this page May 13, 2024 · 12 revisions

This guide gives a quick overview of how to install everything required for the Mamba Server.

Installation

  1. Install Ubuntu Server 2024
    • Do not enable LVM group
    • Enable non-free drivers to install NVIDIA drivers
    • Do not install Docker via the installer: it installs the snap package, which is incompatible with using GPUs inside Docker containers. Install Docker with apt instead.
  2. Enable persistence mode for the NVIDIA driver:
    • sudo vim /etc/systemd/system/enable-persistent-nvidia.service
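    • [Unit]
      Description=Enable NVIDIA Persistence Mode
      After=multi-user.target
      
      [Service]
      # nvidia-smi exits right after setting the mode, so run it as a one-shot job
      Type=oneshot
      RemainAfterExit=yes
      # Give the driver a few seconds to come up before calling nvidia-smi
      ExecStartPre=/bin/sleep 5
      ExecStart=/usr/bin/nvidia-smi -pm 1
      
      [Install]
      WantedBy=multi-user.target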
    • sudo systemctl enable enable-persistent-nvidia.service
    • sudo systemctl start enable-persistent-nvidia.service
  3. Reduce timeout for systemd-networkd-wait-online:
    # FIXME this does not seem to work
    sudo systemctl edit --full systemd-networkd-wait-online.service
    # Change the ExecStart line to
    ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --timeout=10 --any
  4. Install the NVIDIA driver following the NVIDIA Drivers Installation Guide. Make sure to install the server version and to use apt.
  5. Install the NVIDIA utilities for that driver: sudo apt install nvidia-utils-535-server
  6. Install CUDA following the NVIDIA CUDA Installation Guide: sudo apt install nvidia-cuda-toolkit cuda-drivers-fabricmanager-535
  7. Install nvidia-container-toolkit following its installation guide, then restart Docker: sudo systemctl restart docker.service
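
Once the steps above are done, a few quick checks can confirm that Docker did not come from snap and that persistence mode is on. The script below is only a sketch and is not part of the original setup: the function names are hypothetical, and it assumes a snap-installed docker binary resolves to a path under /snap.

```shell
#!/usr/bin/env bash
# Post-install sanity checks: a sketch, not part of the original guide.
# Function names are hypothetical.

# Succeeds if the given path points into /snap (assumed to mean a snap install).
is_snap_path() {
    case "$1" in
        /snap/*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Report where the docker binary comes from.
check_docker_source() {
    local path
    path="$(command -v docker 2>/dev/null)"
    if [ -z "$path" ]; then
        echo "docker: not found"
    elif is_snap_path "$path"; then
        echo "docker: snap install at $path, reinstall it with apt"
    else
        echo "docker: $path"
    fi
}

# Print the persistence mode of each GPU, if nvidia-smi is available.
check_persistence() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,persistence_mode --format=csv,noheader
    else
        echo "nvidia-smi: not found"
    fi
}

check_docker_source
check_persistence

# Once everything is installed, GPU access from a container can be tried with:
#   sudo docker run --rm --gpus all ubuntu nvidia-smi
```

If the last command prints the usual nvidia-smi table from inside the container, the driver, Docker, and nvidia-container-toolkit are most likely wired together correctly.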

Install Slurm and Munge

Add a new user

sudo useradd -c 'Full name' -m -G docker -s /bin/bash <username>
sudo passwd <username>
# Only needed if the shell was not set with -s above:
# sudo chsh -s /bin/bash <username>
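
After creating the account, it can be worth checking that the user really ended up in the docker group. The snippet below is a minimal sketch: in_group is a hypothetical helper and alice is a placeholder username, neither comes from the original guide.

```shell
#!/usr/bin/env bash
# Sketch: check that a user belongs to a group.
# in_group is a hypothetical helper; "alice" is a placeholder username.

# Usage: in_group <username> <group>
# Succeeds if <group> appears in the user's group list.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

if in_group alice docker; then
    echo "alice can use docker"
else
    echo "alice is not in the docker group (or does not exist)"
fi
```

Replace alice with the account created above. A user who was already logged in when the group was added needs to log out and back in before the new membership applies to their session.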

TODO

  • Install SLURM
  • GPU sharing