Mellanox OFED GPUDirect RDMA

The latest advancement in GPU-GPU communications is GPUDirect RDMA. This technology provides a direct P2P (Peer-to-Peer) data path between GPU memory and Mellanox HCA devices. It significantly decreases GPU-GPU communication latency and completely offloads the CPU, removing it from all GPU-GPU communications across the network.

General

MLNX_OFED 2.1 introduces an API between IB CORE and peer memory clients, such as NVIDIA Kepler-class GPUs. This API is also known as GPUDirect RDMA. It gives the HCA read/write access to peer memory data buffers. As a result, RDMA-based applications can use the computing power of the peer device with the RDMA interconnect without copying data to host memory.

This capability is supported with Mellanox ConnectX-3 VPI or Connect-IB InfiniBand adapters. It also works seamlessly using RoCE technology with the Mellanox ConnectX-3 VPI adapters.

This README describes the steps required to complete the installation of the NVIDIA peer memory client with Mellanox OFED.

Installation

Pre-requisites:

A compatible NVIDIA driver is installed and running.
MLNX_OFED 5.1 or newer is installed and running.

Please note that to build correctly, an MLNX_OFED release carrying the Peer-direct fix for the bug "Peer-direct patch may cause deadlock due to lock inversion" (tracked by Internal Ref. #2696789) is required, for example MLNX_OFED 5.3-1.0.0.1.43.

For the required NVIDIA driver version and other related details, please check with NVIDIA support.
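The prerequisite checks above can be sketched as a small shell helper. This is only an illustration: the function names and the `MIN_MAJOR`/`MIN_MINOR` variables are made up for this example, and it assumes that `ofed_info -s` prints a string of the form `MLNX_OFED_LINUX-5.3-1.0.0.1:`. It does not check for the Peer-direct fix mentioned above.

```shell
#!/bin/sh
# Sketch only: helper names and thresholds are illustrative, not part of this package.

MIN_MAJOR=5
MIN_MINOR=1

# Print "MAJOR.MINOR" from a version string such as "MLNX_OFED_LINUX-5.3-1.0.0.1:".
# Prints nothing if the string does not have the assumed form.
ofed_major_minor() {
    printf '%s\n' "$1" |
        sed -n 's/^[^0-9]*-\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2/p'
}

# Succeed if the given version string is MIN_MAJOR.MIN_MINOR or newer.
ofed_version_ok() {
    ver=$(ofed_major_minor "$1")
    [ -n "$ver" ] || return 1
    major=${ver%%.*}
    minor=${ver#*.}
    [ "$major" -gt "$MIN_MAJOR" ] && return 0
    [ "$major" -eq "$MIN_MAJOR" ] && [ "$minor" -ge "$MIN_MINOR" ]
}

# Check that the NVIDIA driver responds and that MLNX_OFED is recent enough.
check_prereqs() {
    command -v nvidia-smi >/dev/null 2>&1 ||
        { echo "nvidia-smi not found: is the NVIDIA driver installed?" >&2; return 1; }
    nvidia-smi >/dev/null 2>&1 ||
        { echo "nvidia-smi failed: is the NVIDIA driver up?" >&2; return 1; }
    command -v ofed_info >/dev/null 2>&1 ||
        { echo "ofed_info not found: is MLNX_OFED installed?" >&2; return 1; }
    ofed_version_ok "$(ofed_info -s)" ||
        { echo "MLNX_OFED $MIN_MAJOR.$MIN_MINOR or newer is required" >&2; return 1; }
    echo "prerequisites look OK"
}
```

After sourcing the file, calling check_prereqs before running build_module.sh reports the first missing prerequisite it finds.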

To build source packages (src.rpm for RPM-based OS and tarball for DEB-based OS), use the build_module.sh script.

For example, to build on RPM-based OS:

$ ./build_module.sh
Building source rpm for nvidia_peer_memory...

Built: /tmp/nvidia_peer_memory-1.2-0.src.rpm

To install on RPM-based OS:
# rpmbuild --rebuild /tmp/nvidia_peer_memory-1.2-0.src.rpm
# rpm -ivh <path to generated binary rpm file>

To build on DEB-based OS:

$ ./build_module.sh
Building debian tarball for nvidia-peer-memory...

Built: /tmp/nvidia-peer-memory_1.2.orig.tar.gz

To install on DEB-based OS:
# cd /tmp
# tar xzf /tmp/nvidia-peer-memory_1.2.orig.tar.gz
# cd nvidia-peer-memory-1.2
# dpkg-buildpackage -us -uc
# dpkg -i <path to generated deb files>

To install (excluding Ubuntu), run:

rpmbuild --rebuild <path to srpm>
rpm -ivh <path to generated binary rpm file> [On SLES add --nodeps]
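The SLES exception above can be folded into a small helper that chooses the rpm flags. This is a sketch under one assumption: that SLES reports an `ID` value starting with `sles` in `/etc/os-release`. The function name `rpm_install_flags` is hypothetical.

```shell
#!/bin/sh
# Sketch only: rpm_install_flags is an illustrative helper, not part of this package.

# Print the rpm flags to use for installation.
# Optional argument: path to an os-release file (defaults to /etc/os-release).
rpm_install_flags() {
    os_release=${1:-/etc/os-release}
    flags="-ivh"
    if [ -r "$os_release" ]; then
        # Take the value of the ID= line only (ID_LIKE= does not match) and strip quotes.
        id=$(sed -n 's/^ID=//p' "$os_release" | tr -d '"')
        case "$id" in
            # Assumption: SLES identifies itself with an ID beginning with "sles".
            sles*) flags="$flags --nodeps" ;;
        esac
    fi
    printf '%s\n' "$flags"
}

# Intended use (unquoted on purpose, so the flags are split into words):
#   rpm $(rpm_install_flags) <path to generated binary rpm file>
```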

To install on Ubuntu, run:

dpkg-buildpackage -us -uc
dpkg -i <path to generated deb files>

For example:

dpkg -i nvidia-peer-memory_1.2-0_all.deb
dpkg -i nvidia-peer-memory-dkms_1.2-0_all.deb

After successful installation:

The nv_peer_mem.ko kernel module is installed.
The service file /etc/init.d/nv_peer_mem is added; use it to start, stop, and check the status of the kernel module.
The file /etc/infiniband/nv_peer_mem.conf is added; it controls whether the kernel module is loaded on boot (default is YES).
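One way to confirm these results is a short verification helper. The sketch below uses hypothetical function names (`module_listed`, `verify_install`) and relies only on standard Linux facilities: `modinfo`, and `/proc/modules`, which lists one loaded module per line with the module name in the first field.

```shell
#!/bin/sh
# Sketch only: module_listed and verify_install are illustrative helpers.

# Succeed if module NAME appears in the modules file.
# Arguments: NAME [modules-file, defaults to /proc/modules]
module_listed() {
    name=$1
    modules_file=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Check the three items listed above: module installed, module loaded, config present.
verify_install() {
    modinfo nv_peer_mem >/dev/null 2>&1 ||
        { echo "nv_peer_mem.ko is not installed for this kernel" >&2; return 1; }
    module_listed nv_peer_mem ||
        { echo "nv_peer_mem is not loaded; try: service nv_peer_mem start" >&2; return 1; }
    [ -f /etc/infiniband/nv_peer_mem.conf ] ||
        echo "warning: /etc/infiniband/nv_peer_mem.conf is missing" >&2
    echo "nv_peer_mem is installed and loaded"
}
```

After sourcing the file, calling verify_install as root prints a one-line result or the first problem it finds.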

Notes

To achieve good performance, both the NIC and the GPU must physically sit on the same I/O root complex. Use lspci -tv to verify that this is the case.
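As a rough, scriptable complement to reading the lspci -tv tree, the sysfs path of each PCI device can be compared. The sketch below assumes the usual Linux sysfs layout, where `/sys/bus/pci/devices/<address>` resolves to a path containing a `pciDDDD:BB` component for the PCI root the device hangs off. The function names and the example addresses are hypothetical, and a shared sysfs root is only a heuristic; the lspci -tv output remains the reference.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative; a shared sysfs root is a heuristic.

# Print the first "pciDDDD:BB" component of a resolved sysfs device path.
pci_root_from_path() {
    printf '%s\n' "$1" | tr '/' '\n' |
        grep -E '^pci[0-9a-fA-F]{4}:[0-9a-fA-F]{2}$' | head -n 1
}

# Print the PCI root for a device address such as 0000:82:00.0.
pci_root_of() {
    pci_root_from_path "$(readlink -f "/sys/bus/pci/devices/$1")"
}

# Succeed if both device addresses resolve to the same, non-empty PCI root.
same_pci_root() {
    a=$(pci_root_of "$1")
    b=$(pci_root_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example with placeholder addresses (take the real ones from lspci output):
#   same_pci_root 0000:82:00.0 0000:83:00.0 && echo "same PCI root"
```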
