cedrickchee/pygdf

PyGDF

PyGDF implements the Python interface to access and manipulate the GPU DataFrame of the GPU Open Analytics Initiative (GoAi). We aim to provide a simple interface that is similar to the Pandas DataFrame and hides the details of GPU programming.

Read more about GoAi and the GDF

Setup

Conda

You can get a minimal conda installation with Miniconda or get the full installation with Anaconda.

You can install and update PyGDF using the conda command:

conda install -c numba -c conda-forge -c gpuopenanalytics/label/dev -c defaults pygdf=0.1.0a3

You can create and activate a development environment using the conda command:

conda env create --name pygdf_dev --file conda_environments/testing_py35.yml
source activate pygdf_dev

Install from Source

To install PyGDF from source, clone the repository and run the python install command:

git clone https://github.com/gpuopenanalytics/pygdf.git
cd pygdf
python setup.py install

Note: This assumes dependencies including libgdf are already installed, so it is recommended to use the conda environment.

A Dockerfile is provided for building and installing LibGDF and PyGDF from their respective master branches.

Notes:

  • We test with, and recommend installing, nvidia-docker2.
  • The NVIDIA driver installed on the host must support a CUDA version at least as new as the one specified (9.2 by default).
  • An alternative CUDA version can be specified via the CUDA_VERSION Docker build-arg.
  • Alternate branches for libgdf and pygdf may be specified as Docker build-args LIBGDF_REPO and PYGDF_REPO. See Dockerfile for example.
  • Ubuntu 16.04 is the default OS for this container. Alternate OSes may be specified as Docker build-arg LINUX_VERSION. See list of available images.
  • Python 3.6 is the default, but other versions may be specified via the PYTHON_VERSION build-arg
  • GCC & G++ 5.x are default compiler versions, but other versions (which are supplied by the OS package manager) may be specified via CC and CXX build-args respectively
  • numba (0.40.0), numpy (1.14.3), and pandas (0.20.3) versions are also configurable as build-args

From pygdf project root, to build with defaults:

docker build -t pygdf .
...
 ---> ec65aaa3d4b1
 Successfully built ec65aaa3d4b1
 Successfully tagged pygdf:latest

docker run --runtime=nvidia -it pygdf bash
/# source activate gdf
(gdf) root@3f689ba9c842:/# python -c "import pygdf"
(gdf) root@3f689ba9c842:/# 
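
The build-args listed in the notes above are passed to docker build as repeated --build-arg NAME=VALUE options. As a rough sketch, the hypothetical helper below (it is not part of PyGDF) assembles such a command in Python. The build-arg names come from the notes; the Python version value is only an illustration and should be checked against the Dockerfile.

```python
import shlex


def docker_build_command(tag, build_args, context="."):
    """Return the argv list for a `docker build` call with the given build-args.

    Hypothetical helper for illustration only; it is not part of PyGDF.
    """
    argv = ["docker", "build"]
    for name, value in build_args.items():
        # Each override becomes its own --build-arg NAME=VALUE pair.
        argv += ["--build-arg", f"{name}={value}"]
    argv += ["-t", tag, context]
    return argv


# CUDA_VERSION and PYTHON_VERSION are build-arg names from the notes above.
# The value 3.5 is an assumption chosen for illustration.
command = docker_build_command(
    "pygdf", {"CUDA_VERSION": "9.2", "PYTHON_VERSION": "3.5"}
)

# Print a copy-pastable shell command (shlex.join needs Python 3.8+).
print(shlex.join(command))
```

The printed command is meant to be run from the pygdf project root, like the default build. The same list could also be handed to subprocess.run(command, check=True) if the build is driven from a script.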

Pip

Installing with pip is not supported yet. Please use conda for the time being.

Testing

This project uses py.test.

In the source root directory and with the development environment activated, run:

py.test

Getting Started

Please see the Demo Docker Repository for example notebooks on how you can utilize the GPU DataFrame.

GPU Open Analytics Initiative

The GPU Open Analytics Initiative (GoAi) seeks to foster and develop open collaboration between GPU analytics projects and products to enable data scientists to efficiently combine the best tools for their workflows. The first project of GoAi is the GPU DataFrame (GDF), which enables tabular data to be directly exchanged between libraries and applications on the GPU.

GPU DataFrame

The GPU DataFrame is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. The GPU DataFrame uses the Apache Arrow columnar data format on the GPU. Currently, only a subset of the features in Arrow is supported.
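
To illustrate what "columnar" means here, the following pure-Python sketch contrasts row-oriented records with a column-oriented layout in which each column lives in its own contiguous, typed buffer. It does not use PyGDF, Arrow, or a GPU, and the column names and values are made up for illustration.

```python
from array import array

# Row-oriented: one tuple per record, with different types side by side.
rows = [(1, 0.5), (2, 1.5), (3, 2.5)]

# Column-oriented: one contiguous, typed buffer per column.
columns = {
    "id": array("q", (row[0] for row in rows)),     # signed long long integers
    "value": array("d", (row[1] for row in rows)),  # double-precision floats
}

# A per-column operation reads only that column's buffer.
total = sum(columns["value"])
print(total)
```

Keeping each column in a single typed buffer is what lets libraries hand a column to one another without converting it record by record.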
