cylondata/cylon_tutorial

Instructions

TODO Fix the CMake file!

Data

The data files for this tutorial are taken from the article 'Merge and Join DataFrames with Pandas in Python' by Shane Lynn, which uses real data from the KillBiller application.

C++

  • Follow the Cylon docs for detailed build instructions; in summary:
./build.sh --cpp --release
  • Run demo_join.cpp example
./build/bin/demo_join
  • For distributed execution using MPI
mpirun -np <procs> ./build/bin/demo_join

Python

Build

  • Activate the Python virtual environment
source <CYLON_HOME>/ENV/bin/activate
  • Follow the Cylon docs for detailed build instructions; in summary:
./build.sh --pyenv <CYLON_HOME>/ENV --python --release
  • Export LD_LIBRARY_PATH
export LD_LIBRARY_PATH=<CYLON_HOME>/build/arrow/install/lib:<CYLON_HOME>/build/lib:$LD_LIBRARY_PATH

Sequential Join

  • Run demo_join.py script
python ./cpp/src/tutorial/demo_join.py
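
To make clear what kind of result a join demo produces, here is a minimal pure-Python sketch of an inner hash join. It is not Cylon's API or implementation: the function `inner_join`, the column names, and the sample rows are all hypothetical and stand in for the tutorial's CSV data.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    # Build phase: index the right table by its key values.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    # Probe phase: emit one merged row for every matching pair.
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


# Hypothetical sample rows, not the tutorial's actual CSV contents.
usage = [
    {"use_id": 1, "monthly_mb": 1200.0},
    {"use_id": 2, "monthly_mb": 300.5},
    {"use_id": 3, "monthly_mb": 75.0},
]
devices = [
    {"use_id": 1, "device": "phone-a"},
    {"use_id": 3, "device": "phone-b"},
    {"use_id": 4, "device": "phone-c"},
]

result = inner_join(usage, devices, "use_id")
for row in result:
    print(row)
```

Only rows whose `use_id` appears in both tables survive, which is the behaviour of an inner join; keys present on one side only (2 and 4 here) are dropped.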

Distributed Join

  • For distributed execution using MPI
mpirun -np <procs> <CYLON_HOME>/ENV/bin/python ./cpp/src/tutorial/demo_join.py
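
One common way to distribute a join is to hash-partition both tables on the join key so that matching rows land on the same worker, then join locally on each worker. The sketch below simulates that idea in a single process. It is an illustration under stated assumptions, not Cylon's implementation: `partition`, `local_join`, `world_size`, and the sample rows are hypothetical, and no MPI communication takes place.

```python
def partition(rows, key, world_size):
    """Split rows into one bucket per worker by hashing the join key."""
    buckets = [[] for _ in range(world_size)]
    for row in rows:
        buckets[hash(row[key]) % world_size].append(row)
    return buckets


def local_join(left, right, key):
    """Inner join of the rows held by a single worker."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if lrow[key] == rrow[key]
    ]


# Hypothetical sample rows with integer keys.
left = [{"use_id": k, "monthly_mb": 100.0 * k} for k in range(1, 7)]
right = [{"use_id": k, "device": f"phone-{k}"} for k in (1, 2, 4, 7)]

world_size = 2  # stands in for the number of MPI processes
left_parts = partition(left, "use_id", world_size)
right_parts = partition(right, "use_id", world_size)

# Each simulated worker joins only the partitions it owns.
per_worker = [
    local_join(lp, rp, "use_id") for lp, rp in zip(left_parts, right_parts)
]

for rank, rows in enumerate(per_worker):
    print(f"worker {rank}: {len(rows)} joined rows")
```

Because both tables are partitioned with the same hash function, the union of the per-worker results equals the result of joining the full tables in one process.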

Data Pre-Processing for Deep Learning with PyTorch

PyCylon pre-processes the data, from loading the data to joining two tables, in order to build the features required for the data analytics carried out in PyTorch. At the end of the pipeline, PyCylon releases the data as a NumPy ndarray.
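
The last stage of such a pipeline, turning joined rows into a feature array, can be sketched with NumPy alone. This is a stand-in under stated assumptions and does not use PyCylon: the rows, the column layout, and the standardization step are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical joined rows: (use_id, monthly_mb, outgoing_mins, label).
joined_rows = [
    (1, 1200.0, 30.5, 1),
    (2, 300.5, 12.0, 0),
    (3, 75.0, 4.25, 0),
]

# Materialize the joined table as a single float32 ndarray.
table = np.array(joined_rows, dtype=np.float32)

# Columns 1 and 2 are the features; column 3 is the label.
features = table[:, 1:3]
labels = table[:, 3].astype(np.int64)

# Standardize each feature column to zero mean and unit variance.
features = (features - features.mean(axis=0)) / features.std(axis=0)

print(features.shape, features.dtype)
print(labels)
```

An array like `features` can then be passed to `torch.from_numpy`, which wraps it as a tensor that shares memory with the array.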

Pre-requisites

  • Install PyTorch
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
  • Run sequential demo_pytorch.py
python demo_pytorch.py
  • Run distributed demo_pytorch_distributed.py
mpirun -n <procs> <CYLON_HOME>/ENV/bin/python demo_pytorch_distributed.py

Note: procs must satisfy 0 < procs < 5, i.e. use between 1 and 4 processes.
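
If the distributed demo is launched from a script, the limit in the note above can be checked before `mpirun` is invoked. The helper below is a hypothetical sketch: `build_mpirun_command` and the interpreter path are assumptions, not part of the tutorial, and the code only builds the command line without executing it.

```python
import shlex


def build_mpirun_command(procs, python_bin, script):
    """Return the mpirun argument list, enforcing 0 < procs < 5."""
    if not isinstance(procs, int) or not 0 < procs < 5:
        raise ValueError(f"procs must satisfy 0 < procs < 5, got {procs!r}")
    return ["mpirun", "-n", str(procs), python_bin, script]


# Hypothetical interpreter path; substitute <CYLON_HOME>/ENV/bin/python.
cmd = build_mpirun_command(
    4, "/path/to/cylon/ENV/bin/python", "demo_pytorch_distributed.py"
)
print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the path points at a real environment.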
