A sample project reading from TigerGraph with pySpark
- Install Spark and Scala (`brew install apache-spark && brew install scala`)
- Clone this project and enter the directory
- Create a Python virtual environment (`python3 -m venv venv`) and enter the environment (`source venv/bin/activate`)
- Install pySpark and pyPandoc (`pip3 install pyspark pypandoc`)
- Load an on-premise TigerGraph AMLSim graph
- Download the latest `.jar` file of the JDBC TigerGraph Driver
- Run the project (`spark-submit --jars tigergraph-jdbc-driver-1.3.0.jar index.py`), replacing the jar file name with the version you downloaded
This repository will walk you through how to get TigerGraph data using pySpark. It shows three possible methods to do so: retrieving vertices, retrieving edges, and running queries.
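The three methods can be sketched as follows. This is a minimal illustration, not the contents of `index.py`. It assumes the driver class `com.tigergraph.jdbc.Driver`, a RESTPP-style URL on `127.0.0.1:14240`, the default `tigergraph` credentials, and a graph named `AMLSim`. The vertex type `Account`, the edge type `Send_Transaction`, and the query `accountActivity` are hypothetical names, so substitute the ones from your own schema.

```python
# Assumed connection settings; adjust them to match your TigerGraph instance.
BASE_OPTIONS = {
    "driver": "com.tigergraph.jdbc.Driver",
    "url": "jdbc:tg:http://127.0.0.1:14240",
    "username": "tigergraph",
    "password": "tigergraph",
    "graph": "AMLSim",
}


def jdbc_options(dbtable, **extra):
    """Build the option map for one read; JDBC option values are passed as strings."""
    opts = dict(BASE_OPTIONS)
    opts["dbtable"] = dbtable
    opts.update({key: str(value) for key, value in extra.items()})
    return opts


def read_tigergraph(spark, dbtable, **extra):
    """Load a vertex type, an edge type, or a query result as a Spark DataFrame."""
    return spark.read.format("jdbc").options(**jdbc_options(dbtable, **extra)).load()


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("TigerGraphSample").getOrCreate()

    # 1. Retrieve vertices ("Account" is a hypothetical vertex type).
    read_tigergraph(spark, "vertex Account", limit=10).show()

    # 2. Retrieve edges starting from one source vertex
    #    ("Send_Transaction" and the id "1" are hypothetical).
    read_tigergraph(spark, "edge Send_Transaction", source="1", limit=10).show()

    # 3. Run an installed query ("accountActivity" and its parameter are hypothetical).
    read_tigergraph(spark, "query accountActivity(acct=1)").show()

    spark.stop()
```

Calling `main()` at the end of a script submitted with `spark-submit --jars ...` would run all three reads. Keeping the option map in a separate function makes it easy to inspect what is sent to the driver before any connection is attempted.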
Find a thorough walkthrough of this project (set up, code explanation, etc.) here.