Spark Training Repository

This repository contains examples, exercises and tutorials for the Spark and Hadoop training courses run by dimajix. The latest version is always available on GitHub at

https://github.com/dimajix/spark-training

External Dependencies

Some notebooks require test data, which dimajix provides on S3 at s3://dimajix-training/data/.
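
As a rough illustration of how a notebook might address this data, the following Python sketch builds URIs below the base location given above. It assumes a plain Apache Spark setup, where the Hadoop S3A connector (hadoop-aws) and the s3a:// scheme are the usual route to S3; some managed platforms accept s3:// directly. The helper names and any relative path passed in are hypothetical and not taken from this repository.

```python
from urllib.parse import urlparse

# Base location as stated in this README; anything below it is an assumption.
TRAINING_DATA_BASE = "s3://dimajix-training/data/"


def to_s3a(uri: str) -> str:
    """Rewrite an s3:// URI to the s3a:// scheme used by Hadoop's S3A connector."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a"):
        raise ValueError(f"not an S3 URI: {uri}")
    return f"s3a://{parsed.netloc}{parsed.path}"


def training_data_uri(relative_path: str) -> str:
    """Build a full URI for a (hypothetical) path below the training data base."""
    return to_s3a(TRAINING_DATA_BASE) + relative_path.lstrip("/")


def read_text(spark, relative_path: str):
    """Read a text dataset using an existing SparkSession.

    Assumes hadoop-aws is on the classpath and that credentials or
    access settings are already configured for the session.
    """
    return spark.read.text(training_data_uri(relative_path))
```

Inside a notebook that already has a SparkSession named spark, a call would look like read_text(spark, "some/dataset"), with the placeholder replaced by a path that actually exists in the bucket.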

Building Executables

The source code can be built using Maven, simply by running

mvn install

from the root directory.

Running Examples

Most code is provided either as interactive notebooks (Jupyter and/or Zeppelin) or as compilable programs. Programs that are built into jar files come with start scripts, which set the required environment variables and Spark configuration properties.
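
To give an idea of what such a start script typically does, here is a minimal Python sketch that assembles a spark-submit command line and an environment for it. It is not the repository's actual script: the function names, the jar path, the main class and the configuration values are all hypothetical. Only the spark-submit options --class, --master and --conf are standard.

```python
import os
import shlex
from typing import Dict, List, Optional


def build_spark_submit(
    jar: str,
    main_class: str,
    master: str = "local[*]",
    conf: Optional[Dict[str, str]] = None,
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a spark-submit argument list for a jar-based program."""
    cmd = ["spark-submit", "--class", main_class, "--master", master]
    # Each Spark property is passed as a separate "--conf key=value" pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(jar)
    # Everything after the jar is handed to the application itself.
    cmd += list(app_args or [])
    return cmd


def launch_env(overrides: Dict[str, str]) -> Dict[str, str]:
    """Copy the current environment and apply the given overrides."""
    env = dict(os.environ)
    env.update(overrides)
    return env


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_spark_submit(
        jar="target/example.jar",
        main_class="com.example.Main",
        conf={"spark.executor.memory": "2g"},
        app_args=["input"],
    )
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids quoting problems when the command is later passed to subprocess.run(cmd, env=launch_env({...})).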
