Hands-on workshop on using Spark + DeltaLake running in MyBinder.
This repo is a fork of https://github.com/thedatasociety/lab-spark, adjusted to install Spark 3.0.2 so that DeltaLake is supported.
- Launch the Binder environment. This will start a Dockerized version of the repo on a public compute instance (hosted by gesis.org).
- Binder will build and install Apache Hadoop 3.2.1 and Apache Spark 3.0.2.
- Binder will start JupyterLab so we can use Jupyter notebooks for the exercises.
- The workshop notebooks will be downloaded from the Delta Lake Workshop repo (https://github.com/lassebenni/delta-lake-workshop). The notebooks contain the code to install and use Databricks DeltaLake.
- Enjoy the power of Spark and DeltaLake in your Browser!
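The notebooks take care of the DeltaLake setup themselves. As a rough sketch of what that setup usually involves, the snippet below configures a SparkSession for Delta Lake from Python. It assumes Delta Lake 0.8.0 for Scala 2.12, a release line built for Spark 3.0.x. The version, the `DELTA_CONF` and `build_delta_session` names, and the app name are hypothetical and may differ from what the workshop notebooks actually use.

```python
# A minimal sketch, assuming Delta Lake 0.8.0 built for Scala 2.12 (Spark 3.0.x).
# DELTA_CONF, build_delta_session and the app name are hypothetical names,
# not taken from the workshop notebooks.
DELTA_VERSION = "0.8.0"
SCALA_VERSION = "2.12"

DELTA_CONF = {
    # Maven coordinates that Spark resolves and downloads when the session starts
    "spark.jars.packages": f"io.delta:delta-core_{SCALA_VERSION}:{DELTA_VERSION}",
    # Registers Delta Lake's SQL extension with Spark SQL
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    # Replaces the default session catalog so Delta tables can be created and looked up
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name="deltalake-workshop"):
    """Create (or reuse) a SparkSession configured for Delta Lake.

    pyspark is imported inside the function so the configuration above
    can be inspected without a Spark installation.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    # Apply each Delta-related setting to the session builder
    for key, value in DELTA_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a notebook cell, `spark = build_delta_session()` followed by `spark.range(5).write.format("delta").save("/tmp/delta-table")` and `spark.read.format("delta").load("/tmp/delta-table").show()` should round-trip a small Delta table. The `/tmp/delta-table` path is only an example.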
Be sure to download any changes to your notebooks: the Binder environment is temporary and does not keep changes after it shuts down. If your browser session is inactive for more than 10 minutes (the session only counts as active while the Binder window is the active window), Binder will shut down. The next launch starts from a fresh copy of the master branch, so any changes you have not downloaded are lost.
- https://mybinder.org/ (Dockerizing the Repo)
- https://github.com/delta-io/delta (DeltaLake)
- https://github.com/thedatasociety/lab-spark (upstream for this repo)
- https://github.com/lassebenni/delta-lake-workshop (notebooks with exercises for deltalake)