
lassebenni/lab-spark

 
 


DataBricks DeltaLake workshop 21-04-09

Hands-on workshop on using Spark + DeltaLake running in MyBinder.

This repo is forked from https://github.com/thedatasociety/lab-spark and adjusted to install Spark 3.0.2 to support DeltaLake.

Steps

  1. Launch the Binder environment. This starts a Dockerized version of this repo on a public compute instance (hosted by gesis.org).
  2. Binder builds the image and installs Apache Hadoop 3.2.1 and Apache Spark 3.0.2.
  3. Binder starts Jupyter Lab so we can use Jupyter Notebooks for the exercises.
  4. The workshop notebooks are downloaded from The Delta Lake Workshop Repo. The notebooks contain the code to install and use Databricks Delta Lake.
  5. Enjoy the power of Spark and Delta Lake in your browser!
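
As a rough illustration of what "install and use Delta Lake" can look like inside a notebook, the sketch below collects the Spark settings that enable Delta Lake and applies them to a session. It is not taken from the workshop notebooks. The Delta Lake version (0.8.0), the Scala version (2.12), and the helper names `delta_spark_conf` and `create_session` are assumptions chosen for illustration; check the notebooks for the versions they actually use.

```python
def delta_spark_conf(delta_version="0.8.0", scala_version="2.12"):
    """Return the Spark settings that enable Delta Lake.

    The default versions are assumptions, not values from this repo:
    they should match the installed Spark build (3.0.x here).
    """
    return {
        # Maven coordinates of the Delta Lake package to fetch at startup
        "spark.jars.packages": f"io.delta:delta-core_{scala_version}:{delta_version}",
        # Enables Delta-specific SQL commands
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        # Routes table operations through the Delta catalog
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def create_session(app_name="delta-workshop"):
    """Create (or reuse) a SparkSession with Delta Lake enabled.

    pyspark is imported lazily so the configuration helper above
    can be used and inspected without Spark installed.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_spark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a notebook cell, `spark = create_session()` would then give a session on which `spark.range(5).write.format("delta").save("/tmp/delta-demo")` writes a small Delta table (the path is only an example).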

Warning: Download your Notebook changes

Be sure to download any changes to your notebooks: the Binder environment is temporary and does not persist changes after it shuts down. If your browser session is inactive for more than 10 minutes (the session counts as active only while the browser window is active), Binder shuts down, and the next launch starts from a fresh copy of the master branch, so all changes are lost.
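
One way to make the download step quicker is to bundle every notebook into a single zip file and download that from the Jupyter Lab file browser. The following is a minimal sketch under that assumption; the function name `archive_notebooks` and the output file name are hypothetical and not part of this repo.

```python
import zipfile
from pathlib import Path


def archive_notebooks(root=".", out="notebooks-backup.zip"):
    """Zip all .ipynb files under `root` and return their relative paths.

    Jupyter's autosave copies in .ipynb_checkpoints are skipped.
    """
    root_path = Path(root)
    notebooks = sorted(
        p for p in root_path.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )
    with zipfile.ZipFile(Path(out), "w", zipfile.ZIP_DEFLATED) as zf:
        for nb in notebooks:
            # Store paths relative to root so the archive unpacks cleanly
            zf.write(nb, nb.relative_to(root_path).as_posix())
    return [nb.relative_to(root_path).as_posix() for nb in notebooks]
```

Running `archive_notebooks()` in a notebook cell before stepping away writes the archive to the working directory, where it can be downloaded in one click.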

