Cadarn/IntroToBigData2023

Introduction to Big Data & Data Engineering Workshop 2023

Materials for the Big Data Workshop for DISCnet: 3-5 April 2023.

This will be a 3-day hands-on introduction to the technologies and ideas used to build data applications at scale. The content will be delivered virtually via Zoom for the duration of the course.

The course has several components, including:

  • Lectures
  • Guided lab exercises
  • Supported workshop sessions

During the course we will use Docker to try out technologies on a local laptop or desktop; some elements of the course will be completed in the cloud using the Databricks Community Edition.

Much of the course requires a recent install of Anaconda and an environment with Python >= 3.9 and Jupyter Lab > 3.0.
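Before the first lab it can help to confirm that the active environment meets these requirements. The script below is a minimal sketch, not part of the course materials: the function names `parse_major_minor` and `check_environment` are hypothetical, it assumes JupyterLab was installed via conda or pip so that `importlib.metadata` can see its package metadata, and it reads "Jupyter Lab > 3.0" as "3.0 or newer" (raise `min_jupyterlab` if a strictly newer release is intended).

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def parse_major_minor(version_string):
    """Return (major, minor) as integers from a version string such as '3.6.3'."""
    parts = version_string.split(".")
    major = int(parts[0])
    # Pre-releases such as '4.0.0a1' may carry suffixes; keep only the leading digits.
    minor_digits = ""
    for ch in parts[1] if len(parts) > 1 else "0":
        if not ch.isdigit():
            break
        minor_digits += ch
    return major, int(minor_digits or 0)


def check_environment(min_python=(3, 9), min_jupyterlab=(3, 0)):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []

    # Compare the running interpreter's (major, minor) against the minimum.
    found = sys.version_info[:2]
    if found < min_python:
        problems.append(
            f"Python {found[0]}.{found[1]} found; "
            f"{min_python[0]}.{min_python[1]}+ is required"
        )

    # Look up the installed JupyterLab distribution, if any.
    try:
        jl_version = version("jupyterlab")
    except PackageNotFoundError:
        problems.append("JupyterLab is not installed in this environment")
    else:
        if parse_major_minor(jl_version) < min_jupyterlab:
            problems.append(
                f"JupyterLab {jl_version} found; "
                f"{min_jupyterlab[0]}.{min_jupyterlab[1]}+ is required"
            )

    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        print("Environment needs attention:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("Environment looks ready for the labs.")
```

Run it from inside the activated Anaconda environment (for example `python check_env.py`, where the filename is just an illustration) so that it inspects the same interpreter Jupyter Lab will use.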

Topics of the course

The course will be delivered in a dynamic and interactive way. There is a lot of core material, but there are more technologies than there is time to explore, so students will have the opportunity to suggest which elements to focus on in some of the later sessions. Some of the topics we will be exploring include:

  • Python, Jupyter Lab & Pandas
  • Apache Spark
  • SQL
  • NoSQL
  • Docker and containerisation
  • Data streaming
  • Interactive dashboarding
  • Workflow orchestration

Practical labs

The course is very focused on "doing" and "playing" with the tools, so there are many practical lab components. These are in the practical-labs folder. Exercises are numbered to follow the order of the lectures. Some exercises provide code with gaps for you to complete; in those cases a second copy of the code with the solution is also included.

Source materials

To support the dynamic nature of the course some of the content will be live coded and will be uploaded to the repository after the session.

Other elements will draw on materials developed by Paul Freemantle (2016-2017) and Julie Weeds (2019-2020), previous instructors for this course.

Their original course materials are available here; we will use some of them as appropriate.
