Introduction to Big Data & Data Engineering Workshop 2023

Materials for the Big Data Workshop for DIScNet: 3-5 April 2023.

This will be a 3-day hands-on introduction to the technologies and ideas that are used in building data applications at scale. The content will be delivered virtually via Zoom for the duration of the course.

There are different components to the course including:

Lectures
Guided lab exercises
Supported workshop sessions

During the course we will use Docker to test technologies on a local laptop or desktop; some elements of the course will be completed in the cloud using the Databricks Community Edition.

For much of the course you will need a recent install of Anaconda and an environment with Python >= 3.9 and Jupyter Lab > 3.0 available.
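As a quick way to confirm that an environment meets these requirements, the following is a minimal sketch rather than an official course script. The helper names `parse_version` and `check_environment` are hypothetical, and the script assumes that "Jupyter Lab > 3.0" means a release later than the 3.0 series (3.1 or newer); adjust the comparison if 3.0.x is acceptable.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Convert a version string such as '3.6.1' to a tuple of ints (3, 6, 1).

    Parsing stops at the first piece that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(python_version=None, jupyterlab_version=None):
    """Return a list of problems; an empty list means the requirements are met.

    Both arguments are optional overrides (hypothetical, for testing); by
    default the running interpreter and installed packages are inspected.
    """
    problems = []

    if python_version is None:
        python_version = tuple(sys.version_info[:3])
    if tuple(python_version) < (3, 9):
        problems.append("Python >= 3.9 is required")

    if jupyterlab_version is None:
        try:
            # 'jupyterlab' is the distribution name of Jupyter Lab on PyPI.
            jupyterlab_version = metadata.version("jupyterlab")
        except metadata.PackageNotFoundError:
            problems.append("Jupyter Lab is not installed in this environment")
            return problems

    # Assumption: "> 3.0" is read as "later than the 3.0 series".
    if parse_version(jupyterlab_version)[:2] <= (3, 0):
        problems.append("Jupyter Lab > 3.0 is required")

    return problems


if __name__ == "__main__":
    found = check_environment()
    if found:
        for problem in found:
            print("Problem:", problem)
    else:
        print("Environment looks ready for the course.")
```

Run it with the course environment activated; any line starting with "Problem:" points to something to fix before the first session.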

Topics of the course

The course will be delivered in a dynamic and interactive way. There is a lot of core material, but there are more technologies than there is time to explore, so students will have the opportunity to suggest which elements are focussed on in some of the later sessions. Some of the topics we will be exploring include:

Python, Jupyter Lab & Pandas
Apache Spark
SQL
NoSQL
Docker and containerisation
Data streaming
Interactive dashboarding
Workflow orchestration

Practical labs

The course is very focussed on "doing" and "playing" with the tools. To that end there are lots of practical lab components. These are in the practical-labs folder. Exercises are numbered in an order that should match the lecture order. Some of the code is provided with gaps for you to complete; in those cases a second copy of the code with the solution is also included.

Source materials

To support the dynamic nature of the course some of the content will be live coded and will be uploaded to the repository after the session.

Other elements will draw on materials developed by Paul Freemantle (2016-2017) and Julie Weeds (2019-2020), previous instructors for this course.

Their original course materials are available here - we will use some of these materials as appropriate.

