Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets (90 min version)

"Big data" refers to any data that is too large to handle comfortably with your current tools and infrastructure. As the leading language for data science, Python has many mature options that allow you to work with datasets that are orders of magnitude larger than what fits into a typical laptop's memory. In this hands-on tutorial, you will learn the fundamentals of analyzing massive datasets through real-world examples on powerful machines in a public cloud, starting from how the data is stored and read and continuing to how it is processed and visualized.

By the end, you will be able to answer:

What makes some data formats more efficient at scale?
Why, how, and when (and when not) to leverage parallel and distributed computation (primarily with Dask) for your work?
How to manage cloud storage, resources, and costs effectively?
How interactive visualization can make large and complex data more understandable (primarily with hvPlot)?
How to comfortably collaborate on data science projects with your entire team?
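
As a rough illustration of the second question, the sketch below shows the chunk-and-combine pattern that underlies larger-than-memory computation: the data is consumed in fixed-size chunks, each chunk is reduced to a small partial result, and the partial results are combined at the end. It uses only the Python standard library and is not Dask's API; the names `iter_chunks`, `partial_stats`, and `chunked_mean` are hypothetical helpers chosen for this example.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `chunk_size` items from `values`."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def partial_stats(chunk: List[float]) -> Tuple[float, int]:
    """Reduce one chunk to a small summary: (sum, count)."""
    return sum(chunk), len(chunk)


def chunked_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute the mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(values, chunk_size):
        chunk_total, chunk_count = partial_stats(chunk)
        total += chunk_total
        count += chunk_count
    if count == 0:
        raise ValueError("cannot take the mean of an empty dataset")
    return total / count


if __name__ == "__main__":
    # `range` stands in for a data source too large to load at once.
    print(chunked_mean(range(1_000_000), chunk_size=10_000))
```

Because each call to `partial_stats` depends only on its own chunk, the per-chunk work could in principle run in parallel or on separate machines. Whether that pays off depends on the size of the data and the overhead of coordinating the work, which is the "when not" part of the question.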

Setup

This tutorial is designed to run in the cloud.

Live presentations will run on a Nebari (JupyterHub) instance. Check out the introduction notebook for details.

Live presentations

PyData Global 2023 (Upcoming!)
PyData NYC 2023

You can check out the tags for previous versions of this tutorial.

This repository is covered by the Nebari Code of Conduct and is released under the BSD 3-Clause license.

Folders and files

environment
images
prep
.gitignore
00-introduction.ipynb
01-intro-to-dask.ipynb
02-big-data-analysis-with-dask.ipynb
03-big-data-visualization.ipynb
04-conclusion.ipynb
README.md
appendix-a-collaborative-data-science.ipynb


nebari-dev/big-data-tutorial-90min
