This is the material used for the course "Scalable analytics with Python (DASK)" given by Science IT Support at Bern University. You can try the notebooks online using the Binder badge, but several datasets are not included there because of their size.
Installation is based on conda. If you don't have conda installed yet, the simplest way is to install it via Miniconda.
To install all necessary packages on your laptop or on a cluster, first clone this repository and move into it:
git clone https://github.com/guiwitz/DaskCourse.git
cd DaskCourse
Then use the environment.yml file from the repository to create a dedicated conda environment:
conda env create -f environment.yml
If you want to use JupyterLab, also install the necessary extensions:
conda activate dask_course
jupyter labextension install dask-labextension --no-build
jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build
jupyter labextension install @bokeh/jupyter_bokeh --no-build
jupyter lab build --minimize=False
jupyter serverextension enable dask_labextension
Finally, you can download all necessary data using the script download_data.py:
conda activate dask_course
python download_data.py
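To check that the environment is usable, you can run a small script from inside the activated dask_course environment. The sketch below is not part of the course material: the list of packages it checks (dask, distributed, bokeh, dask_labextension) is an assumption based on the tools mentioned above, so compare it with environment.yml for the authoritative list. It only tests whether each module can be found, using the standard-library function importlib.util.find_spec.

```python
import importlib.util

# Assumed (hypothetical) list of modules the course environment should provide;
# environment.yml in the repository is the authoritative source.
REQUIRED = ["dask", "distributed", "bokeh", "dask_labextension"]


def check_packages(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def main(names=REQUIRED):
    """Print a status line per module and return the list of missing ones."""
    status = check_packages(names)
    for name, ok in status.items():
        print(f"{name:20s} {'OK' if ok else 'MISSING'}")
    return [name for name, ok in status.items() if not ok]


if __name__ == "__main__":
    missing = main()
    if missing:
        print("Missing modules:", ", ".join(missing))
```

Saved under a name of your choice (for example check_env.py, a hypothetical file name), it is run with `python check_env.py` after `conda activate dask_course`. Any module reported as MISSING points to an incomplete environment creation step.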