📝 💻 🐍 👩 Unit Testing for Data Science

This repository contains workshop materials for "Unit Testing for Data Science" organized by Berlin Women in Machine Learning and Data Science on 1 July 2020.

Authors: Ellen König, Tereza Iofciu, Marielle Dado

🔧 ⚙️ 💻 Before the workshop: Set up your computer

⚠️ You must complete the entire setup BEFORE the workshop. Run the following instructions in a terminal (a bash shell works best).

Step 1: Requirements

There are three ways to set up your machine for the workshop: with conda, with pyenv and poetry, or with venv and pip. Use the one you are most familiar with.

conda

OR

pyenv >= 1.2.13
python >= 3.7.0
poetry >= 1.0.0 (to be installed in your local pyenv)

OR

venv
python >= 3.7.0

Step 2: Clone the workshop repository

Clone the repo locally:

git clone https://github.com/wimlds/berlin-tdd-workshop
cd berlin-tdd-workshop

Step 3: Set up and activate your virtual environment

Choose your preferred setup:

With conda:

# install working environment with conda
conda env create -n berlin-tdd-workshop -f environment.yml

# creating the environment does not activate it; activate it with:
conda activate berlin-tdd-workshop
# (on conda versions older than 4.4, use: source activate berlin-tdd-workshop)

If you get errors, updating conda might help. Run: conda update -n base -c defaults conda

With pyenv and poetry:

pyenv local 3.7.4
pip install poetry
make setup                          # if you use a different Python version, update it in pyproject.toml
source .venv/bin/activate           # activate the environment

With venv:

python3 -m venv .wimlds_venv
source .wimlds_venv/bin/activate    # activate the environment
pip install -r requirements.txt

Step 4: Run the tests

Activate your environment and run this command in your terminal:

pytest or pytest -vv for verbose output

This should report 2 passed tests and 1 skipped test (see image below).
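For orientation, here is a minimal sketch of a test file that would produce the same kind of summary: two passed tests and one skipped test. The function `add` and the test names are made up for illustration; they are not the tests shipped in this repository.

```python
import pytest


def add(a, b):
    """Toy function used only for this illustration."""
    return a + b


def test_add_integers():
    assert add(2, 3) == 5


def test_add_is_commutative():
    assert add(2, 3) == add(3, 2)


@pytest.mark.skip(reason="not implemented yet")
def test_add_strings():
    # pytest reports this test as skipped and does not run the body
    assert add("a", "b") == "ab"
```

Saved under a name such as test_example.py (a hypothetical file name), running pytest on it would count the first two tests as passed and the third as skipped.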

ℹ️ About the workshop

❓ What are we expected to do at the workshop?

In this workshop, there are 3 exercises that involve one or both of the following tasks:

Writing functions for data imputation and data transformation.
Writing unit tests to test the functions you have written.

The instructors and mentors will guide you through the workshop content and materials.
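As a rough illustration of the two tasks, the sketch below pairs a small imputation function with unit tests for it. The function name `impute_with_mean` and the choice of mean imputation are assumptions made for this example, not the workshop's actual exercises, and plain Python lists are used so the block runs without extra dependencies.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_with_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if not _is_missing(v)]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if _is_missing(v) else v for v in values]


def test_impute_with_mean_fills_none():
    assert impute_with_mean([10.0, None, 30.0]) == [10.0, 20.0, 30.0]


def test_impute_with_mean_fills_nan():
    assert impute_with_mean([math.nan, 4.0, 6.0]) == [5.0, 4.0, 6.0]


def test_impute_with_mean_keeps_complete_data_unchanged():
    assert impute_with_mean([1.0, 2.0]) == [1.0, 2.0]
```

Each test checks one behaviour and has a name that says what is expected, so a failure points directly at the case that broke.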

❓ Which files are we working on? Should we use Jupyter notebooks or an IDE?

We prepared the following files and folders for you to work on:

EDA - a notebook for exploring the dataset
scripts folder - contains the Python functions in the following script files:
- scripts/imputation.py - functions for data imputation
- scripts/transformation.py - functions for data transformation
test folder - contains the unit tests in the following scripts:
- test/test_imputation.py - unit tests for functions in scripts/imputation.py
- test/test_transformation.py - unit tests for functions in scripts/transformation.py

For the unit tests you create to run successfully, both the functions and the unit tests must be placed in the Python scripts.

That means that the functions you write for data imputation should be placed in scripts/imputation.py and the corresponding unit tests should be in test/test_imputation.py. For this, you will work on the scripts using an IDE of your choice.

However, you may also first write your functions and tests in EDA using Jupyter notebooks and then transfer them to the appropriate Python scripts later.
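The split between scripts and tests described above could look roughly like the sketch below. Both parts are shown in one block so it runs as-is. The function `min_max_scale` is a hypothetical transformation chosen for illustration, and the import path in the comment is an assumption about how the package is laid out.

```python
# --- this part would go in scripts/transformation.py ---
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # avoid division by zero when all values are identical
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# --- this part would go in test/test_transformation.py ---
# from scripts.transformation import min_max_scale  # assumed import path
def test_min_max_scale_maps_extremes_to_zero_and_one():
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_min_max_scale_handles_constant_input():
    assert min_max_scale([3, 3, 3]) == [0.0, 0.0, 0.0]
```

Keeping the function in the scripts folder and the tests in the test folder lets pytest discover the tests by file and function name while the notebook stays free for exploration.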

❓ What dataset are we using?

We prepared some data for the workshop using a Pockets dataset from the-pudding. To try out different imputation methods, we removed 10% of the price values. If you want to take a look at all the data, check the data/PrepareWorkshopData notebook. We will not cover this during the workshop.
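To give a feel for what removing 10% of the values means, here is a minimal sketch that blanks out a random tenth of a list of made-up prices. The helper `remove_fraction`, the seed, and the price values are all assumptions for illustration; the actual preparation is in the data/PrepareWorkshopData notebook and may work differently.

```python
import random


def remove_fraction(values, fraction=0.1, seed=0):
    """Return a copy of values with a random `fraction` of entries set to None."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


prices = [float(p) for p in range(10, 110)]  # 100 made-up prices
with_gaps = remove_fraction(prices, fraction=0.1, seed=42)
print(sum(v is None for v in with_gaps))  # prints 10
```

Data prepared this way gives an imputation function known gaps to fill, and the original values remain available for comparison.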

❓ Where can we ask for help?

The instructors and mentors can answer your questions in the #meetup-berlin-workshop-help channel on the WiMLDS Global Slack workspace.
