elsatch/PyData-Global-2023-Improving-Open-Data-Quality-using-Python

Improving Open Data Quality using Python

This repo contains the complete materials for the tutorial session Improving Open Data Quality using Python, presented at the PyData Global 2023 conference.

Preparing the environment

First, create a Python virtual environment to hold the required dependencies. To do so, run the following command:

python -m venv data-quality

Next, activate the environment. The command depends on your OS:

  • Linux/macOS
source data-quality/bin/activate
  • Windows (PowerShell)
data-quality\Scripts\Activate.ps1
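
To confirm that the activation worked, a quick check from Python can help. The following is a minimal sketch, not part of the tutorial materials; the function name in_virtualenv is a hypothetical helper. It relies on the fact that inside a venv sys.prefix differs from sys.base_prefix:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the first line printed reports False, the environment is probably not active in the current shell.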

Finally, we can install the required dependencies:

pip install -r requirements.txt
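
After the installation, it can be useful to verify that every listed package is available. The sketch below is an illustration under assumptions, not part of the tutorial: the helpers requirement_name and missing_packages are hypothetical, and the parsing only handles simple name-based lines such as pandas>=2.0, not URLs or editable installs. It uses the standard importlib.metadata module to look up installed distributions:

```python
import re
from importlib import metadata
from pathlib import Path
from typing import Iterable, List, Optional


def requirement_name(line: str) -> Optional[str]:
    """Extract the distribution name from one simple requirements.txt line."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line or line.startswith("-"):  # skip blanks and pip options such as -r
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(lines: Iterable[str]) -> List[str]:
    """Return the names of listed packages that are not installed."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        print("missing packages:", missing_packages(req_file.read_text().splitlines()))
```

An empty list means every name-based requirement was found in the active environment. Version constraints are not checked here.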

Running the environment in Google Colab

You can also launch the notebook in Google Colab by following these steps:

  1. Open the Colab web site: https://colab.research.google.com/
  2. File menu -> Open notebook
  3. Click on the GitHub tab
  4. Paste the following URL: https://github.com/elsatch/PyData-Global-2023-Improving-Open-Data-Quality-using-Python.git
  5. Select the single_datasets.ipynb notebook
  6. Run the Colab-specific cell at the beginning of the notebook
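
The contents of the Colab-specific cell are not reproduced here. As a rough illustration of how a notebook can tell whether it is running in Colab, the sketch below uses a common heuristic: Colab kernels load the google.colab package at startup, so it appears in sys.modules. The function name running_in_colab is hypothetical and this is not necessarily how the tutorial notebook does it:

```python
import sys


def running_in_colab() -> bool:
    """Heuristically detect a Google Colab runtime."""
    # Colab kernels import google.colab at startup, so the module is
    # already registered in sys.modules; in a local venv it is not.
    return "google.colab" in sys.modules


if __name__ == "__main__":
    if running_in_colab():
        print("Running in Colab: run the Colab-specific setup cell first.")
    else:
        print("Running locally: use the virtual environment described above.")
```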
