digifarm-ms/giessdenkiez-de-dwd-harvester

 
 

giessdenkiez-de-dwd-harvester

  • Gathers precipitation data from DWD's RADOLAN data set for the region of Berlin and connects to the giessdenkiez.de Postgres DB
  • Uploads trees combined with weather data to Mapbox and uses its API to create vector tiles for use on mobile devices
  • Generates CSV and GeoJSON files that contain tree locations and weather data (grid) and uploads them to a Supabase Storage bucket

Pre-Install

I am using venv to set up a virtual Python environment that keeps the dependencies separate:

python -m venv REPO_DIRECTORY

Install

pip install -r requirements.txt

I had some trouble installing psycopg2 on macOS because of a problem with linking the SSL library. The following install command resolved the issue:

env LDFLAGS='-L/usr/local/lib -L/usr/local/opt/openssl/lib -L/usr/local/opt/readline/lib' pip install psycopg2

GDAL

Some of Python's GDAL bindings are not as good as the command line tools, so I had to use the original tools. GDAL therefore needs to be installed on the system. GDAL is listed as a dependency in requirements.txt, but installing it that way sometimes fails; in that case, install GDAL manually. Afterwards, make sure the command line calls for gdalwarp and gdal_polygonize.py work.
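One quick way to confirm that both tools are reachable is to look them up on the PATH. The sketch below is not part of this repository; the helper name find_gdal_tools is made up for illustration, and it only checks that the executables can be found, not that they run correctly.

```python
import shutil

# The two command line tools this README says the harvester relies on.
REQUIRED_TOOLS = ("gdalwarp", "gdal_polygonize.py")


def find_gdal_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_gdal_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If either tool is reported as not found, GDAL is probably not installed system-wide or its bin directory is missing from the PATH.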

Linux

Here is a good explanation of how to install GDAL on Linux: https://mothergeo-py.readthedocs.io/en/latest/development/how-to/gdal-ubuntu-pkg.html

Mac

On macOS, GDAL can be installed with brew install gdal.

The Python binding of GDAL is currently pinned to GDAL==2.4.2. If your installed GDAL reports a different version (check with ogrinfo --version), install the matching binding: pip install GDAL==VERSION_FROM_PREVIOUS_COMMAND
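Matching the version can be scripted by pulling the version number out of the ogrinfo --version output. This is a minimal sketch, not code from the repository: the function name gdal_pip_requirement is hypothetical, and the sample line only illustrates the format the command prints ("GDAL X.Y.Z, released ...").

```python
import re


def gdal_pip_requirement(version_output):
    """Turn the text printed by `ogrinfo --version` into a pip requirement."""
    match = re.search(r"GDAL (\d+\.\d+\.\d+)", version_output)
    if match is None:
        raise ValueError(f"no GDAL version found in: {version_output!r}")
    return f"GDAL=={match.group(1)}"


# Sample line for illustration only; use the output from your own machine.
sample = "GDAL 3.4.1, released 2021/12/27"
print("pip install " + gdal_pip_requirement(sample))
# → pip install GDAL==3.4.1
```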

Add a geometry (shapefile) for your region

In order to run, this project needs a shapefile for the region you would like to cover. The default region is the City of Berlin, for which a shapefile is included in the repository. You can add your own shapefile to harvester/assets/.

Configuration

Copy the sample.env file, rename the copy to .env, and update the parameters. The most important ones are the database connection parameters and the path to your shapefile.
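Before a first run, it can help to check that no required setting was left empty. The following sketch parses KEY=VALUE text in the style of a .env file and reports missing keys. It is an illustration only: the harvester's own loading code may work differently, and the key names EXAMPLE_DB_HOST and EXAMPLE_SHAPEFILE_PATH are placeholders, not names from sample.env.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_keys(settings, required):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not settings.get(key)]


# Placeholder content; the real keys are the ones listed in sample.env.
example = """
# database
EXAMPLE_DB_HOST=localhost
EXAMPLE_SHAPEFILE_PATH=
"""
settings = parse_env_text(example)
print(missing_keys(settings, ["EXAMPLE_DB_HOST", "EXAMPLE_SHAPEFILE_PATH"]))
# → ['EXAMPLE_SHAPEFILE_PATH']
```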

Running

Preparing the Buffer Shape

harvester/prepare.py shows how the assets/buffer.shp was created. If a bigger buffer is needed, change the INPUT_SHAPEFILE_BUFFER parameter accordingly and re-run.

Creating the Grid Structure

harvester/grid/grid.py can be used to populate the radolan_geometry table. This table contains vector data for the target city. The data is needed by the harvest process to find the rain data for the target city area.

This tool currently works for Berlin. To use it for another city, replace the harvester/grid/buffer.shp file with a suitable shape (which can be generated with harvester/prepare.py, for example; see above).

Running the Harvest Process

harvester/harvester.py is the actual script for harvesting the data. Run it without any command line parameters; all settings are read from .env.

The code in harvester/harvester.py tries to clean up after itself. However, when running it in a container, the script is completely standalone, so it's probably best to destroy the whole container and start from scratch next time.

Docker

For a local test database, you need Docker and docker-compose installed. You will also have to create a public Supabase Storage bucket and update the .env file with the values listed in sample.env below the line "# for your docker environment".

To start only the database, run:

docker-compose -f docker-compose.postgres.yml up

This will set up a Postgres/PostGIS DB, provision the needed tables, and insert some test data.

To run the harvester and the Postgres DB together, run:

docker-compose up

Known Problems

harvester.py throws an error on the first run

When running the setup for the first time with docker-compose up, the provisioning of the database is slower than the execution of the harvester container. You will have to stop the setup and run it again to get the desired results.
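A common way to soften this kind of start-up race is to poll the database before starting the real work. The sketch below is not part of the harvester and is only one possible approach; the host localhost and port 5432 in the usage note are assumptions. Note also that an open port only shows that Postgres accepts connections, not that the provisioning script has finished, so a stricter check (for example, querying for an expected table) may still be needed.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # pause before the next try
    return False
```

A call such as wait_until(lambda: port_is_open("localhost", 5432)) would block for up to roughly a minute with the default settings and return False if the database never became reachable.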

Postgres Provisioning

The provisioning SQL script is only run once, when the container is created. After you make changes to it, you will have to run:

docker-compose down
docker-compose up --build

Terraform

Terraform is used to create the needed S3 bucket, the Postgres RDS instance, and the Fargate container service. Install and configure Terraform, then update terraform.tfvars with your profile and region.

Run:

# once
# cd into the directories
# create them in this order
# 1. s3-bucket
# 2. rds
# 3. ecs-harvester
# the last setup needs some variables from you
# - vpc
# - public subnet ids
# - profile
# - and all the env variables for the container
terraform init
# and after changes
terraform apply
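The ordering described in the comments above can be written down as a small helper that lists which command runs in which directory. This is a sketch under assumptions: the helper names are hypothetical, and base_dir stands for wherever the three Terraform directories live, which this README does not state. Only terraform init and terraform apply are used, as in the commands above.

```python
import os
import subprocess

# Directory order taken from the comments above.
TERRAFORM_DIRS = ("s3-bucket", "rds", "ecs-harvester")


def terraform_commands(dirs=TERRAFORM_DIRS, first_run=True):
    """List (directory, command) pairs in the order they should run."""
    steps = []
    for directory in dirs:
        if first_run:
            steps.append((directory, ["terraform", "init"]))  # once per directory
        steps.append((directory, ["terraform", "apply"]))
    return steps


def run_steps(steps, base_dir="."):
    """Run each command in its directory and stop at the first failure."""
    for directory, command in steps:
        subprocess.run(command, cwd=os.path.join(base_dir, directory), check=True)
```

Calling terraform_commands() without running anything gives a dry-run view of the sequence; run_steps then executes it, and terraform apply will still ask for confirmation and for the variables mentioned above.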

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Fabian Morón Zirfas: 💻 📖
  • Sebastian Meier: 💻 📖
  • Dennis Ostendorf: 💻
  • Lisa-Stubert: 💻
  • Lucas Vogel: 📖
  • Jens Winter-Hübenthal: 💻 🐛
  • Simon Jockers: 🚇 💻 🐛

This project follows the all-contributors specification. Contributions of any kind welcome!

Credits



A project by:

Supported by:

About

Gather precipitation data from DWD's radolan data set, for the City of Münster and connect to the trees DB
