Scripts used during the creation of the Global River Water Quality Archive (GRQA)

LandscapeGeoinformatics/GRQA_src

GRQA_src

Scripts used during the creation of the Global River Water Quality Archive (GRQA).

The dataset can be downloaded at https://zenodo.org/record/5101057

The data description paper is available at https://essd.copernicus.org/articles/13/5483/2021/

The scripts are divided into two folders. The preprocessing folder contains scripts that convert raw source data into the common structure used for GRQA. The grqa_processing folder contains scripts that process the merged data and generate plots and statistics.

preprocessing contains the following scripts:

  • *_download for downloading source data
  • *_units for collecting water quality parameter units when multiple units per parameter were present in source data
  • *_preprocessing for source data cleaning and parameter harmonization to convert into a common structure used in GRQA
  • WQP_merge_stats for merging WQP time series statistics
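
The unit harmonization described for the *_units and *_preprocessing scripts can be illustrated with a minimal sketch. It is not taken from the repository: the conversion table, the harmonize function and the sample records are all hypothetical, and the real scripts may handle units quite differently.

```python
# Hypothetical conversion factors to a common target unit (mg/l).
# These names and values are illustrative, not the GRQA lookup tables.
TO_MG_PER_L = {
    "mg/l": 1.0,
    "ug/l": 0.001,
    "g/l": 1000.0,
}


def harmonize(value, unit):
    """Convert a value to mg/l; return None when the unit is not recognised."""
    factor = TO_MG_PER_L.get(unit.strip().lower())
    if factor is None:
        return None
    return value * factor


# Made-up observations of one parameter reported in mixed units.
records = [
    {"site": "A", "param": "TP", "value": 250.0, "unit": "ug/L"},
    {"site": "B", "param": "TP", "value": 0.3, "unit": "mg/l"},
    {"site": "C", "param": "TP", "value": 5.0, "unit": "unknown"},
]

# Add a harmonized column while keeping the original value and unit.
harmonized = [dict(r, value_mg_l=harmonize(r["value"], r["unit"])) for r in records]

for row in harmonized:
    print(row["site"], row["value_mg_l"])
```

Returning None for unrecognised units keeps such records visible for manual review instead of silently dropping or mis-converting them.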

grqa_processing contains the following scripts:

  • *_param_codes for creating a list of GRQA parameters used as an input for the parallel implementation of *_obs_merging
  • *_obs_merging for merging harmonized source data, calculating time series statistics per site (outliers, monthly availability, continuity) and flagging potential duplicate observations
  • *_param_stats for calculating GRQA time series statistics per parameter
  • *_plot_sites for creating maps of observation site distribution, monthly availability, monthly continuity and median value per parameter
  • *_plot_hist for creating temporal distribution plots, histograms and box plots per parameter
  • *_plot_sites_grid for creating maps of observation site distribution, monthly availability, monthly continuity and median value of DO, DOC, TP and TSS for the paper
  • *_plot_hist_grid for creating temporal distribution plots, histograms and box plots of DO, DOC, TP and TSS for the paper
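
The per-site statistics and duplicate flagging mentioned for *_obs_merging can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, "monthly availability" is taken here to mean the share of calendar months between the first and last observation that contain data, outliers use a simple interquartile-range rule, and a duplicate is an observation matching an earlier one on site, date, parameter and value. The definitions actually used for GRQA are given in the data description paper and may differ.

```python
from datetime import date
from statistics import quantiles


def flag_duplicates(observations):
    """Return one flag per observation; True if an identical one came earlier."""
    seen = set()
    flags = []
    for obs in observations:
        key = (obs["site"], obs["date"], obs["param"], obs["value"])
        flags.append(key in seen)
        seen.add(key)
    return flags


def monthly_availability(dates):
    """Share of months between first and last observation that have data."""
    months = {(d.year, d.month) for d in dates}
    first, last = min(months), max(months)
    span = (last[0] - first[0]) * 12 + (last[1] - first[1]) + 1
    return len(months) / span


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Made-up observations for a single site and parameter.
obs = [
    {"site": "A", "date": date(2020, 1, 5), "param": "DO", "value": 8.1},
    {"site": "A", "date": date(2020, 2, 9), "param": "DO", "value": 7.9},
    {"site": "A", "date": date(2020, 2, 9), "param": "DO", "value": 7.9},
    {"site": "A", "date": date(2020, 4, 2), "param": "DO", "value": 8.4},
]

print(flag_duplicates(obs))
print(monthly_availability([o["date"] for o in obs]))
print(iqr_outliers([10, 11, 12, 11, 10, 12, 11, 500]))
```

Flagging rather than deleting duplicates and outliers leaves the decision about exclusion to the data user.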

Each Python script has a corresponding shell script that was used for submitting Slurm jobs to the HPC cluster of the University of Tartu.
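
One common way to run per-parameter work in parallel on Slurm is a job array, where each task reads its index from the SLURM_ARRAY_TASK_ID environment variable and picks one entry from a parameter list. Whether the shell scripts in this repository use job arrays is not stated here, so the sketch below is only an assumption about how a list such as the one produced by *_param_codes could drive parallel runs. The select_param function and the zero-based indexing are hypothetical.

```python
import os


def select_param(param_codes, env=os.environ):
    """Pick the parameter code for this array task (assumes 0-based indices)."""
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < len(param_codes):
        raise IndexError(f"task id {task_id} outside parameter list")
    return param_codes[task_id]


# Illustrative parameter list; a real run would read it from a file.
param_codes = ["DO", "DOC", "TP", "TSS"]

# Simulate array task 2 by passing a fake environment mapping.
print(select_param(param_codes, {"SLURM_ARRAY_TASK_ID": "2"}))
```

Passing the environment as an argument makes the selection testable outside the cluster; on Slurm the default os.environ would be used.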