Scripts used during the creation of the Global River Water Quality Archive (GRQA).
The dataset can be downloaded at https://zenodo.org/record/5101057
The data description paper is available at https://essd.copernicus.org/articles/13/5483/2021/
The scripts are divided into two folders. Folder preprocessing contains scripts used for preprocessing raw source data into a common structure used for GRQA. Folder grqa_processing contains scripts used for processing the merged data, generating plots and statistics.
preprocessing contains the following scripts:
- *_download used for downloading source data
- *_units for collecting water quality parameter units when multiple units per parameter were present in source data
- *_preprocessing for source data cleaning and parameter harmonization to convert into a common structure used in GRQA
- WQP_merge_stats for merging WQP time series statistics
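The unit harmonization step described above can be illustrated with a minimal sketch. It is not the code from the *_preprocessing scripts: the conversion table, the function name `harmonize`, and the output field names are hypothetical and chosen only for illustration.

```python
# Hypothetical conversion factors to one target unit (mg/l).
# These values illustrate the idea and are not taken from the GRQA scripts.
TO_MG_PER_L = {"mg/l": 1.0, "ug/l": 0.001, "g/l": 1000.0}


def harmonize(rows):
    """Convert (site_id, date, value, unit) rows to mg/l.

    Rows with a unit missing from the table are skipped; a real
    script would more likely log or report them.
    """
    out = []
    for site_id, obs_date, value, unit in rows:
        factor = TO_MG_PER_L.get(unit.strip().lower())
        if factor is None:
            continue
        out.append(
            {
                "site_id": site_id,
                "obs_date": obs_date,
                "obs_value": value * factor,
                "unit": "mg/l",
            }
        )
    return out


rows = [
    ("A", "2020-01-01", 500.0, "ug/l"),
    ("A", "2020-02-01", 0.7, "mg/L"),
    ("B", "2020-01-15", 3.0, "ppm"),  # unknown unit, skipped
]
harmonized = harmonize(rows)
```

Keeping the factors in a single lookup table makes it easy to review which source units are covered for a given parameter.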
grqa_processing contains the following scripts:
- *_param_codes for creating a list of GRQA parameters used as an input for the parallel implementation of *_obs_merging
- *_obs_merging used for merging harmonized source data, calculating time series statistics per site (outliers, monthly availability, continuity) and flagging potential duplicate observations
- *_param_stats for calculating GRQA time series statistics per parameter
- *_plot_sites for creating maps of observation site distribution, monthly availability, monthly continuity and median value per parameter
- *_plot_hist for creating temporal distribution plots, histograms and box plots per parameter
- *_plot_sites_grid for creating maps of observation site distribution, monthly availability, monthly continuity and median value of DO, DOC, TP and TSS for the paper
- *_plot_hist_grid for creating temporal distribution plots, histograms and box plots of DO, DOC, TP and TSS for the paper
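As a rough illustration of the per-site statistics and duplicate flagging mentioned for *_obs_merging, the sketch below uses only the Python standard library. The definitions are assumptions made for this example and may differ from those used in GRQA: availability is taken as the share of months with at least one observation within the span of the time series, continuity as the longest run of consecutive months with data, and a duplicate as an observation whose (site, date, value) triple occurs more than once. All function names are hypothetical.

```python
from collections import Counter
from datetime import date


def month_index(d):
    """Map a date to a running month number so months can be compared."""
    return d.year * 12 + d.month - 1


def monthly_stats(dates):
    """Return (availability, longest_run) for one site's observation dates.

    Assumed definitions (not taken from the GRQA scripts):
    availability = months with data / months in the series span,
    longest_run  = longest streak of consecutive months with data.
    """
    months = sorted({month_index(d) for d in dates})
    span = months[-1] - months[0] + 1
    availability = len(months) / span
    longest = run = 1
    for prev, cur in zip(months, months[1:]):
        run = run + 1 if cur == prev + 1 else 1
        longest = max(longest, run)
    return availability, longest


def flag_duplicates(observations):
    """Flag every (site_id, date, value) triple that occurs more than once."""
    counts = Counter(observations)
    return [counts[obs] > 1 for obs in observations]


site_dates = [date(2020, 1, 5), date(2020, 2, 10), date(2020, 4, 1)]
availability, longest_run = monthly_stats(site_dates)

obs = [
    ("A", date(2020, 1, 5), 1.0),
    ("A", date(2020, 1, 5), 1.0),
    ("B", date(2020, 1, 5), 2.0),
]
flags = flag_duplicates(obs)
```

In this example the site has data in three of the four months between January and April 2020, so the assumed availability is 0.75 and the longest consecutive run is two months.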
Each Python script has a corresponding shell script that was used for submitting Slurm jobs to the HPC cluster of the University of Tartu.