DOC: better document in the data catalogs if datasets were pre-processed #356

Closed
1 task done
hboisgon opened this issue May 23, 2023 · 3 comments · Fixed by #667
Labels: Datasets (request to update or add new datasets), Documentation (improvements or additions to documentation)


hboisgon commented May 23, 2023

HydroMT version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

In the docs here: https://deltares.github.io/hydromt/latest/user_guide/data_existing_cat.html
Maybe also here (meta section) depending on implementation: https://deltares.github.io/hydromt/latest/user_guide/data_prepare_cat.html

Documentation problem

Some of the datasets in the pre-defined catalogs are not the original data but pre-processed versions (e.g. modis_lai, and merit_hydro for some of its layers).
Maybe we should find a standard way of letting the user know about this?
See also the issue in hydromt-wflow: #157

Known issues:

  • merit_hydro
  • merit_hydro_patch
  • modis_lai
  • era5 other than hourly (daily, zarr)
  • chirps

Possibly related:

  • hydro_lakes
  • hydro_reservoirs

Suggested fix for documentation

I think so far we have tried to use source_url and notes in meta to say whether processing was done. For some data sources this is missing, but I also wonder whether this approach is clear to the user or whether we should do it differently.

For example only add source_url if no processing was done.
In case of processing, use new keywords processing_from_url, processing_from_doi, processing_steps?
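
To make the proposal above concrete, here is a minimal sketch of how a reader or a check script could interpret a `meta` block under that convention. The keywords `processing_from_url`, `processing_from_doi` and `processing_steps` are only the names suggested in this comment; they are not existing HydroMT keys. The function `provenance_kind` and its return labels are hypothetical and exist only for illustration.

```python
# Keywords proposed in this issue; NOT existing HydroMT meta keys.
PROCESSING_KEYS = ("processing_from_url", "processing_from_doi", "processing_steps")


def provenance_kind(meta: dict) -> str:
    """Classify a catalog ``meta`` block under the proposed convention.

    Proposed rule: ``source_url`` is only set when no processing was done;
    processed datasets use the ``processing_*`` keywords instead.
    """
    has_source = bool(meta.get("source_url"))
    has_processing = any(meta.get(key) for key in PROCESSING_KEYS)

    if has_source and has_processing:
        # Both are set, so the convention cannot tell which one applies.
        return "ambiguous"
    if has_processing:
        return "processed"
    if has_source:
        return "original"
    return "undocumented"
```

With this rule, a block containing only `source_url` would be read as original data, and a block containing only `processing_steps` as pre-processed data. Whether "ambiguous" should be an error or simply allowed is a design choice that the proposal leaves open.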

hboisgon added the Documentation and DataCatalog & DataAdapters labels May 23, 2023
hboisgon changed the title from "DOC: better document in the data catalogs if datasets were pro-processed" to "DOC: better document in the data catalogs if datasets were pre-processed" Jun 1, 2023
alimeshgi added this to the Q3 milestone Jun 21, 2023
hboisgon added the Datasets label and removed the DataCatalog & DataAdapters label Jun 28, 2023
@DirkEilander

Part of the solution is to update the meta section in deltares_data.yml, and the documentation, according to:

  meta:
    source_url: zenodo.org/my_dataset # should point to the processed data, OR to the original data in combination with processing_notes/processing_script
    source_license: CC-BY-3.0
    source_version: vX.X
    paper_ref: Author et al. (year)
    paper_doi: doi
    processing_notes:  <description of process in script OR simple processing steps (e.g. filter / gdalbuildvrt)>
    processing_script: <url to script>
    category: category

What is required for reproducibility should be checked case by case. There are several options:

  • publish pre-processed data together with the script on Zenodo (e.g. MODIS_LAI/ MERIT Hydro basins map) and point to this data in source_url
  • point to scripts in processing_script to download and/or process (e.g. ERA5)
  • add processing_notes for simple processing to filter data (e.g. hydro_lakes) or create a vrt (merit)
  • documentation of required data (e.g. bounds is required in the hydrographic region argument unless the basin map and index are present)
  • check used data sources in examples (e.g. replace merit_hydro with merit_hydro_ihu).
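
As a rough illustration of the case-by-case check described above, the sketch below lists which sources have neither `processing_notes` nor `processing_script` in their `meta` block. It assumes the catalog has already been loaded into a plain dictionary mapping source names to entries; the function name `sources_missing_processing_info` and the example values are hypothetical, and this is not part of the HydroMT API.

```python
from typing import Iterable, Mapping

# Meta fields from the layout proposed in this comment.
PROCESSING_FIELDS = ("processing_notes", "processing_script")


def sources_missing_processing_info(
    catalog: Mapping[str, Mapping], names: Iterable[str]
) -> list[str]:
    """Return the names whose meta has no processing_notes or processing_script.

    Sources that are absent from the catalog are reported as missing too.
    """
    missing = []
    for name in names:
        # Tolerate entries without a meta block (or with an empty one).
        meta = catalog.get(name, {}).get("meta") or {}
        if not any(meta.get(field) for field in PROCESSING_FIELDS):
            missing.append(name)
    return missing


# Example with made-up placeholder values (not real catalog content).
example_catalog = {
    "modis_lai": {"meta": {"source_url": "<url>", "processing_notes": "<notes>"}},
    "chirps": {"meta": {"source_url": "<url>"}},
}
to_review = sources_missing_processing_info(
    example_catalog, ["modis_lai", "chirps", "merit_hydro"]
)
```

A missing field does not by itself mean that a dataset was pre-processed. The result is only a list of sources to review by hand, for example the ones named under "Known issues".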


DirkEilander commented Jun 28, 2023

In this issue we add the processing_notes to the sources mentioned above. We will follow up on the next step in separate issues (#537).

@DirkEilander

FYI: This issue is split into #537 (to identify and make notes on datasets with preprocessing) and further issues that are still to be created.
