improve predefined catalog implementation #844

Closed
DirkEilander opened this issue Mar 13, 2024 · 4 comments · Fixed by #849 or #921
Labels
Enhancement New feature or request


@DirkEilander
Contributor

DirkEilander commented Mar 13, 2024

Current implementation

Currently, the data/predefined_catalogs.yml file describes which predefined catalogs are available and where to find them. The DataCatalog.set_predefined_catalogs() method reads this file from the main branch to set the DataCatalog.predefined_catalogs property. If the file cannot be accessed, an error is raised. The data catalog files themselves are stored in data/catalogs and are versioned by git revision hash; the latest version is always assumed to be on the main branch.

There are a few issues with this implementation:

  • without internet access, the DataCatalog does not work
  • we cannot test the predefined_catalogs.yml file properly, as the code always reads the version on the main branch
  • we cannot (easily) fix bugs in old data catalog versions
  • if we change the predefined_catalogs.yml format, all previous HydroMT versions may break

Enhancement Description

How I would like to see this functioning:

  • data catalogs need their own semantic versioning scheme so that users can use older versions for reproducibility.
  • the possibility to publish bug fixes to current and older versions with a patch release.
  • DataCatalog should always initialize (and not break because an online file is not found).
  • the possibility for plugins to add predefined catalogs (new).

Possible implementation:

  • catalogs are published on a separate branch (e.g. like GitHub Pages) in a fixed scheme (e.g. "/<name>/<version>/data_catalog.yml"). This allows older versions to be updated.
  • an overview of predefined catalogs and versions is kept in the codebase (e.g. in a new PredefinedCatalogs class), which is initialized with the catalogs exposed by core and plugins via entrypoints.
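To make the proposed layout a bit more concrete, here is a minimal Python sketch of such a registry. It assumes the fixed "/<name>/<version>/data_catalog.yml" scheme from the first bullet; the `PredefinedCatalogs` fields, the `hydromt.catalogs` entry-point group, the callable contract for plugins, the base URL and the version numbers are all hypothetical placeholders, not the existing HydroMT API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from importlib.metadata import entry_points

# Hypothetical entry-point group name, chosen for illustration only.
ENTRY_POINT_GROUP = "hydromt.catalogs"


def catalog_uri(base_url: str, name: str, version: str) -> str:
    """Build the fixed "<base>/<name>/<version>/data_catalog.yml" location."""
    return f"{base_url.rstrip('/')}/{name}/{version}/data_catalog.yml"


@dataclass
class PredefinedCatalogs:
    """Hypothetical registry mapping catalog names to published versions."""

    base_url: str
    versions: dict[str, list[str]] = field(default_factory=dict)

    def register(self, name: str, versions: list[str]) -> None:
        # Core and plugins both register through this single method.
        self.versions.setdefault(name, []).extend(versions)

    def uri(self, name: str, version: str) -> str:
        # Only registered versions can be resolved to a catalog file.
        if version not in self.versions.get(name, []):
            raise KeyError(f"unknown catalog version: {name} {version}")
        return catalog_uri(self.base_url, name, version)

    def load_plugins(self) -> None:
        # Assumed contract: each entry point resolves to a zero-argument
        # callable returning {catalog_name: [versions]}.
        # entry_points(group=...) requires Python 3.10+.
        for ep in entry_points(group=ENTRY_POINT_GROUP):
            for name, versions in ep.load()().items():
                self.register(name, versions)


catalogs = PredefinedCatalogs("https://example.com/catalogs")
catalogs.register("deltares_data", ["v1.0.0", "v1.0.1"])
print(catalogs.uri("deltares_data", "v1.0.1"))
# prints https://example.com/catalogs/deltares_data/v1.0.1/data_catalog.yml
```

In this sketch the list of versions lives in code rather than in a single remote file, so the registry can always be constructed offline; only fetching an actual catalog file needs network access.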

Additional Context

This is also discussed in #737

DirkEilander added the Enhancement and Needs refinement labels on Mar 13, 2024
@DirkEilander
Contributor Author

@savente93 @Jaapel @Tjalling-dejong @deltamarnix
I wrote this issue as starting point for our discussion tomorrow. It would be great if you could have a quick look beforehand.

@savente93
Contributor

Just as a primer for our discussion: a common way to do this is to create a protected branch for each released version, so that bug fixes can be supplied to each version independently.

@DirkEilander
Contributor Author

Outcome of discussion

  • Use semantic versioning: the major version tracks the catalog format, the minor version marks breaking changes in the catalog content such as a new data version or a rename, and the patch version marks bug fixes. Using a catalog format version (instead of HydroMT version compatibility) makes it easier to maintain catalogs.
  • Save the data catalog files in a fixed scheme on the main branch, e.g. "/<name>/<version>/data_catalog.yml". These files are effectively read-only: each version gets a new file.
  • There is one root catalog file (currently the predefined_catalogs.yml file) that lists all available versions per predefined data catalog. This file is editable: we add each new data catalog version to the list.
  • The URI of the root catalog file is supplied as an entrypoint so that plugins can also define their own catalog overview files.
  • We also version this root catalog file to allow for future format changes. Old HydroMT versions will then continue to work, as they still have entrypoints to the old version.
  • If the root catalog file is not found, a warning (instead of an error) is raised, making HydroMT less dependent on a single remote file.
  • For testing purposes, we read the overview file and catalogs directly from the repository on the same branch, so that tests use the files in that branch (and not those on main).
  • Question: do we implement this already in v0.9.x or only in v1? v1 might be more pragmatic (and realistic). However, the current situation is blocking further development of the catalogs.
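The versioning rules and the warning fallback from the outcome above could look roughly like the Python sketch below. It rests on several assumptions: versions are plain "vMAJOR.MINOR.PATCH" strings (no pre-release tags), picking the newest version within the supported format major is one possible policy rather than an agreed one, and the root catalog is read as a local JSON file only to keep the example dependency-free (the real file is YAML and may be remote). All function names are hypothetical.

```python
from __future__ import annotations

import json
import warnings
from pathlib import Path


def parse_version(version: str) -> tuple[int, int, int]:
    """Split "vMAJOR.MINOR.PATCH" (leading "v" optional) into integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def latest_compatible(versions: list[str], format_major: int) -> str | None:
    """Newest version whose major (catalog format) version matches."""
    candidates = [v for v in versions if parse_version(v)[0] == format_major]
    # Compare numerically, so "v1.10.1" ranks above "v1.2.0".
    return max(candidates, key=parse_version, default=None)


def load_root_catalog(path: Path) -> dict[str, list[str]]:
    """Read the root catalog; warn and return {} instead of raising."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError) as err:
        warnings.warn(
            f"Root catalog file could not be read ({err}); "
            "continuing without predefined catalogs."
        )
        return {}


print(latest_compatible(["v1.0.0", "v1.2.0", "v1.10.1", "v2.0.0"], 1))  # prints v1.10.1
root = load_root_catalog(Path("does/not/exist.json"))  # warns, returns {}
```

Returning an empty mapping with a warning keeps DataCatalog initialization independent of the root file; a missing file only surfaces later if a user actually asks for a predefined catalog.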

@savente93 @Jaapel @deltamarnix Can you let me know if I missed anything?

@deltamarnix
Contributor

To maintain backward compatibility with the HydroMT versions currently in use, I was thinking we could keep predefined_catalogs.yml and all the corresponding catalogs for now, and refer to that as v0. We then build a copy of all the files next to them and call that predefined_catalogs.v1.yaml. That should keep all old HydroMT versions working for now, as they still depend on the v0 version.
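A minimal Python sketch of this side-by-side layout, assuming the v1 root file sits next to the existing one in data/ and that each HydroMT release pins the root-file format it understands. `ROOT_FILES` and `root_file_for` are hypothetical names used for illustration only.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Root catalog files kept side by side in the repository (assumed layout).
# v0: the existing file that released HydroMT versions read; left untouched.
# v1: the new, versioned copy proposed above.
ROOT_FILES = {
    0: PurePosixPath("data/predefined_catalogs.yml"),
    1: PurePosixPath("data/predefined_catalogs.v1.yaml"),
}


def root_file_for(format_major: int) -> PurePosixPath:
    """Return the root catalog file for a given root-file format version."""
    try:
        return ROOT_FILES[format_major]
    except KeyError:
        raise ValueError(f"no root catalog file for format v{format_major}") from None


print(root_file_for(1))  # prints data/predefined_catalogs.v1.yaml
```

With this approach a future format change adds a new entry and a new file instead of editing an existing one, so releases that still read the v0 file keep working.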
