Handling differences between metastore-lib backend and CKAN Metastore #54

Open
pdelboca opened this issue Aug 31, 2020 · 1 comment

Comments

@pdelboca
Collaborator

Currently there is no handling of possible discrepancies between the two backends we use to store metadata. As a result, CKAN displays a dataset (because it exists in the CKAN Metastore) but fails when trying to edit it, because the dataset doesn't exist in the metastore-lib backend (for example, when data has not been migrated into GitHub repositories).

Example traceback:

File '/usr/lib/ckan/src/ckan/ckan/logic/action/update.py', line 334 in package_update
  item.after_update(context, data)
File '/usr/local/lib/python2.7/dist-packages/ckanext/versioning/plugin.py', line 149 in after_update
  pkg_dict['name'], datapackage, author=author)
File '/usr/local/lib/python2.7/dist-packages/metastore/backend/github/storage.py', line 109 in update
  repo = self._get_repo(package_id)
File '/usr/local/lib/python2.7/dist-packages/metastore/backend/github/storage.py', line 226 in _get_repo
  raise exc.NotFound('Could not find package {}'.format(package_id))
NotFound: Could not find package testing-versions

This also introduces a hard dependency: we cannot change or update the metastore-lib backend without a data migration, which is something that shouldn't happen, but it is worth keeping in mind while we are in the development workflow.

Is this going to be handled in a specific way?

Some scenarios I can think of:

  • During a data migration some repositories are not created, so there are datasets in CKAN that don't exist in the new backend.
  • Someone edits the git backend directly, and now there are resources and metadata that no longer exist in the CKAN Metastore.
  • Some process updates the CKAN database directly without updating metastore-lib (e.g. a data migration script run directly against the database).
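
To make the first two scenarios concrete, here is a minimal sketch of a consistency check. It assumes we can list the dataset names known to each side; `find_discrepancies` and both input lists are hypothetical and not part of CKAN or metastore-lib. It only compares which datasets are present, so it would not catch the third scenario, where a dataset exists on both sides with different content.

```python
def find_discrepancies(ckan_ids, backend_ids):
    """Compare dataset names known to CKAN with those in the backend.

    Both arguments are plain iterables of dataset names; how they are
    obtained is left open (hypothetical inputs for this sketch).
    """
    ckan_ids = set(ckan_ids)
    backend_ids = set(backend_ids)
    return {
        # Shown by CKAN, but editing would fail with NotFound
        'missing_in_backend': sorted(ckan_ids - backend_ids),
        # Created directly in the backend, unknown to CKAN
        'missing_in_ckan': sorted(backend_ids - ckan_ids),
    }


report = find_discrepancies(
    ckan_ids=['testing-versions', 'dataset-a'],
    backend_ids=['dataset-a', 'dataset-b'],
)
```

With these inputs, `testing-versions` is reported as missing in the backend and `dataset-b` as missing in CKAN.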
@shevron
Contributor

shevron commented Sep 1, 2020

👍 I like the idea of fault tolerance and graceful degradation. I think migrations are much easier if systems allow for eventual consistency rather than expect to be consistent 100% of the time.

My initial thought was to ensure that if a dataset doesn't exist in the metastore, we rely on CKAN data and "migrate" on demand, either when the dataset is saved for the first time or when it is read for the first time (though I think the latter is slightly less preferable).
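
A rough sketch of the "migrate on save" idea, under stated assumptions: `InMemoryStorage` is a hypothetical stand-in for a storage backend, `NotFound` mimics the error from the traceback above, and `update_or_migrate` is an illustrative helper name. None of this is the actual metastore-lib API.

```python
class NotFound(Exception):
    """Stand-in for the NotFound error seen in the traceback."""


class InMemoryStorage(object):
    """Hypothetical stand-in for a metastore-lib storage backend."""

    def __init__(self):
        self._packages = {}

    def create(self, package_id, metadata):
        self._packages[package_id] = dict(metadata)
        return self._packages[package_id]

    def update(self, package_id, metadata):
        if package_id not in self._packages:
            raise NotFound('Could not find package {}'.format(package_id))
        self._packages[package_id] = dict(metadata)
        return self._packages[package_id]


def update_or_migrate(storage, package_id, metadata):
    """Update a package; if the backend lacks it, create it from CKAN data.

    Returns a (result, migrated) tuple so callers can log on-demand
    migrations.
    """
    try:
        return storage.update(package_id, metadata), False
    except NotFound:
        # The dataset exists in CKAN but was never migrated: create it now.
        return storage.create(package_id, metadata), True


storage = InMemoryStorage()
first = update_or_migrate(storage, 'testing-versions', {'name': 'testing-versions'})
second = update_or_migrate(storage, 'testing-versions', {'name': 'testing-versions', 'title': 'Test'})
```

The first call falls back to creating the package, and the second call is a regular update. A real implementation would also need to decide what to do when the fallback create itself fails.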

I am not sure how to prioritize this, but I will look into how complex it would be.
