Integrate ckan_harvester with ckanext-scheming #153

Open
florianm opened this issue Aug 23, 2015 · 3 comments

Comments

@florianm
Contributor

The default ckan_harvester will run into trouble if the harvesting CKAN instance uses a custom ckanext-scheming schema. The incompatibility lies in the handling of extra fields: scheming stores its custom fields as extra fields, whereas the ckan_harvester creates or overwrites extra fields exactly as found on the harvested instance. Scheming's package_read templates will then fail if any extra fields are present that the schema does not define.

Pinging @amercader and @wardi for advice:

Assuming an unknown schema A (default, hard-coded like data.gov.au, or ckanext-scheming) is harvested into a custom ckanext-scheming schema B using the ckan_harvester, there are two overlapping sets of fields, which gives three cases:

  • fields in A and in B
  • fields in A not in B
  • fields in B not in A

Would it make sense to modify the ckan_harvester's behaviour around extra fields as follows:

  • parse ckan config
  • if no ckanext.scheming custom schemas are set, continue as normal, otherwise:
  • if ckanext.scheming custom schemas are present, use (first?) schema as "B"
  • iterate over present keys in schema B
  • fields identical in A and B: direct transfer
  • fields in A not in B: suggestion: append to B.notes?
  • fields in B not in A: if optional, leave blank. If mandatory, suggestion: set dummy value and append warning to B.notes?
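
The three field cases above could be sketched roughly as follows. This is only an illustration, not harvester code: `map_to_schema_b` and the `placeholder` value are hypothetical names, the harvested dataset is assumed to be already flattened into a plain dict, and schema B is assumed to be a dict with a `dataset_fields` list whose entries carry `field_name` and optionally `required`, as in ckanext-scheming schema files.

```python
def map_to_schema_b(source_dataset, schema_b, placeholder="UNKNOWN"):
    """Map a harvested dataset dict (schema A) onto the fields of schema B.

    Hypothetical helper. Assumes source_dataset is a flat dict and schema_b
    has a "dataset_fields" list of {"field_name": ..., "required": ...}.
    """
    b_fields = {f["field_name"]: f for f in schema_b["dataset_fields"]}
    target = {}
    appended = []  # lines that will be appended to B's notes

    for name, field in b_fields.items():
        value = source_dataset.get(name)
        if value not in (None, ""):
            # Field in A and in B: direct transfer.
            target[name] = value
        elif field.get("required"):
            # Mandatory field in B missing from A: dummy value plus warning.
            target[name] = placeholder
            appended.append(
                "WARNING: required field '%s' was missing at the source" % name
            )
        # Optional field in B missing from A: left blank (not set).

    # Fields in A not in B: keep the information by appending it to notes.
    for name in sorted(source_dataset):
        if name not in b_fields:
            appended.append("%s: %s" % (name, source_dataset[name]))

    if appended:
        notes = target.get("notes", "")
        target["notes"] = "\n".join([notes] + appended).strip()
    return target
```

Because nothing outside schema B is ever written as an extra field, the result should contain only keys that B defines (plus the core `notes` field), which is the condition the package_read templates need.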
@wardi

wardi commented Aug 23, 2015

It should be possible to add this sort of logic to a custom harvester, right?

Also, sites using ckanext-scheming will advertise the schemas they have installed through actions scheming_dataset_schema_list and scheming_dataset_schema_show so it's possible to query the schema in use on both ends instead of checking the config.
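
A minimal sketch of querying both ends, assuming a callable `call_action(action, data_dict)` that performs the API request and returns the result. `fetch_schemas` and `compare_fields` are hypothetical helper names, and the sketch assumes `scheming_dataset_schema_list` returns a list of dataset type names and `scheming_dataset_schema_show` accepts a `type` parameter.

```python
def fetch_schemas(call_action):
    """Return {dataset_type: schema} for one CKAN site.

    call_action is any callable taking (action_name, data_dict); it is an
    assumption of this sketch, not part of the harvester.
    """
    schemas = {}
    for dataset_type in call_action("scheming_dataset_schema_list", {}):
        schemas[dataset_type] = call_action(
            "scheming_dataset_schema_show", {"type": dataset_type}
        )
    return schemas


def compare_fields(schema_a, schema_b):
    """Split field names into the three cases: both, only in A, only in B."""
    names_a = {f["field_name"] for f in schema_a["dataset_fields"]}
    names_b = {f["field_name"] for f in schema_b["dataset_fields"]}
    return {
        "both": names_a & names_b,
        "only_a": names_a - names_b,
        "only_b": names_b - names_a,
    }
```

With ckanapi, something like `fetch_schemas(RemoteCKAN(site_url).call_action)` should fit this shape, since `call_action` takes the action name and a data dict as its first two arguments; `site_url` here is a placeholder.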

@florianm
Contributor Author

Thanks, I'll try my luck with a custom harvester using scheming_dataset_schema_list and scheming_dataset_schema_show!

@florianm
Contributor Author

Still running into #151 and no idea how to debug that one... ckanapi looking better and better.
