Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate dataset before create or update #312

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open
4 changes: 4 additions & 0 deletions .travis.yml
Expand Up @@ -15,3 +15,7 @@ services:
install:
- bash bin/travis-build.bash
script: sh bin/travis-run.sh

# the new trusty images of Travis cause build errors with psycopg2, see https://github.com/travis-ci/travis-ci/issues/8897
dist: trusty
group: deprecated-2017Q4
4 changes: 4 additions & 0 deletions README.rst
Expand Up @@ -336,6 +336,10 @@ field. The currently supported configuration options are:
Setting this property to true will force the harvester to gather all remote
packages regardless of the modification date. Default is False.

* validate_packages: By default, packages are not validated against instance
schema. Setting this property will enable validation against schema before
creating local packages.

* remote_groups: By default, remote groups are ignored. Setting this property
enables the harvester to import the remote groups. There are two alternatives.
Setting it to 'only_local' will just import groups which name/id is already
Expand Down
30 changes: 29 additions & 1 deletion ckanext/harvest/harvesters/ckanharvester.py
Expand Up @@ -9,6 +9,7 @@
from ckan.lib.helpers import json
from ckan.lib.munge import munge_name
from ckan.plugins import toolkit
import ckan.lib.plugins as lib_plugins

from ckanext.harvest.model import HarvestObject

Expand Down Expand Up @@ -163,7 +164,7 @@ def validate_config(self, config):
except NotFound:
raise ValueError('User not found')

for key in ('read_only', 'force_all'):
for key in ('read_only', 'force_all', 'validate_packages'):
if key in config_obj:
if not isinstance(config_obj[key], bool):
raise ValueError('%s must be boolean' % key)
Expand Down Expand Up @@ -528,6 +529,33 @@ def get_extra(key, package_dict):
# key.
resource.pop('revision_id', None)


# validate packages if needed
validate_packages = self.config.get('validate_packages', {})
if validate_packages:
if 'type' not in package_dict:
package_plugin = lib_plugins.lookup_package_plugin()
try:
# use first type as default if user didn't provide type
package_type = package_plugin.package_types()[0]
except (AttributeError, IndexError):
package_type = 'dataset'
# in case a 'dataset' plugin was registered w/o fallback
package_plugin = lib_plugins.lookup_package_plugin(package_type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that line be one level higher? Otherwise package_plugin is not set if there was no exception on line 540.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package_plugin is set on line 537 and most of that code is actually copy paste from here https://github.com/ckan/ckan/blob/master/ckan/logic/action/create.py#L136

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah perfect, thanks for the explanation!

package_dict['type'] = package_type
else:
package_plugin = lib_plugins.lookup_package_plugin(package_dict['type'])


schema = package_plugin.create_package_schema()


data, errors = lib_plugins.plugin_validate(
package_plugin, base_context, package_dict, schema, 'package_create')

if errors:
raise ValidationError(errors)

result = self._create_or_update_package(
package_dict, harvest_object, package_dict_form='package_show')

Expand Down