Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a problem with data-dictization of lists to fix search-index rebuild #529

Merged
merged 3 commits into from May 23, 2023

Conversation

seitenbau-govdata
Copy link
Member

@seitenbau-govdata seitenbau-govdata commented May 4, 2023

Fixes #528.

The update of sqlalchemy from 1.3 to 1.4 caused a problem with the dictization of the data returned by the db.

https://github.com/ckan/ckanext-harvest/blob/master/ckanext/harvest/logic/dictization.py#L79 returns a list of db entries containing the errors of the harvest job. With the update of sqlalchemy the the return type of the list elements has changed from sqlalchemy.util._collections.result to sqlalchemy.engine.row.Row. The new type cannot be reliably dictized the way we did it so far. Therefore we made sure the elements are dictized.

@seitenbau-govdata seitenbau-govdata marked this pull request as ready for review May 4, 2023 08:50
@seitenbau-govdata
Copy link
Member Author

Could somebody review if this is a viable solution? Maybe @amercader or @pdelboca ? 😃

@pdelboca
Copy link
Member

pdelboca commented May 4, 2023

Hello @seitenbau-govdata ! Thanks for this contribution!

A couple of thoughts/questions:

  • Can you provide an example (or a test) of what's the expected format after dictization? (It is hard to follow up and I honestly do not remember what's in those fields.)
  • Given that the current architecture is to provide dictize methods for objects, could you create a more explicit harvest_error_dictize(obj, context) instead of an internal generic method? It would be more readable and maintainable that way.

@seitenbau-govdata
Copy link
Member Author

Hi @pdelboca. Thank you for the feedback.

We created a testcase for the API-method harvest_source_show_status. If the dictization of the response works as intended this test will succeed, otherwise it will break at line 778. We use json.dumps to check this like in the CKAN code where the reindex error is currently raised: https://github.com/ckan/ckan/blob/8aaf1c65b1a83d53296b71e16cfff4464694aad1/ckan/lib/search/index.py#L130.
What causes the json.dumps to fail is that the elements of the lists last_job['gather_error_summary'] and last_job['object_error_summary'] are now of the type sqlalchemy.engine.row.Row.
For some reason this type cannot be serialized by json.dumps. That`s why our suggested solution contains a separate dictization of the single elements of those lists first.

@seitenbau-govdata
Copy link
Member Author

@pdelboca or @amercader Could you take a quick look at it? I think the problem is quite annoying for all users with CKAN 2.10. Thanks in advance. 😃

@amercader amercader merged commit 4710035 into ckan:master May 23, 2023
3 checks passed
@amercader
Copy link
Member

Thanks @seitenbau-govdata , I've released a new version with this fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rebuilding search-index broken, when harvest errors exist.
3 participants