Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolver: allow to force update specific claims #386

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jum-s
Copy link
Contributor

@jum-s jum-s commented Apr 27, 2020

Replace entity claims with entry claims of properties declared in the forceUpdateProps parameter (renaming welcome)

Usecase: an editor catalogue, representing the most canonical data is passed through the resolver. But some formating of this catalogue may be less precise than existing inv entities claims.

forceUpdateProps allow to update only the publication date for example, date format being less subjected to interpretation than a title.

  • early parameter validation: against properties values constraints list
  • perform an update, aka only if claim already exists (allowing to not check entity type before update)
  • rely on not updating wikidata items policy already implemented in
    update_resolved_entry

replace entity claims with entry claims of properties declared in the forceUpdateProps parameter (renaming welcome)
usecase: an editor catalogue, representing the most canonical data,
is passed through the resolver.
But some formating of this catalogue may be less precise than inv
entities claims.
forceUpdateProps allow to update only the publication date for example,
date format being less subjected to interpretation than a title

- early parameter validation: against properties values constraints list
- perform an update, aka only if claim already exists (allowing to not check entity type before update)
- rely on not updating wikidata items policy already implemented in
update_resolved_entry

return getEntity(prefix, entityId)
.then(addMissingClaims(seedClaims, userId, batchId))
getEntity(prefix, entityId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
getEntity(prefix, entityId)
return getEntity(prefix, entityId)

I would be in favour of returning those promises (this one and the following), so that we wait for it to resolve before moving to the next operations, as that makes everything simpler: you can build on the assumption that a certain thing was done once the function returned value was resolved, while if you don't return the promise, you don't wait for it, thus uncertainty unfolds. It also makes error handling easier: when something somewhere in the resolve fails, we can stop the resolve for that entity, and handle the error at the appropriate level.

Worse, since NodeJS v15, if any error happens in those non-returned promises, it will be considered an unhandled promise and will make the process exit, so the only viable alternative to not returning promises would be to set a .catch(_.Error('foo') on all of them

Comment on lines +37 to +38
const updateClaims = (seedClaims, userId, batchId, forceUpdateProps) => entity => {
addMissinClaims(seedClaims, userId, batchId, entity)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const updateClaims = (seedClaims, userId, batchId, forceUpdateProps) => entity => {
addMissinClaims(seedClaims, userId, batchId, entity)
const updateClaims = (seedClaims, userId, batchId, forceUpdateProps) => async entity => {
await addMissingClaims(seedClaims, userId, batchId, entity)

same as above, wait for promises

const newClaims = _.omit(seedClaims, Object.keys(entity.claims))
if (_.isEmpty(newClaims)) return
return entities_.addClaims(userId, newClaims, entity, batchId)
entities_.addClaims(userId, newClaims, entity, batchId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same: return promises


const propertiesToForce = [ 'wdt:P123' ]
await resolveAndForceUpdate(editionEntry, propertiesToForce)
.then(Wait(100))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.then(Wait(100))
await wait(100)

}
})
if (!_.isEqual(claims, currentDoc.claims)) {
entities_.putUpdate({ userId, currentDoc, updatedDoc })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by calling entities_.putUpdate directly, it skips validateClaims (validateAndFormatClaims on current master branch): that's a problem, as mistakes happens, and this validation can be precious to avoid a wide range of errors (ex: setting an entity of type work in place of a publisher or setting an edition ISBN to an ISBN already used by another edition)

@maxlath
Copy link
Member

maxlath commented Feb 25, 2021

Adding only the missing values was the low hanging fruit, trying to update existing data en mass is much more challenging: I would be in favour of postponing this PR to when we can refactor it with the fruits of #400, particularly to benefit from the formatting and validation. We might also benefit from some cleanup steps: to take your example, if a wd entity replaces an inv entity as publisher, there could be a check on the inv entity to see what action should be taken. Maybe creating a task? That could allow the person working on that task to import any existing data from that inv entity missing on the wd entity

@jum-s
Copy link
Contributor Author

jum-s commented Feb 25, 2021

yep, replacing uris looks challenging indeed. But some "hardcoded" values (as non uris values) could be automatically updated, this is exactly what #507 is trying to do.
It can be discussed in a new issue, but if "a wd entity replaces an inv entity as publisher" what could be done is to update the claim but also to try to "garbage collect" "orphan" publishers by redirecting them to the wd entity.
I dont think there is any reason to refuse en mass claim updates that can be updated individually, as long as tools to reverse it exist (batch_id).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants