Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix restricted dois problem #1724

Merged

Conversation

utnapischtim
Copy link
Contributor

@utnapischtim utnapischtim commented Apr 13, 2024

  • test: update test workflows
  • fix: don't mint doi's on restricted

this PR will close inveniosoftware/product-rdm#178

still open Problem:

  • the DOI is shown on the landing page, because the PID exists in the database
  • the DOI widget in the frontend has to be grayed out if the access is set to restricted
  • is it really necessary to gray it out? it complicates the code. it has to be done independent of DOI but for all other pids too, to be generic in that way. The problem is, that the DOI could be already there (entry in pidstore_pid), in the InvenioRDM instance, because the record is set to restricted in the end of the drafting process. there would be a lot of work to do to make that consistent.

ATTENTION:

  • to make it clear for the user, a message should be added to the landing page too. this has to be done on invenio-app-rdm

NOTE:

  • with this solution it is easy to change from restricted to public and mint the DOI with this step. No further implementation or work from the user would be necessary

UI:
after grace period of a public record ended, the restriction button is disabled + the popup message
Screenshot 2024-05-21 at 09 54 57

@utnapischtim utnapischtim force-pushed the fix-restricted-dois-problem branch 4 times, most recently from ed51f3a to ab061d8 Compare April 16, 2024 21:57
Copy link
Contributor

@tmorrell tmorrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! This safely skips registering DOIs with DataCite on restricted records, and hides DOI metadata when users have changed the permissions. It also allows for DOIs to be registered when a user changes a restricted record to public or when an embargo expires (though maybe that should be tested just to be sure).

# Set metadata
doc = self.serializer.dump_obj(record)
self.client.api.update_doi(metadata=doc, doi=pid.pid_value, url=url)
if hide:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be able to update a doi even if it is is hidden? Also, how do I restore a hidden doi? Currently, if I change from restricted to public then the update method will update the metadata but not make the doi findable again, unless I miss something...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good catch. it should be that the hided should be put back to "findable" after a change from "restricted" to "public". i will have a look.

Copy link
Contributor

@tmorrell tmorrell Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's going to be a tricky one, since I don't think we want to run show_doi for every single public DOI update.

I can think of two solutions:

  • I can add a state option to the update_doi function in the datacite library. Then I could pass along all the updates to DataCite in one call
  • We can modify pidstore so it has a hidden state. Then we could check locally whether the state of the DOI needs to be changed.

Let me know what you think or if you want to chat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zzacharo @tmorrell what do you think of the proposed solution to this problem? this does not make it necessary to keep track the state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I think that should work too. Just make sure DataCite doesn't complain about having the event as "publish" for a published doi. Looks like the tests need to updated too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If DataCite indeed doesn't complain about the "publish" events that should be good. Otherwise, we need some kind of state, maybe the caller passes the old record value, to call the publish/show doi only on the change of permission.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How i tested it: DataCite doesn't complain.

@utnapischtim utnapischtim force-pushed the fix-restricted-dois-problem branch 2 times, most recently from b07ef89 to a4402bc Compare April 17, 2024 22:19
if hide:
self.client.api.hide_doi(doi=pid.pid_value)
else:
doc = self.serializer.dump_obj(record)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We reviewed this with @zzacharo and we think that when you change from restricted to public, here we are not calling .register(), so the local PID will still have the status PIDStatus.NEW or PIDStatus.RESERVED.
We are missing a test that will test the change from restricted to public and viceversa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pid register method is already called, on the publish. that is independent of the way datacite is handling the restriction access setting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this call. That will not be called if the record is restricted. Isn't this suppose to register the "doi" PID?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that call won't happen until the record is public. The "doi" pid doesn't need to be registered until the DOI is actually sent to DataCite. So every time there are changes to the record it will try to register the DOI, but it will only get registered if the record is public. A test is probably a good idea though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this call. That will not be called if the record is restricted. Isn't this suppose to register the "doi" PID?

what do you mean by "register" the doi PID?

as Tom said, the doi should only be set to R in pidstore_pid if it has been registered at datacite. and that isn't the case if the record is set to restricted. this means that the call in the register_or_update task will register the pid DOI at datacite after the change from restricted to public

We reviewed this with @zzacharo and we think that when you change from restricted to public, here we are not calling .register(), so the local PID will still have the status PIDStatus.NEW or PIDStatus.RESERVED. We are missing a test that will test the change from restricted to public and viceversa.

is this test what you had in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean using the DatacitProvider directly without going through the service :) It is not the usual/recommended way but as an admin you might do that and I would rather have an extra check to register the local PID doi before going to datacite. The normal flow via services is working correctly no problem with that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then yes you are right, we have to add a if to register the pid if it has not been registered at that point. which makes the whole code really messy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pid.register() method already checks if a pid is registered otherwise do nothing.... Maybe I am overthinking it but I was just triggered by the fact that such a check was present in the register method..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest, in my point of view it is wrong to register the DOI pid in pidstore_pid within the provider class. the provider should only take care what's associated with the provider. so the DataCitePIDProvider.register should not know about the internal behavior and should only be called if it has been registered internally. But that is another discussion.

yes pid.register takes care of knowing if it has already bin registered but this is not called in DataCitePIDProvider.update as you pointed out. So if a record is set to RESTRICTED the DOI PID will not be set to R in pidstore_pid. If someone is using the DataCitePIDProvider.update method then directly the DOI will be registered at DataCite and will remain K in pidstore_pid. if the record is then set to public the code will mostly work, because the public_doi method of the datacite rest client does more or less the same as the update does, it uses the event: publish to register the doi.

the thing is, is it a problem if the doi is in state K in pidstore_pid for this scenario?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the /doi link won't work if the identifier is in state K. I guess we could add a check to pidstore and then register if needed, though it's a lot of extra calls to take care of someone who's not really following the normal flow.

self.client.api.update_doi(metadata=doc, doi=pid.pid_value, url=url)
if hide:
self.client.api.hide_doi(doi=pid.pid_value)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way could be to delegate the state to DataCite: we could first read/get from DataCite, check if it is hidden or not, and therefore add the event: publish or not. We are in a celery task so it could work.
However, we need to take into account that DataCite might not respond, and what to do in that case.
At the same time, I think that it would make sense to have an update method that receive the old and the new version of the record, so the implementation could use the diff to make choices. It feels like a much bigger change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which concerns do you have with having always the event: publish there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my opinion it would complicate the whole process if it has first to be checked on DataCite if the record is hidden. The state management will be more complex and we need at least one request to get the information. Yes one request is not that much but scaled on multiple InvenioRDM instances with multiple records DataCite has to handle a lot more requests.

@anikachurilova anikachurilova marked this pull request as ready for review May 14, 2024 14:03
@anikachurilova anikachurilova force-pushed the fix-restricted-dois-problem branch 2 times, most recently from 768b222 to 2175536 Compare May 15, 2024 08:32
@jrcastro2 jrcastro2 force-pushed the fix-restricted-dois-problem branch 4 times, most recently from acc0383 to b3aeaf0 Compare May 22, 2024 07:04
@jrcastro2 jrcastro2 force-pushed the fix-restricted-dois-problem branch 4 times, most recently from 92e3ccf to eb8bcef Compare May 22, 2024 08:59
@@ -21,7 +21,7 @@
class PIDManager:
"""RDM PIDs Manager."""

def __init__(self, providers, required_schemes=None):
def __init__(self, providers, required_schemes=[]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason why we changed this to mutable argument? it is considered not to be a best practice

@jrcastro2 jrcastro2 force-pushed the fix-restricted-dois-problem branch from eb8bcef to 2e0a53a Compare May 22, 2024 10:06
utnapischtim and others added 2 commits May 22, 2024 13:07
* fix bug for not existing docker-compose, use docker version instead
* reorder steps to first setup and run backend and then setup and run
  frontend

Co-authored-by: jrcastro2 <jrcastro9515@gmail.com>
Co-authored-by: anikachurilova <nyuta.churilova@gmail.com>
@jrcastro2 jrcastro2 force-pushed the fix-restricted-dois-problem branch from 2e0a53a to f213917 Compare May 22, 2024 11:08
@jrcastro2 jrcastro2 merged commit 80fe33f into inveniosoftware:master May 22, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handling DOIs for restricted records
7 participants