Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong unique logic for institution identifiers #335

Open
ctueck opened this issue Aug 14, 2020 · 3 comments
Open

Wrong unique logic for institution identifiers #335

ctueck opened this issue Aug 14, 2020 · 3 comments

Comments

@ctueck
Copy link
Member

ctueck commented Aug 14, 2020

There's a wrong logic for institution identifiers' uniqueness at the moment, see in

unique_together = ('institution', 'agency', 'resource')
and the resulting DB constraint:

"deqar_institution_identi_institution_id_agency_id_dfd7699f_uniq" UNIQUE CONSTRAINT, btree (institution_id, agency_id, resource)

This currently ensures that for one institution there is only one identifier from a specific agency or resource. But that's not very meaningful, and there wouldn't be a problem if one agency/resource has several identifiers for one institution.

What should be unique is the identifier, and that's currently not ensured. I.e. it's possible to add the same identifier for two different institutions. (I think it leads to a 500 error if you try to use such an identifier later on.)

So I guess the "unique together" should actually refer to the combination identifier - agency - resource instead.

@ctueck
Copy link
Member Author

ctueck commented Aug 14, 2020

PS, there's a little side problem:

I noticed when I was surprised that I could easily add several non-local identifiers with identical resourcefor one institution. Reason seems to be that PostgreSQL manual notes that "Null values are not considered equal" in unique indices.

Not sure if there's an easy fix, but this also is the smaller problem. Identifiers managed by agencies would still be ensured to be unique, just we need to be cautious when managing national/other identifiers.

@JoshBone
Copy link
Contributor

JoshBone commented Nov 2, 2020

Mmm...
Let me think this twice before we connect.
So the unique_together ensures that the combination of agency, institution AND resource (which is either 'local_identifier' or 'national_identifier') is unique...

Which means that agency A can submit to institution B one 'local_identifier', one 'national_identifier', etc.
But you are right, it's not checking the identifier uniqueness...

Probably a better solution would be to leave this check, but also ad unique_together with the combination of ('agency', 'resource', 'identifier') which together with the existing one, makes the desired effect.

What do you think?

@ctueck
Copy link
Member Author

ctueck commented Nov 8, 2020

I guess the additional unique_together is clear, since identifiers absolutely need to be unique (for one agency/resource).

The current one doesn't really create a problem, since it doesn't normally happen that one agency or one identifier system has two (or more) identifiers for one and the same institution. But if that ever happened, it would not really be a problem to allow it, would it? Because we always resolve identifier -> institution, and never need a unique mapping vice-versa. (Or do we?)

But I realise there could be an issue remaining: I think the unique constraint would never "complain" about a global identifier (because agency is NULL then), would it? And it's not feasible to add another one, as resource & identifier together are not necessarily unique (because they might be from different agencies...).

Yet, it would still be an improvement if at least for agency local identifiers the constraint ensures uniqueness. Since the other ones (global identifiers) are managed by us there is some additional control in any case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants