Create DiSSCo Network Entity #282

Open
timrobertson100 opened this issue Jan 18, 2021 · 17 comments

@timrobertson100
Member

DiSSCo would like a network entity containing the datasets originating from the relevant institutions.
Wouter A has prepared a spreadsheet with the GBIF keys.

  • Wouter asks to view it in UAT beforehand. We should create it, but since UAT is not sized to crawl all data, I am not sure of the benefits. Creating a repeatable SQL script to use on UAT and prod seems sensible.
  • I propose that we also add the ROR and GRID IDs as additional identifiers to the relevant entries in a separate SQL script.
@ManonGros ManonGros self-assigned this Jan 18, 2021
@wouteraddink

How would you add the ROR and GRID IDs: as "tags" or as a DwC field (institutionID)? What about the (often different) institution name in the EML profile? And what would be the process regarding registering as part of the network and registering these IDs, for new datasets added by DiSSCo partners or new partners becoming a GBIF data provider?

@timrobertson100
Member Author

How would you add the ROR and GRID IDs, as "tags" or as DwC field (institutionID)

Tags would be an option, but I'd propose to just add an identifier to the entities where it makes sense. We support multiple identifiers on all instances in the registry. This has no effect on occurrence records, but simply allows the organisation to be found in the registry using the ID.

what about the (often different) institution name in the EML profile

It wouldn't be affected in GBIF. All it is doing is saying "this entry in the registry is also known as a different ID" and won't change the name under which the organisation was registered in GBIF. The name can be changed at any time, though, if desired.
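
To illustrate the "also known as" idea, here is a minimal in-memory sketch. It is not the GBIF registry implementation: `RegistryEntry`, `find_by_identifier`, the identifier type labels and all values are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Hypothetical stand-in for an organisation entry in a registry."""
    key: str
    name: str
    identifiers: list = field(default_factory=list)  # (type, value) pairs

    def add_identifier(self, id_type: str, value: str) -> None:
        # Adding an alias leaves the registered name untouched.
        pair = (id_type, value)
        if pair not in self.identifiers:
            self.identifiers.append(pair)


def find_by_identifier(entries, id_type, value):
    """Return the entries that carry the given identifier."""
    return [e for e in entries if (id_type, value) in e.identifiers]


# Placeholder values, not real ROR or GRID identifiers.
org = RegistryEntry(key="org-1", name="Example Museum")
org.add_identifier("ROR", "https://ror.org/example")
org.add_identifier("GRID", "grid.example.1")
matches = find_by_identifier([org], "ROR", "https://ror.org/example")
```

The entry can be looked up by either identifier, while `name` stays whatever it was registered as.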

and what would be the process regarding registering as part of the network and registering these IDs, for new datasets added by DiSSCo partners or new partners becoming a GBIF data provider?

Registering datasets and institutions in GBIF will work as it always has. Authorization to curate the membership for the network entries (i.e. adding or removing GBIF datasets to the DiSSCo entry) can be given to one or more accounts as needed. In time we'll probably want to automate membership somehow.

@ManonGros
Contributor

Concerning the Network:

For testing, I created a network in UAT: https://registry.gbif-uat.org/network/9400230d-de38-4e0e-b44d-fcdb661f0519
I wrote a script using the API for that so it can be reproduced in prod.

The constituents of the network are all the datasets that are published by the GBIF organisations listed in the spreadsheet that are DiSSCo members (disscoMember == "y").
NB: In UAT, this includes all kinds of test datasets (but not all the datasets available in prod).
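
The selection rule described above could be sketched roughly as follows. This is not the script mentioned in the comment. The column name `gbifKey`, the API base URL, and the `publishedDataset` paging fields are assumptions about the spreadsheet and the public GBIF API; only `disscoMember == "y"` comes from the thread (the sketch also tolerates stray whitespace and upper case).

```python
import json
import urllib.request

API = "https://api.gbif-uat.org/v1"  # assumed UAT API base URL


def member_org_keys(rows):
    """Keep the organisation keys of rows flagged as DiSSCo members."""
    return [
        r["gbifKey"]  # hypothetical column name
        for r in rows
        if r.get("disscoMember", "").strip().lower() == "y"
    ]


def fetch_published_datasets(org_key, limit=100):
    """Page through the datasets published by one organisation (network call)."""
    offset, keys = 0, []
    while True:
        url = f"{API}/organization/{org_key}/publishedDataset?limit={limit}&offset={offset}"
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        keys += [d["key"] for d in page["results"]]
        if page.get("endOfRecords", True):
            return keys
        offset += limit


def network_constituents(rows, list_datasets=fetch_published_datasets):
    """Collect the de-duplicated dataset keys for all member organisations."""
    seen, ordered = set(), []
    for org_key in member_org_keys(rows):
        for dataset_key in list_datasets(org_key):
            if dataset_key not in seen:
                seen.add(dataset_key)
                ordered.append(dataset_key)
    return ordered
```

Passing a different `list_datasets` callable lets the same selection logic run against UAT, prod, or a stub, which is one way to keep the procedure repeatable across environments.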

@rukayaj

rukayaj commented Jan 20, 2021

Not meaning to hijack this thread, but doesn't it make more sense to link ROR and GRID ids to GRSciColl institutions rather than to GBIF organizations?

@timrobertson100
Member Author

Not meaning to hijack this thread, but doesn't it make more sense to link ROR and GRID ids to GRSciColl institutions rather than to GBIF organizations?

Thanks @rukayaj. Yes, though both make sense, as GRSciColl will only ever contain a subset of the publishing organisations in GBIF.

@wouteraddink

Since GRSciColl institutions and GBIF organisations are completely separate at the moment, as far as I know, you would ideally do it in both.

@rukayaj

rukayaj commented Jan 20, 2021

Ok, I had forgotten that GRSciColl was for institutes with physical collections... So I think that you're saying some research institutions do not fit into GRSciColl (as they do not hold physical collections), but these institutions would have ROR and GRID ids? That makes sense then, and in that case I think it'd be better to just have GRIDs/RORs in one place.

@wouteraddink They're kind of being linked in the portal UI with the fuzzy matching e.g. https://www.gbif.org/occurrence/2579432371?

@ManonGros
Contributor

GRID and ROR discussion related to this other issue: #274

@dagendresen

dagendresen commented Jan 20, 2021

I'd love to see ROR/GRID/ISNI used per occurrence record with dwc:institutionID (to override institution IDs in the EML, because these can apparently be distinct even within the same Darwin Core Archive).

(the occurrence record is about the occurrence; while the GRSciColl record is about the institution -- the institutionID property on the occurrence record would link/bridge the two)
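
A small sketch of that bridging idea, assuming institution records can be looked up by the same identifier that appears in `institutionID`. All identifiers and names below are placeholders, and `resolve_institution` is a hypothetical helper, not part of any GBIF or Darwin Core tooling.

```python
# Hypothetical institution records keyed by a ROR-style identifier.
institutions = {
    "https://ror.org/example-a": {"name": "Example Museum A"},
    "https://ror.org/example-b": {"name": "Example Herbarium B"},
}

# Two records from the same archive can point at different institutions.
occurrences = [
    {"occurrenceID": "occ-1", "institutionID": "https://ror.org/example-a"},
    {"occurrenceID": "occ-2", "institutionID": "https://ror.org/example-b"},
    {"occurrenceID": "occ-3"},  # no institutionID supplied
]


def resolve_institution(record, institutions, dataset_default=None):
    """Prefer the record-level institutionID, else fall back to a dataset-level default."""
    inst_id = record.get("institutionID") or dataset_default
    return institutions.get(inst_id)


resolved = [
    resolve_institution(o, institutions, "https://ror.org/example-a")
    for o in occurrences
]
```

Here the record-level value wins where present, and the dataset-level default (standing in for the ID in the EML) only applies to records that carry none.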

@wouteraddink

I think in principle you could use a ROR/GRID/ISNI in dwc:institutionID without problems, but it is against the current recommendation in the DwC documentation. I think as a community we need to change this recommendation.

@wouteraddink

Thanks Marie, I see the network now in UAT. However, it would be nice to have it filtered by default for specimen datasets only. Also, https://www.gbif-uat.org/network/9400230d-de38-4e0e-b44d-fcdb661f0519 is still empty?

@timrobertson100
Member Author

Also, https://www.gbif-uat.org/network/9400230d-de38-4e0e-b44d-fcdb661f0519 is still empty?

All datasets need to be reprocessed to pick up the networkKey in the index.

@ManonGros
Contributor

ManonGros commented Jan 20, 2021

+ the summary page has to be edited in another system (we can do that in production).

Should I include datasets that have some preserved specimens, or only datasets that consist entirely of preserved specimens?

@wouteraddink

wouteraddink commented Jan 20, 2021

I would also include datasets that have some preserved specimens. I am not sure how that would influence the counts on the overview page: are these record-based or dataset-based?

@ManonGros
Contributor

The metrics are generated based on the records of the datasets belonging to the network. This means that if I tag a dataset containing observations, these observations will be included in the metrics.
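
The effect of the two inclusion rules on record-based metrics can be sketched like this. The dataset names and counts are invented for illustration, and `network_metrics` is a hypothetical helper rather than how the portal computes its figures.

```python
from collections import Counter

SPECIMEN = "PRESERVED_SPECIMEN"


def has_some_specimens(counts):
    """True if the dataset holds at least one preserved specimen record."""
    return counts.get(SPECIMEN, 0) > 0


def has_only_specimens(counts):
    """True if every record in the dataset is a preserved specimen."""
    return has_some_specimens(counts) and all(
        n == 0 for basis, n in counts.items() if basis != SPECIMEN
    )


def network_metrics(datasets, include):
    """Sum record counts per basis of record over the datasets selected by `include`."""
    total = Counter()
    for counts in datasets.values():
        if include(counts):
            total.update(counts)
    return total


# Invented record counts per basis of record.
datasets = {
    "herbarium": {"PRESERVED_SPECIMEN": 1000},
    "mixed": {"PRESERVED_SPECIMEN": 200, "HUMAN_OBSERVATION": 800},
    "sightings": {"HUMAN_OBSERVATION": 500},
}
some = network_metrics(datasets, has_some_specimens)
only = network_metrics(datasets, has_only_specimens)
```

With the "some specimens" rule, the 800 observation records of the mixed dataset are counted in the network totals; with the "only specimens" rule they are not, but its 200 specimen records are lost as well.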

@dagendresen

dagendresen commented May 7, 2021

@wouteraddink at GBIF Norway we have now moved all the university museum GBIF data publishers (not eligible for ROR and GRID) to the university level (with ROR and GRID) and merged them (moving the respective datasets) with any GBIF data publishers that had been created for the university departments of biology and geology.

We aim to follow the principle that Norwegian GBIF data publishers should be entities that are eligible for a ROR and GRID ID. (We have also briefly started to suggest that data publishers that are eligible but do not yet have a ROR register for one.)

I have updated your "CETAF+DiSSCo institutions" spreadsheet using "comments" (where rows 121-122 would be merged).

@wouteraddink

Thanks for the info @dagendresen. I have been talking with both GRID and ROR. GRID is tightening its policies and no longer allows separate identifiers for institutions that are part of universities. ROR is still synchronised 1:1 with GRID, but that may change later this year, and they will likely have a more relaxed policy. A ROR WG is also working on an extension for departments, but that is in the early stages of development, and it has not yet been decided whether these will be minted through ROR directly or through Wikidata or GitHub. For DiSSCo we can now work with ROR, as it now has a fully implemented metadata schema including parent organisation relations. If institutions cannot get a ROR, we can use CETAF passport identifiers and link them to their university ROR if needed. ORCID has not yet implemented ROR but is planning to.
