Flagging occurrence records where community feedback has identified wrong or doubtful identifications #4187

CecSve opened this issue Jul 27, 2022 · 2 comments
CecSve commented Jul 27, 2022

There appears to be a genuine need for data users to provide feedback on occurrence records with wrong or doubtful identifications on GBIF.org, mainly for records coming from citizen science data sources where few or no data quality checks happen at the source. The need seems to come especially from the expert communities working on hard-to-identify taxa, such as insects.

Ideas for potential implementation

  • integrate a feedback option in the portal where users can report wrongly identified occurrence records directly to GBIF (this would require a documented pipeline for GBIF on how to handle such reports, e.g. categorising issues into standardised flags or issues).
  • images from occurrence records where users have reported the identification to be wrong or doubtful should have a disclaimer ribbon attached, stating something along the lines of 'identification issues' or 'wrong identification', and potentially a more elaborate text description associated with the record based on the user's comments. Such a disclaimer could be linked to a specific flag or issue.
  • any flags or tags that appear on the portal should be included in the (DwC-A) download - perhaps under the flags and issues column
  • auto-assign such ribbons/tags etc. based on whether multiple identifications were made at the data source - not sure how programmable it is, but it could be one tag ('multiple identifications exist at data source') or something; see the sketch after this list for one possible starting point.
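
To illustrate the last item, here is a minimal, hypothetical sketch (not an existing GBIF pipeline step) of how a 'multiple identifications at source' flag could be derived from a Darwin Core Archive's Identification extension. The file name, column order and flag name are assumptions for illustration; a real archive declares its files and columns in meta.xml.

```python
# Minimal sketch: flag occurrences that carry more than one row in a
# Darwin Core Archive's Identification (history) extension.
# Assumes the extension is a tab-delimited file whose first column is the
# core record id ("coreid"); treat file and flag names as placeholders.
import csv
from collections import Counter

def flag_multiple_identifications(identification_file: str) -> dict[str, str]:
    """Return {occurrence coreid: flag} for records with >1 identification row."""
    counts: Counter[str] = Counter()
    with open(identification_file, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader, None)                      # skip header row
        for row in reader:
            if row:                             # first column is the coreid
                counts[row[0]] += 1
    return {
        coreid: "MULTIPLE_IDENTIFICATIONS_AT_SOURCE"  # hypothetical flag name
        for coreid, n in counts.items()
        if n > 1
    }

# Example: flags = flag_multiple_identifications("identification.txt")
```

If something like this were run during interpretation, the resulting tag could also be written into the existing flags/issues column of the download, which would cover the third bullet as well.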

Potential issues for implementation

  • if GBIF includes curation by tagging and flagging occurrences, how should those modifications be dealt with when the datasets are re-indexed? It should be possible to remove flags and tags if the occurrence is updated and the issue no longer persists, but they should also not be removed automatically upon re-indexing (see the sketch after this list for one way this could be handled).
  • auto-assigning flags and issues from citizen science portals may be quite challenging, as some portals have sections that highlight the various identifications, e.g. iNaturalist, while others only have the information in the comments section, e.g. Naturgucker.
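
One hedged sketch of the re-indexing question in the first bullet: if each community flag records which interpreted value it was raised against, a re-index can move the flag to a 'needs review' state when that value changes instead of silently deleting it. All names and states below are invented for illustration and are not GBIF's actual indexing logic.

```python
# Minimal sketch: keep community flags across a dataset re-index by recording
# which value each flag was raised against, and only mark a flag for review
# (never silently drop it) when that value changes or the record disappears.
from dataclasses import dataclass

@dataclass
class CommunityFlag:
    occurrence_key: str
    term: str             # e.g. "scientificName"
    flagged_value: str    # the value the community objected to
    status: str = "OPEN"  # OPEN | NEEDS_REVIEW | RESOLVED (hypothetical states)

def reconcile_flags(flags: list[CommunityFlag], reindexed: dict[str, dict]) -> None:
    """Update flag status after a dataset re-index; never delete automatically."""
    for flag in flags:
        record = reindexed.get(flag.occurrence_key)
        if record is None:
            flag.status = "NEEDS_REVIEW"        # record no longer at the source
        elif record.get(flag.term) != flag.flagged_value:
            flag.status = "NEEDS_REVIEW"        # value changed; a human decides
        # otherwise the flag stays OPEN and keeps showing on the portal
```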

@MortenHofft do you have any thoughts on these ideas, and do you know other informatics people who would like to provide feedback? I am aware that @jhnwllr and @ahahn-gbif are working on something similar. Please provide any feedback or add to this if there is something I have missed.

CecSve commented Jul 27, 2022

May relate to this issue: gbif/registry#247

MortenHofft commented Aug 15, 2022

This is a quick brain dump. It is a recurring theme. Normally we use the term annotations. Multiple ideas have been floated over time.

  • AnnoSys has been used by some publishers (we saw close to no annotations - perhaps 3 over 1 year, and I made one of them).
  • We tried briefly labelling occurrences that had a GitHub issue attached to them (but we were blocked by GitHub as we made too many requests).
  • We have discussed rule-based annotations, e.g. where you can create a filter, then draw a polygon and say: all of these have a wrong classification, the species doesn't live there. That rule then applies going forward (a rough sketch of such a rule follows this list).
  • We have discussed a dedicated website, e.g. fix.gbif.org, that allowed community annotations which could then be used to enrich gbif.org and allow for filtering. Essentially: according to the community, this value should be X. Just like we do with GADM and our existing machine interpretations.
  • We have discussed allowing publishers to opt in to being interested in annotations.
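
To make the rule-based idea concrete, here is a minimal sketch with invented names (not an existing GBIF API): a stored rule pairs a filter (here just a taxonKey) with a polygon, and any matching occurrence whose coordinates fall inside the polygon receives the community annotation. "Applying going forward" would then just mean re-running the rule whenever records are indexed.

```python
# Minimal sketch of a rule-based annotation: filter + polygon + comment.
# All class and field names are invented for illustration.
from dataclasses import dataclass

Point = tuple[float, float]          # (longitude, latitude)

def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    x, y = p
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

@dataclass
class AnnotationRule:
    taxon_key: int                   # filter: which taxon the rule applies to
    polygon: list[Point]             # area where the taxon does not occur
    comment: str                     # e.g. "species does not occur here"

def matches(rule: AnnotationRule, occurrence: dict) -> bool:
    """True if the occurrence should receive the rule's annotation."""
    if occurrence.get("taxonKey") != rule.taxon_key:
        return False
    lon = occurrence.get("decimalLongitude")
    lat = occurrence.get("decimalLatitude")
    return lon is not None and lat is not None and \
        point_in_polygon((lon, lat), rule.polygon)
```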

Some of the challenges are:

  • how to reconcile views: if the publisher looks through the feedback and disagrees, then what happens? How do we know that it has been addressed? And who gets to decide?
  • How to make it motivating. Feedback shouldn't just be lost or left to die. If no one listens and updates the record, then what?
  • Time/priority
  • Do we want to be the ones building that system
  • Are enough people actually interested in using it - is there enough of an audience?
  • Others in our community have implemented this in the past. We should probably try to understand their attempts. Or can/should we use them somehow?

I guess no one has provided a clear, thought-through model of how this should work. What is the flow of an annotation?
