Do not use parentEventID if not necessary #8

Open
peterdesmet opened this issue Oct 24, 2018 · 8 comments

@peterdesmet
Member

In "Mahoney-data-DwC-A-test-2" I noticed that parentEventID is populated with the animalID:

F53
IdCoy_P3_1

I would not do this:

  1. Records with that ID cannot be found in the event core (I don't think we should create them either)
  2. Events are already clearly grouped in how their eventID is written, e.g. F53:capture1
  3. I would really avoid using a hierarchical structure of events if we can help it
@pieterprovoost

I think there needs to be a way to group events per animal that does not involve parsing character-delimited strings. Also, I think encoding hierarchy in strings makes it harder to check referential integrity. eventID F35:capture1 (did you notice the typo?) will not trigger any alarms, but the fact that parentEventID F35 refers to a nonexistent eventID might.

Of course any parentEventID needs to have a matching eventID.
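
The referential-integrity point can be illustrated with a minimal sketch. The rows below are hypothetical, not taken from the actual dataset, and the function name `dangling_parents` is made up for illustration; the only assumption is an event table with `eventID` and `parentEventID` columns.

```python
# Hypothetical event-core rows; only eventID and parentEventID matter here.
event_rows = [
    {"eventID": "F53", "parentEventID": ""},
    {"eventID": "F53:capture1", "parentEventID": "F53"},
    {"eventID": "F35:capture1", "parentEventID": "F35"},  # typo: F35 instead of F53
]


def dangling_parents(rows):
    """Return the parentEventID values that match no eventID in rows."""
    known_ids = {row["eventID"] for row in rows}
    return sorted(
        {
            row["parentEventID"]
            for row in rows
            if row["parentEventID"] and row["parentEventID"] not in known_ids
        }
    )


print(dangling_parents(event_rows))  # ['F35']
```

The mistyped eventID on its own passes silently; it is the unmatched parentEventID that gives a checker something to flag.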

@peggynewman

@peterdesmet I'm interested to understand why you'd like to avoid hierarchical events? They seem to offer a lot of flexibility. I get that they might be difficult to interpret/ingest from system to system.

@sarahcd

sarahcd commented Sep 11, 2019

I deleted parentEventID from my event table. So now, the only place where unique animals are identified is in organismName in the occurrence table. But I agree with @pieterprovoost's point: I think something is missing here and the event table (the closest to a 'summary' table) needs to define unique individuals somewhere. Otherwise it is more difficult to check the data or compile it into a data frame, and there is a good chance of confusion, e.g. that deployments get confused with individuals, or the user doesn't notice that multiple records are about the same individual. I have a similar concern with FOM records that don't include a unique animal identifier and have no associated occurrenceID (measurements not taken at the same time as a GPS fix). However, I don't see any other good place to define individuals in the event table. Happy for any ideas!

@peggynewman

Using string parsing to define relations gives me ER nightmares. But complicated hierarchies won't be universally handled well. I think we should use parentEventID to tie together all of the events and occurrence records, but make a recommendation to use it for a simple parent-child relationship with no further levels.
The only other way to tie @sarahcd's acceleration-x MoF back to the organism is to pick an occurrence against the same deployment event. Then the selection is tricky: random? max? min?
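
A recommendation like "one parent-child level, nothing deeper" could be checked mechanically. The sketch below is only an illustration under assumed data: the rows and the `F53:capture1:fix1` identifier are hypothetical, and `events_nested_too_deep` is a made-up helper name.

```python
# Hypothetical rows: the last one hangs off a child event, i.e. a second level.
nested_rows = [
    {"eventID": "F53", "parentEventID": ""},
    {"eventID": "F53:capture1", "parentEventID": "F53"},
    {"eventID": "F53:capture1:fix1", "parentEventID": "F53:capture1"},
]


def events_nested_too_deep(rows):
    """Return eventIDs whose parent event itself has a parent."""
    parent_of = {row["eventID"]: row["parentEventID"] or None for row in rows}
    return sorted(
        event_id
        for event_id, parent in parent_of.items()
        if parent and parent_of.get(parent)
    )


print(events_nested_too_deep(nested_rows))  # ['F53:capture1:fix1']
```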

@sarahcd

sarahcd commented Sep 12, 2019

IMO matching measurements to occurrences is not a good solution. (1) There are bio-logging datasets with no occurrences at all except the capture events (e.g. datasets of light level, conductivity and temperature). (2) I really doubt there is one good method for doing this; it will depend on sensor sampling schedules, species/habitat and analysis question. We would be processing the data unnecessarily, in ways that are not necessarily biologically meaningful and might confuse interpretation.

For now I'll add parentEventID back in.

@albenson-usgs
Contributor

I'm wondering if looking at this from different user points of view might help? Can we think of some different users and work back from there / make sure the data will be presented to them in a way that's most useful?

User 1: General GBIF/OBIS user. Just wants to know where individuals of a species occur in space. Most important that they understand all of these occurrences are the same individual.
User 2: Interested in assessing animal movements. Needs to be able to parse deployments from individuals. What else?

Are there other users we can think of? Who will be using the acceleration data?

@pieterprovoost

Would this be a suitable structure? I assume that if anyone needs the acceleration data they can do the matching themselves? I still think a single parent event per organism makes it easier to select all data for a single organism (given that the database at hand is set up properly).

[Image: biologging]
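
To make the "select all data for a single organism" argument concrete, here is a rough sketch assuming one parent event per organism and a single level of child events. All identifiers other than F53 and IdCoy_P3_1 (for example `F53:deployment1` and the `occ…` IDs) are invented for illustration, as is the helper name.

```python
# Hypothetical event and occurrence tables; one parent event per organism.
organism_events = [
    {"eventID": "F53", "parentEventID": ""},
    {"eventID": "F53:capture1", "parentEventID": "F53"},
    {"eventID": "F53:deployment1", "parentEventID": "F53"},
    {"eventID": "IdCoy_P3_1", "parentEventID": ""},
    {"eventID": "IdCoy_P3_1:capture1", "parentEventID": "IdCoy_P3_1"},
]
organism_occurrences = [
    {"occurrenceID": "occ1", "eventID": "F53:capture1"},
    {"occurrenceID": "occ2", "eventID": "F53:deployment1"},
    {"occurrenceID": "occ3", "eventID": "IdCoy_P3_1:capture1"},
]


def occurrences_for_organism(parent_id, events, occurrences):
    """Collect occurrenceIDs linked to the parent event or any direct child."""
    event_ids = {e["eventID"] for e in events if e["parentEventID"] == parent_id}
    event_ids.add(parent_id)
    return [o["occurrenceID"] for o in occurrences if o["eventID"] in event_ids]


print(occurrences_for_organism("F53", organism_events, organism_occurrences))
# ['occ1', 'occ2']
```

The lookup is an exact match on parentEventID, so no eventID string has to be split or parsed.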

@peterdesmet
Member Author

What @pieterprovoost suggests is probably the best way, but it feels like trying to fit a square peg in a round hole. "Organism" is a concept in Darwin Core; it's just not a "core" file now. Using the Event Core concept has the advantage that GBIF/OBIS can currently handle that data, but it might be good to do the exercise of working out how we would express biologging data, reusing Darwin Core terms, if we had more freedom in how to structure it.

/cc @timrobertson100
