How do biologging repositories ingest DarwinCore aligned data? #3

Open
albenson-usgs opened this issue Aug 25, 2018 · 12 comments

@albenson-usgs
Contributor

How will Movebank, OTN, iOBIS, LivingAtlases, etc. ingest data that follows the guidelines we are recommending?

@jdpye
Member

jdpye commented Aug 26, 2018

The wishes we expressed at the pre-meeting:

Movebank accepting a DwC archive in this format for ingestion as-is.

OTN intends to make analysis toolboxes like glatos capable of ingesting these archives as data input, and to produce them for acceptance by iOBIS via the (soon-to-come) OTN IPT.

@sarahcd

sarahcd commented Sep 10, 2018

This is a good question. From Movebank's end, I'll be working on this more this fall. Some initial impressions:

  • I see the initial demand as getting data FROM Movebank's format TO DwC, rather than the other way around. Our first aim is to make Movebank's publicly archived datasets (datarepository.movebank.org) discoverable in GBIF. Currently our users have and want to work with tabular CSV files that combine measurements per timestamp in one row, and few are familiar with DwC.

  • Automated data import to Movebank typically relies on tabular text data files that always include the same attributes/format/units in the same order (https://www.movebank.org/node/10). As I understand it, the OBIS-ENV format is considerably more flexible, and I'm not sure how we would read values and map/convert them to Movebank attributes in any automated way. More realistically, an R script could be used to convert an OBIS-ENV DwCA to a tabular file that the user could import into Movebank as a custom CSV.

  • I'll know more about this once I see more examples of the OBIS-ENV format, but I have a concern that the data volume in this format is going to be extremely high for typical Movebank datasets, which might pose a difficulty for web-based upload and import.
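The conversion mentioned in the second bullet (long-format OBIS-ENV measurements to one row per timestamp) might look roughly like the sketch below. It is written in Python rather than R, uses only the standard library, and assumes the archive has already been read into lists of dicts. The Darwin Core terms (`occurrenceID`, `measurementType`, `measurementValue`, `measurementUnit`) are real; the function names and the "type (unit)" column naming are assumptions made for illustration, not anything Movebank or OBIS prescribes.

```python
import csv
import io


def emof_to_wide(occurrences, measurements):
    """Pivot long-format eMoF rows into one wide row per occurrence."""
    wide = {}
    columns = []
    for occ in occurrences:
        wide[occ["occurrenceID"]] = dict(occ)
        for key in occ:
            if key not in columns:
                columns.append(key)
    for m in measurements:
        row = wide.get(m["occurrenceID"])
        if row is None:
            continue  # measurement with no matching occurrence: skipped in this sketch
        unit = m.get("measurementUnit", "")
        # Hypothetical column naming: measurement type plus unit in parentheses.
        column = f'{m["measurementType"]} ({unit})' if unit else m["measurementType"]
        row[column] = m["measurementValue"]
        if column not in columns:
            columns.append(column)
    return columns, list(wide.values())


def to_csv(columns, rows):
    """Serialise the wide rows as CSV text; missing measurements become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because the columns are discovered from the data, two datasets with different measurement types produce different headers, which is exactly the flexibility that makes fully automated import hard.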

I'd be happy to have a discussion about this!

@jdpye
Member

jdpye commented Sep 10, 2018

I think that the occurrence-core subset of OBIS-ENV should be able to vocab-map directly to a nice tabular dataset that Movebank will be happy with. This was the brainchild of our April workshop on the subject, where Peter expressed the desire that the occurrence data be able to stand on its own as a minimal description of animal presence.

Data volume for satellite data will be high, but if we sidebar the non-occurrence datasets the volume should be manageable. It's when we drag in the oceanography, the accelerometry, all the in situ measurements, that we start to spiral out of control volume-wise. And happily, under this format, those go off into EMoF and stay clear of the Occurrence data.
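A minimal sketch of the occurrence-core vocab-mapping idea, in Python. The Darwin Core terms on the left and the Movebank attribute names on the right exist in their respective vocabularies, but pairing them this way is an assumption for illustration, not an agreed crosswalk. `DWC_TO_MOVEBANK` and `map_occurrence` are hypothetical names, and value formats (for example timestamps) are passed through unconverted.

```python
# Assumed pairing of Darwin Core terms to Movebank attributes (illustrative only).
DWC_TO_MOVEBANK = {
    "organismID": "individual-local-identifier",
    "scientificName": "individual-taxon-canonical-name",
    "eventDate": "timestamp",
    "decimalLatitude": "location-lat",
    "decimalLongitude": "location-long",
}


def map_occurrence(record, mapping=DWC_TO_MOVEBANK):
    """Rename mapped Darwin Core terms; return (mapped_row, unmapped_terms)."""
    mapped = {mapping[term]: value for term, value in record.items() if term in mapping}
    # Report what was left behind so nothing is dropped silently.
    unmapped = sorted(term for term in record if term not in mapping)
    return mapped, unmapped
```

Returning the unmapped terms alongside the row makes it easy to see which occurrence fields have no tabular counterpart yet.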

Let me know if there's a chat imminent, I'd love to participate!

@albenson-usgs
Contributor Author

@sarahcd This question came up because @peterdesmet would like his seabird tracking data to go into Movebank and is wondering whether Movebank will be able to (relatively) seamlessly pull in data that's in Darwin Core. Based on your second bullet above, it sounds like this may not be possible, or at least would take some work to figure out. Ideally, if we can get Movebank, GBIF, and OBIS all speaking the same language (i.e. Darwin Core), then all pieces of this type of data become more easily interoperable (I hope!). I wonder if a good next step would be for me to work with you, Sarah, on getting a Movebank dataset into OBIS-ENV-DATA, just so you have a clearer idea of what that looks like.

@sarahcd

sarahcd commented Sep 17, 2018

I say we plan a meeting where we can screenshare and look at this, after I have some time to look through the feedback on my "draft" OBIS-ENV format dataset that I've already received from the very helpful @albenson-usgs :). I'll send an email to schedule. If anyone else sees this and would like to join, let me know.

@Antonarctica

@sarahcd Happy to join the meeting.

@sarahcd

sarahcd commented Sep 19, 2018

@Antonarctica can you tell me who you are? ;) I can email you with the specs.

@albenson-usgs
Contributor Author

After the discussion today, we decided that, for the time being, there is no use case in which an individual data provider would want to align their biologging data to Darwin Core and have it harvested by OBIS/GBIF/Movebank/OTN/Zoatrack. Instead, individual data providers will work with a biologging data aggregator (Movebank/OTN/Zoatrack), and that aggregator will be the one to align the data to Darwin Core and share it with OBIS/GBIF.

@sarahcd

sarahcd commented Oct 24, 2018

Maybe a more general, related issue to add is how to get multiple DwC archives into a data frame for analysis. This is the end goal for many users and gets at the db-ingestion question but with a more broadly relevant use case.
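As a rough illustration of that use case, here is a minimal Python sketch that stacks the occurrence tables of several archives into one list of rows. It assumes each archive is a zip file containing a tab-delimited `occurrence.txt` with a header row. A real reader would consult the archive's `meta.xml` for file names, delimiters, and column-to-term mappings instead. The function names and the `sourceArchive` column are hypothetical.

```python
import csv
import io
import zipfile


def read_occurrences(archive, core_file="occurrence.txt"):
    """Read one archive's tab-delimited occurrence file into a list of dicts."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(core_file) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))


def combine_archives(archives):
    """Stack rows from several archives, tagging each row with its source.

    `archives` maps a label to a path or file-like object.
    """
    combined = []
    for name, archive in archives.items():
        for row in read_occurrences(archive):
            row["sourceArchive"] = name  # hypothetical provenance column
            combined.append(row)
    return combined
```

The resulting list of dicts can be handed to a data-frame library directly; keeping the source label on every row preserves provenance after the merge.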

@peggynewman

Sounds like an R package to me once we have this nailed!

@albenson-usgs
Contributor Author

@sarahcd If OBIS does indeed integrate biologging data into the system then you should be able to do this using the OBIS API or the robis package. Do I have that right @pieterprovoost?

@pieterprovoost

pieterprovoost commented Nov 20, 2018

@albenson-usgs @sarahcd @peggynewman The robis package will indeed provide access to integrated datasets, but if you want to combine multiple archives without going through the OBIS system you can use https://github.com/ropensci/finch (for reading archives) and https://github.com/iobis/obistools (for merging event trees and occurrences).
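To show what "merging event trees and occurrences" involves, here is a small Python sketch of the general idea. It is not the obistools API (obistools is an R package), and the function names are hypothetical. It assumes events are dicts linked by the Darwin Core terms `eventID` and `parentEventID`, that child events inherit any field they leave empty from their ancestors, and that occurrences point at an event via `eventID`.

```python
def flatten_events(events):
    """Give each event the fields of its ancestors (non-empty child values win)."""
    by_id = {e["eventID"]: e for e in events}
    cache = {}

    def resolve(event_id, seen=()):
        if event_id in cache:
            return cache[event_id]
        if event_id in seen:
            raise ValueError(f"cycle in event tree at {event_id}")
        event = by_id[event_id]
        parent_id = event.get("parentEventID")
        # Start from the parent's resolved fields, if the parent is known.
        merged = dict(resolve(parent_id, seen + (event_id,))) if parent_id in by_id else {}
        merged.update({k: v for k, v in event.items() if v not in ("", None)})
        cache[event_id] = merged
        return merged

    return {event_id: resolve(event_id) for event_id in by_id}


def attach_events(occurrences, events):
    """Join each occurrence to its flattened event; occurrence values win."""
    flat = flatten_events(events)
    merged = []
    for occ in occurrences:
        row = dict(flat.get(occ.get("eventID"), {}))
        row.update({k: v for k, v in occ.items() if v not in ("", None)})
        merged.append(row)
    return merged
```

After this step every occurrence row carries the deployment-level fields it needs, so the result can be treated as a single flat table.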
