
RAATD dataset for use case #19

Open
data-biodiversity-aq opened this issue Dec 12, 2019 · 29 comments

@data-biodiversity-aq

Hello,

@Antonarctica has a dataset from the Retrospective Analysis of Antarctic Tracking Data (RAATD) project which could be interesting for the use cases. The project uses a mix of sensors:

Global Location Sensors (GLS loggers or geolocators), satellite-relayed Platform Terminal Transmitters (PTTs), and Global Positioning System devices (GPS)

Thanks a lot!

@jdpye

jdpye commented Dec 13, 2019

the RAATD data is semi-famous in my circles now, I know how much work has gone into straightening that data out and I would love to implement that example in this format!

@jdpye

jdpye commented Dec 13, 2019

it's possible that there's some overlap with this data example I've solicited from @ianjonsen as well: https://github.com/ianjonsen/tdwg_imos/wiki/Argos-satellite-tracking-of-southern-elephant-seals but I know the full RAATD dataset will have more coincident deployments on it and I think that's a very useful example.

@ianjonsen

It might be best to go with the RAATD data, depending on how much @Antonarctica has (probably all of it, Anton?). This would be a more comprehensive test case, with multiple species, deployment locations, tag types, etc. The data aren't in the public domain yet as the Scientific Data paper hasn't come out, but I expect this will happen quite soon (possibly in the next 1-2 months).

@Antonarctica

Hi
I just got the latest version back from the AADC, so it is all on my laptop. We should be able to standardise it to the Darwin Core Event format, and this would indeed be the right time to do it.

@peggynewman

What's the best way to go about working on that? Like Movebank has done, with a few examples in a spreadsheet on GitHub and the larger dataset sitting elsewhere?

@Antonarctica

The original compiled datasets will reside at the Australian Antarctic Data Centre (this was decided a while ago; if making the decision now we might have gone with Zenodo).

The plan is to have a full copy published through the biodiversity.aq IPT in Darwin Core Event core (feeding into OBIS/GBIF). It would be good if that could also be linked to Movebank, but I'm not that knowledgeable about Movebank and how the flow would best go (also, some of the data might already be in there).

This is a good example to follow, I guess:
https://zenodo.org/record/3541812#.XgZmXi2ZM0o although our dataflow would be different.

We have standardised data and filtered data, and we have metadata on the deployments (see below).

For standardised data:

RAATD_ADPE_standardized.csv

For filtered data, e.g.:

ADPE_ssm_by_id.rds
ADPE_ssm_by_id.pdf
ADPE_ssm_by_id_predicted.csv

Metadata variables:
dataset_identifier, file_name, individual_id, keepornot, scientific_name, common_name, abbreviated_name, sex, age_class, device_id, device_type, deployment_site, deployment_year, deployment_month, deployment_day, deployment_time, deployment_decimal_longitude, deployment_decimal_latitude, data_contact, contact_email, comments

@msweetlove

I made a first attempt to format the RAATD data in the Darwin Core format, with an event core and occurrence extension. Before we push the whole dataset into this format, it might be useful for you guys to have a look at it and give some feedback on the approach taken. The most important formatting decisions are written down in the README file.

The data, R-script to format it and README can be found here:
https://github.com/msweetlove/dwc-for-biologging/tree/master/use-cases/RAATD-penguin-tracking-use-case-for-discussion

If possible, can someone with admin rights merge my fork of this repo with tdwg/dwc-for-biologging?

@ianjonsen

This looks pretty good; the only issue I've noticed so far is that the variable "fieldnotes" contains the Argos location quality indices. These indices are essential for Argos location quality control and other movement modelling processes and should have a more informative variable name. Will the schema allow "location quality" to be used instead of "fieldnotes"? I would worry that anything named "fieldnotes" would be one of the first variables stripped by automated data processing workflows. Additionally, the values should simply be in the set {3, 2, 1, 0, "A", "B", "Z"} or {3, 2, 1, 0, -1, -2, -9}, rather than "location_quality= Z", etc.
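That value clean-up could be sketched like this (a minimal Python sketch; the helper name `normalize_lq` and the input string format are assumptions for illustration, not part of any existing schema):

```python
# Minimal sketch: strip a "location_quality= Z"-style string down to the bare
# Argos location class. The valid alphanumeric classes are 3, 2, 1, 0, A, B, Z.
VALID_CLASSES = {"3", "2", "1", "0", "A", "B", "Z"}

def normalize_lq(raw):
    """Return the bare Argos location class from e.g. 'location_quality= Z'."""
    value = str(raw).split("=")[-1].strip().upper()
    if value not in VALID_CLASSES:
        raise ValueError(f"not an Argos location class: {raw!r}")
    return value
```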

@Antonarctica

"location quality" is not part of the standard Darwin Core terms (an overview here: https://dwc.tdwg.org/terms/).

A couple of options come to mind, with option 2 maybe being a good compromise:

  1. There is locationRemarks, but that is about as useful as "field notes".

  2. Another one is dynamicProperties, which is a gathering bin, but one that can be structured. It is formatted like this: {"heightInMeters":1.5}, {"tragusLengthInMeters":0.014, "weightInGrams":120}, {"natureOfID":"expert identification", "identificationEvidence":"cytochrome B sequence"}.

  3. Then there is MeasurementOrFact (which should normally go in a separate extension).
    In MeasurementOrFact you get:
    measurementID
    measurementType
    measurementValue
    measurementAccuracy
    measurementUnit
    measurementDeterminedBy
    measurementDeterminedDate
    measurementMethod
    measurementRemarks

Not sure how others solved this.
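For option 2, packing the Argos quality index into dynamicProperties could look like this (a sketch; the key name `argosLocationQuality` is an assumption, since dynamicProperties only requires a JSON object serialised to a string, not any particular keys):

```python
import json

def argos_dynamic_properties(location_quality):
    # dynamicProperties holds a JSON object serialised to a string,
    # in the same shape as the {"heightInMeters":1.5} examples above.
    return json.dumps({"argosLocationQuality": str(location_quality)},
                      separators=(",", ":"))
```

For example, `argos_dynamic_properties("B")` returns `{"argosLocationQuality":"B"}`.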

@jdpye

jdpye commented Jan 16, 2020

Thank you for pulling this together, Maxime! I'll see if I have the right permissions to merge your fork into a demo/example subfolder here.

I know that we're encouraged to keep dynamicProperties sparse if we can at all help it, but I agree with option 2, and can see the value in designating a transient variable that'd only be available in certain subclasses of biologging location data. Option 1 is inviting ourselves to repurpose location remarks as dynamicPropertiesAboutLocations, probably nobody will like us for doing that!

Short of translating Argos location qualities into coordinateUncertaintyInMeters, I don't know what else we'd do other than include something in dynamicProperties.

To completely convince myself, I'm going to poke around a few other example DwC occurrence archives in GBIF/OBIS that are using Argos for location data. So far the ones I've found have not included the quality info inline and have simply alluded to the fact that they 'filtered erroneous location data' in the archive-level metadata, so that's a bar that I think we can clear with your proposed solution.

@ianjonsen

wrt Argos location data: option 3 has some merits, as there are now different "flavours" of Argos location data that could be captured in the "measurementDeterminedBy" variable: 1) locations based on CLS Argos' old Least-Squares algorithm; 2) locations based on their Kalman Filter algorithm; 3) locations based on their Kalman Filter & Smoother algorithms (users have to pay additional fees for this and it's only available in post-processing, so I'd guess it's relatively rare). "measurementMethod" could be used to identify the type of location data (Argos, GPS, GLS, ...), no?

In the case of "old" Least-Squares data, all you get is a "location quality" class for each observation. It is an index of accuracy, so it could be captured by "measurementAccuracy". The Kalman Filtered and Kalman Filtered & Smoothed flavours have "location quality" and error ellipse variables (Ellipse Semi-Major Axis, Ellipse Semi-Minor Axis, Ellipse Orientation). These are all important for modelling (location quality control and other applications).

I'd guess I'm preaching to the choir here, but... you would never want to archive/serve Argos data that had "erroneous location data" filtered or otherwise removed. I'd think you'd want to either provide filtered (or otherwise quality-controlled) location data as a separate, derived ("modelled" in the broadest sense) version of the data, or via a flag that indicates whether a record passed or failed the quality control process(es). I'd guess the metadata would have to capture the essentials of the quality control process applied. In the case of statistical quality control processes, e.g. state-space models, this is where coordinateUncertaintyInMeters can be used to capture the estimated location uncertainty.
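Deriving coordinateUncertaintyInMeters from the two flavours described above could be sketched like this (assumptions: the per-class radii are the nominal accuracy bounds from the Argos manual, classes 0/A/B/Z carry no upper bound, and the helper name is hypothetical):

```python
# Sketch only: nominal per-class accuracy radii (metres) from the Argos manual
# for Least-Squares locations; classes 0, A, B and Z carry no upper bound.
NOMINAL_RADIUS_M = {"3": 250, "2": 500, "1": 1500}

def coordinate_uncertainty_m(location_class, semi_major_m=None):
    # Kalman Filter(-Smoother) data: the error-ellipse semi-major axis is the
    # radius of the smallest circle containing the ellipse around the point.
    if semi_major_m is not None:
        return float(semi_major_m)
    # Least-Squares data: fall back to the nominal class radius, if any.
    return NOMINAL_RADIUS_M.get(str(location_class).upper())
```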

@Antonarctica

@jdpye if memory serves me right, the OBIS logic would be to throw it all in extendedMeasurementOrFact (lat, long, location quality) and have a simplified track or range polygon at the event level.

@ianjonsen The standardised vs filtered discussion is one that keeps coming back. Given that OBIS and GBIF mainly deal with primary observations, my feeling is that filtered data would be quite heavily processed and not really be the primary observation anymore (also, you ideally try to keep all of that close together).

I'm happy with option 2 as an intermediate for now, @msweetlove, so we can do a first push. Based on how the discussions go, we can always redo the export to GBIF/OBIS.

In any case, with any approach, for me the data on OBIS and GBIF would be a lead into discovering more detailed information, which can be at Movebank, the AADC or another online repository. For instance, for the Herring Gull data @peterdesmet used Zenodo.

@ianjonsen

@Antonarctica yes, that makes sense - I knew I was wandering off into things beyond the primary observations.

@peggynewman

What about using some of the location class terms for the Argos location qualities? For example,

georeferenceProtocol and
georeferenceVerificationStatus?

The latter recommends use of a controlled vocabulary, which the Argos location quality essentially is.

@jdpye

jdpye commented Jan 21, 2020

The Argos LQs look to fit very neatly in those columns. We could set a good example with those.

@Antonarctica

@peggynewman @jdpye @msweetlove
Seems they would be a good fit.
So georeferenceProtocol would be 'ArgosLocations'?
And the controlled vocab:
http://www.argos-system.org/manual/3-location/34_location_classes.htm

@ianjonsen

Sounds like a great solution

@peggynewman

Yes, something like that, although a sanity check @peterdesmet would be appreciated.
E.g. georeferenceProtocol is "Argos Location Class" plus a link to the 'vocab', and the values (0, 1, 2, 3, A, B, Z) in georeferenceVerificationStatus.

Movebank have added Argos terms to their vocabulary in NERC and it only refers to the Argos 2011 manual but doesn't link to it. They have "Argos LC" which must be the label they use. In the absence of a proper vocabulary, a link out to the manual seems like the right thing to do.
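The mapping discussed here could be sketched as follows (georeferenceProtocol and georeferenceVerificationStatus are real Darwin Core terms; the protocol label, the manual link, and the helper function are just this thread's suggestion, not an established convention):

```python
ARGOS_CLASSES = {"0", "1", "2", "3", "A", "B", "Z"}
ARGOS_MANUAL = "http://www.argos-system.org/manual/3-location/34_location_classes.htm"

def georeference_fields(location_class):
    """Map an Argos location class onto the two Darwin Core georeference terms."""
    lc = str(location_class).upper()
    if lc not in ARGOS_CLASSES:
        raise ValueError(f"unknown Argos location class: {location_class!r}")
    return {
        "georeferenceProtocol": f"Argos Location Class ({ARGOS_MANUAL})",
        "georeferenceVerificationStatus": lc,
    }
```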

@albenson-usgs

Just throwing a comment here to see what still needs to happen :-)

@Antonarctica

I hope nothing... it is public now: https://ipt.biodiversity.aq/resource?r=scar_raatd_trackingdata after a long time calculating on @msweetlove's computer and finding some small errors. I guess we'll register it next week...

@jdpye

jdpye commented Sep 25, 2020

Yeah, the last open ticket we had about eventDates looks to be fixed up in that DwC-A, so I think this is good to go! Is this PR's branch up to date?

@wardappeltans

And the RAATD dataset is also published in OBIS https://obis.org/dataset/48cb8624-a221-47ed-9a6d-b99b0bb394e0

@jdpye

jdpye commented Sep 25, 2020

Looks like Mirounga leonina still needs a scientificNameID, but there are not too many more i's to dot and t's to cross once we have the latest scriptlet and data example in the msweetlove:master branch.

@msweetlove

@jdpye all scientificNameIDs were collected from WoRMS in an automated loop. If the field is blank for a species, it means it had no exact match with the WoRMS database or there were multiple matches that could not be resolved automatically. I'll clean up the R-script and put it online today.
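The matching rule described here might look roughly like this (a sketch of the disambiguation logic only; the record shape assumes the AphiaRecord fields returned by the WoRMS REST API, and the actual per-name fetch loop from the script is omitted):

```python
def resolve_scientific_name_id(name, records):
    """Pick a single scientificNameID from candidate WoRMS records, or None.

    `records` are dicts with at least 'scientificname', 'status' and 'lsid'
    (AphiaRecord fields); ambiguous cases are left for manual resolution.
    """
    exact = [r for r in records if r.get("scientificname") == name]
    if len(exact) > 1:
        # Prefer the currently accepted name when several records match.
        exact = [r for r in exact if r.get("status") == "accepted"]
    return exact[0]["lsid"] if len(exact) == 1 else None
```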

@msweetlove

The R-script for formatting the RAATD data is available here.

@msweetlove

@jdpye I updated the occurrence file to add the scientificNameID of Mirounga leonina. The reason it was left blank was due to multiple matches that could not be resolved automatically.

@jdpye

jdpye commented Oct 1, 2020

Thanks Max! I suspected it was something like that. I've had to parse the AcceptedStatus of the results sometimes to arrive at the one that's approved for my species. Some other times, there are still ambiguities and I have to do as you did. I'll review this now!

@jdpye

jdpye commented Oct 1, 2020

Is the updated file and workflow in the msweetlove fork's master branch?

@msweetlove

msweetlove commented Oct 2, 2020

The updated occurrences file can be found here: https://ipt.biodiversity.aq/resource?r=scar_raatd_trackingdata. To do this step I used just two trivial lines of code, so I didn't update the script for that.
It goes like this (with occurrences = the occurrence file):

condition <- occurrences$scientificName == "Mirounga leonina"  # logical row index
occurrences[condition, "scientificNameID"] <- "urn:lsid:marinespecies.org:taxname:231413"
