What or who is the agent? #29

Open
dshorthouse opened this issue Sep 15, 2020 · 15 comments

Comments

@dshorthouse

dshorthouse commented Sep 15, 2020

I see that scientificName is a required term. But I do not see either recordedBy or identifiedBy. I assume that many determination events will be executed by a human sometime after a captured event but that other determination events will happen in near real-time by a trained machine. Should you require scientificName without also requiring an agent who/that made the assertion? Is this meant to be captured in samplingProtocol and will content there be sufficiently machine-readable so as to differentiate determinations made by a human from those made by a machine?

@jdpye
Member

jdpye commented Sep 15, 2020

A great point. For the great majority of my own experience, the determination was made by the individual doing the tagging but we are certainly not being as explicit as we could be! samplingProtocol also feels wrong for this, I had envisioned using that field to describe things like the tag attachment method or any specifics of which animals were being selected for study (if your study was only on juvenile males, for example). I am warm to the idea of using identifiedBy, it seems like the exact thing for this purpose. The guidance seems to be to use names, but I see no reason why algorithms, or even a 'machine determined' string couldn't work.

@albenson-usgs
Contributor

albenson-usgs commented Sep 15, 2020

The primary way we are differentiating human observations from machine observations is by using basisOfRecord. See the explanation for basisOfRecord in the wiki: "A single record in the Occurrence Core will designate basisOfRecord:HumanObservation to delineate the point in time that the animal was in hand. Future detections of the animal by stations will be designated basisOfRecord:MachineObservation. This is one of the primary ways (combined with organismID) a user would know that the multiple observations are actually multiple detections of the same animal."
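To illustrate the convention quoted above, here is a minimal Python sketch that groups occurrence records by organismID and counts how many are HumanObservation versus MachineObservation. The records, their occurrenceID and organismID values, and the `summarize_by_organism` helper are all invented for illustration; only the term names and the two basisOfRecord values come from the discussion.

```python
from collections import Counter, defaultdict

# Hypothetical occurrence records; every value here is invented for illustration.
records = [
    {"occurrenceID": "tag-001-capture", "organismID": "tag-001",
     "basisOfRecord": "HumanObservation", "eventDate": "2020-06-01"},
    {"occurrenceID": "tag-001-det-1", "organismID": "tag-001",
     "basisOfRecord": "MachineObservation", "eventDate": "2020-06-03"},
    {"occurrenceID": "tag-001-det-2", "organismID": "tag-001",
     "basisOfRecord": "MachineObservation", "eventDate": "2020-06-09"},
    {"occurrenceID": "tag-002-capture", "organismID": "tag-002",
     "basisOfRecord": "HumanObservation", "eventDate": "2020-06-02"},
]


def summarize_by_organism(rows):
    """Count records per basisOfRecord value for each organismID."""
    summary = defaultdict(Counter)
    for row in rows:
        summary[row["organismID"]][row["basisOfRecord"]] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {organism: dict(counts) for organism, counts in summary.items()}


summary = summarize_by_organism(records)
for organism, counts in summary.items():
    print(organism, counts)
```

Under this reading, each organismID has one in-hand HumanObservation record followed by any number of MachineObservation detections of the same animal.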

@dshorthouse
Author

@jdpye @albenson-usgs Gotcha re: basisOfRecord. Might you also want a more explicit way to state who the human was, or what machine made the later assertions? Admittedly, these machine-based assertions feel like splitting hairs.

@Antonarctica

And basisOfRecord is also a required field. But I always try to convince people to fill out identifiedBy with the person or algorithm that did the identification.

@albenson-usgs
Contributor

albenson-usgs commented Sep 15, 2020

@dshorthouse I guess I'm not sure that someone would NEED to know who captured the animal for conducting downstream analyses. I do agree it's nice to have if you can get it but I don't know that not having it prevents future work from happening? I'm not a biologging expert though so hopefully @peggynewman or maybe @sarahcd can confirm. I can see how you might need to know what machine made the observation because it might help with uncertainty (maybe?). But that information might be better laid out in something like sampling protocol.

@albenson-usgs
Contributor

Actually after considering this further, I could see the case for making identifiedBy strongly recommended and using it the way @jdpye suggested might help make it even clearer which observations are human ones and which are machine ones.

@peggynewman

Interesting. I agree with the approach that recordedBy and identifiedBy in biologging data go alongside the HumanObservation record and that broadly our approach is to group by organismID. We're likely to see these fields used more in repositories thanks to the kind of work that @dshorthouse is doing with Bionomia. For biologging, however, it's the machine that's doing the observing, not recording or identifying, so that information doesn't belong in those fields. We are describing the machine capture mechanisms in the Event and MoF.
An interesting contrast might be a camera trapping project, which would have a separate process of recording and then identification.

@dshorthouse
Author

@peggynewman In the context of biologging, it doesn't make much sense to ascribe credit for effort as might be assumed in the spirit of recordedBy or identifiedBy, which is (partly) what Bionomia tries to accomplish. However, the other intent of these terms is a bit more subtle. If we accept that the identity of an occurrence is subject to external slippage in taxon concept, then we need a safeguard to confidently assert alignment with a future concept. A naked scientificName without a corresponding statement of what resource was used (or who/what identified it as such) at a particular time and place will experience an intractable dissociation from future taxon concepts. I assume that biologging data has implications for conservation policy now and long into the future, but taxon concepts themselves are a moving target. In the majority of cases, I'm willing to bet that organisms that are tracked in your projects have relatively stable taxon concepts and there isn't much conflict. What I write here is undoubtedly overkill and immaterial...but this is true only for our present, small window of time.

@Antonarctica

Taking from other best practices: a 'scientificName' should be linked to a 'scientificNameID', which is defined as: An identifier for the nomenclatural (not taxonomic) details of a scientific name. This gives some protection against slippage, e.g. if the scientific name changes, the accepted and unaccepted names can be linked. The best practice is to use a globally unique identifier, for instance a Life Sciences Identifier (LSID). For marine species we use the World Register of Marine Species (WoRMS). For instance, Aptenodytes forsteri Gray, 1844
can be found here:
http://marinespecies.org/aphia.php?p=taxdetails&id=225773
The id at the end is the AphiaID that WoRMS uses, and it matches this LSID: urn:lsid:marinespecies.org:taxname:225773
Other taxonomic backbones can be used.
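The relationship between the WoRMS page URL, the AphiaID, and the LSID shown in this comment can be sketched in Python. The function names `aphia_id_from_url` and `worms_lsid` are hypothetical helpers, not part of any WoRMS client; the LSID pattern is taken only from the single example given here.

```python
from urllib.parse import parse_qs, urlparse


def aphia_id_from_url(url: str) -> int:
    """Read the AphiaID from the `id` query parameter of a WoRMS taxdetails URL."""
    query = parse_qs(urlparse(url).query)
    return int(query["id"][0])


def worms_lsid(aphia_id: int) -> str:
    """Build an LSID following the pattern of the example in this thread."""
    return f"urn:lsid:marinespecies.org:taxname:{aphia_id}"


url = "http://marinespecies.org/aphia.php?p=taxdetails&id=225773"
aphia_id = aphia_id_from_url(url)

# A record fragment pairing the name with its nomenclatural identifier.
record = {
    "scientificName": "Aptenodytes forsteri Gray, 1844",
    "scientificNameID": worms_lsid(aphia_id),
}
print(record)
```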

@jdpye
Member

jdpye commented Sep 17, 2020

Oh we've had a couple 'fun' taxon shakeups with manta rays and Atlantic torpedo/tetronarce, as well as some ambiguous identifiers with things like sixgill/sevengill sharks. At my institution we run things through marinespecies.org and back to the researcher with any discrepancies from their field reporting, and we identify the marinespecies.org entries as the authority as @Antonarctica has detailed. (We also track cases where the researcher is adamant that the taxonomic database has it wrong, though I don't know what to do with this information yet!) This grants us some ability to crossreference via TSN and AphiaID, now and in the future.

@albenson-usgs
Contributor

@jdpye reach out to WoRMS on the cases where the researcher is not in agreement (info at marinespecies.org). They are really responsive and helpful.

@jdpye
Member

jdpye commented Sep 17, 2020

Definitely. They're great at accepting new colloquial names, and I feel like the marine/brackish/fresh distinctions are maybe a little bit my fault because I made them add American alligators once upon a time.

@peggynewman

I agree, a scientificNameID belongs with scientificName. In the situation where an algorithm has provided the species identification, I've been thinking that belongs more in MoF lines. Is a persistent identifier for an algorithm a DOI on a publication, or are there other options?

@danstowell

Hi all. I'm trying to choose a clear way to indicate which software algorithm provided a taxon ID. I agree with a comment above by @jdpye that "identifiedBy" seems appropriate, though it would need its definition changing to encompass machines (not just people or groups) as the agent. On this question, there's a lot more discussion in the Attribution group here: tdwg/attribution#38

@jdpye
Member

jdpye commented Feb 18, 2021

That is a great discussion, @danstowell, thanks for that link! I think we could potentially 'get away with' a lot because of the 'freetext identifiers separated by pipes' nature of the field in HumanObservations, but I'd love to see the MachineObservation side of things make use of that field. If we did that, you're absolutely right, definitions would need a bit of updating. For algorithms/implementations of identifying software, in order to be complete, we'd be looking to record a program name/version number, or better, a git URI and commit hash.
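As a rough sketch of what that could look like, the Python below builds and parses a pipe-separated identifiedBy value that mixes a person with a software agent. The agent label layout (`name version (uri@commit)`), the helper names, the person's name, and the repository URI and commit hash are all assumptions made up for this example; nothing here is an agreed Darwin Core convention for machine agents.

```python
def format_software_agent(name, version, repo_uri=None, commit=None):
    """Label a software agent by name/version, optionally with a git URI and commit.

    The layout of this label is a hypothetical choice, not a standard.
    """
    label = f"{name} {version}"
    if repo_uri and commit:
        label += f" ({repo_uri}@{commit})"
    return label


def join_agents(agents):
    """Join agent labels with ' | ', the list separator recommended for Darwin Core."""
    return " | ".join(agent.strip() for agent in agents)


def split_agents(value):
    """Split a pipe-separated identifiedBy value back into individual labels."""
    return [part.strip() for part in value.split("|") if part.strip()]


# Invented agents: a placeholder person and a placeholder classifier.
software = format_software_agent(
    "example-classifier", "1.2.0",
    repo_uri="https://example.org/example-classifier.git",
    commit="abc1234",
)
identified_by = join_agents(["Jane Doe", software])
print(identified_by)
print(split_agents(identified_by))
```

Note that splitting on the pipe only round-trips cleanly if no agent label itself contains a `|` character.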

What I haven't done yet, and can do, is look through some of the other tdwg communities and their conversations to find out what other determinations have been handed down on this specific subject in the past.
