refine warnings on get_gbif_taxonomy #39

Open
mikeroswell opened this issue Sep 14, 2020 · 4 comments

@mikeroswell

Loving `get_gbif_taxonomy` so far.

I just ran it on a list my colleagues maintain of about 20,000 "valid" names of hymenopteran species. About 16 came back with the warning `" Selected first of multiple equally ranked concepts!"`. Of these, the majority meet the condition `scientificName == scientificNameStd`. However, the ones that do not (at the threshold I used) seem likely to be mismatched. I think it would be super helpful to give these two cases different warnings: when going through and manually checking results, it's great to have warnings in cases where the automation probably worked, but it's also nice to be able to focus easily on the ones most likely to be a problem.

Thanks!

@mikeroswell
Author

For example, here is a place where I would want a "louder" warning: I searched "Leioproctus carinatus" (a valid species name), and the selected species was "capillatus", which I believe is a separate but also valid species name. This is different from "Hoplitis rubicrus", which matches itself. The warning is still helpful there, but I'm much less concerned when an identical species name is matched to itself than when a totally different species is matched :-)
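A minimal triage sketch of the distinction requested above, written in Python purely for illustration (the package itself is in R). It assumes a results table carrying the `scientificName`, `scientificNameStd` and `warnings` columns mentioned in this thread; the `MatchResult` class, the `triage()` function and the "low"/"high" labels are hypothetical and not part of the package. The two rows mirror the examples in the comment above, with the genus of the "capillatus" match assumed to be *Leioproctus*.

```python
from dataclasses import dataclass

# Warning text as quoted in this thread (quoted there with a leading space),
# so it is matched as a substring of the warnings field.
MULTIPLE_CONCEPTS = "Selected first of multiple equally ranked concepts!"


@dataclass
class MatchResult:
    """One row of matching output (hypothetical container, not a package type)."""
    scientificName: str     # name as supplied by the user
    scientificNameStd: str  # name selected from the GBIF Backbone
    warnings: str           # warning text attached to the row, possibly empty


def triage(row: MatchResult) -> str:
    """Assign a hypothetical review priority to one matched row."""
    if MULTIPLE_CONCEPTS not in row.warnings:
        return "other"
    # Same name in and out: the ambiguity is probably harmless.
    if row.scientificName.strip() == row.scientificNameStd.strip():
        return "low"
    # A different name was selected: likely a fuzzy-matching mismatch.
    return "high"


rows = [
    MatchResult("Hoplitis rubicrus", "Hoplitis rubicrus",
                " Selected first of multiple equally ranked concepts!"),
    MatchResult("Leioproctus carinatus", "Leioproctus capillatus",
                " Selected first of multiple equally ranked concepts!"),
]

# List high-priority rows first (False sorts before True).
for row in sorted(rows, key=lambda r: triage(r) != "high"):
    print(f"{triage(row):>5}  {row.scientificName} -> {row.scientificNameStd}")
```

Exact string equality is the simplest version of the condition from the first comment; a real check might also need to tolerate case or author-string differences, depending on what `scientificNameStd` actually contains.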

@fdschneider
Member

Thanks for the feedback. More specific and louder warnings, as well as ways to interact directly with the function, would be great. As you probably figured, the problem is caused by fuzzy matching producing a match with the wrong valid taxon. Setting the option `fuzzy = FALSE` should avoid the mismatch.

I considered switching off fuzzy matching by default, but misspellings are very frequent in data and would not be addressed otherwise (see #38).

The function `get_gbif_taxonomy()` is essential for the package, as it provides a quick mapping of taxa to the GBIF Backbone Taxonomy following logical rules of thumb. Figuring out all possible matching errors is tedious, so please keep posting them here. Unfortunately, improving the function is not the core focus of the work right now, as I'm also hoping for a more general way of implementing taxon mapping (e.g. also providing a choice of the reference taxonomy).

@mikeroswell
Author

Oh, I definitely think fuzzy matching is desirable (that's why I'm using your function!). I just think it would be nice to distinguish the two cases: when I run this in a minute on 600,000 rows of hand-entered data, I'm going to get a lot of warnings, and I really want to focus on the ones that are likely to mess me up. Thanks!

@mikeroswell
Author

Another place the warnings could be clearer: I have a mix of valid and invalid names that are receiving a mix of these two warnings: `warnings == " No match! Check spelling or lower confidence threshold!"` and `warnings == "No matching species concept! "`. It would be helpful to know what triggers one warning versus the other; for me, these warnings are guides to troubleshooting mismatches, not simply criteria for filtering. Thanks!
