Provide conversion utility that could create a SIMPLE_CSV format file from the more complete DWCA format download #121
Comments
I believe such a thing is outside the scope of this package. Also, there is a Python package for manipulating a DWC Archive: python-dwca-reader.
The default download format for occurrences appears to be SIMPLE_CSV (see pygbif/pygbif/occurrences/download.py, line 69 at commit 9590fcf).
Indeed, the current default parameter is … A major challenge when working with a DWCA is that the format is not always consistent. Often there isn't a common way to map the extension tables onto a single table. For instance, in the case of a multimedia table, you can expect the … I believe that the …
My request is about occurrences only and relates only to DWCA downloads originating from GBIF (pygbif is about facilitating access to the GBIF API from Python).
No, I'm not. Also, I am not associated with GBIF.
Hi Nicky, there's an experimental (not necessarily stable, undocumented) API for the columns returned in GBIF downloads:
@CecSve is maintaining pygbif but is on leave until the end of January.
Thanks @MattBlissett. If/when this becomes stable, it might be worth making it available from the pygbif library.
My mistake, …
Assuming that a SIMPLE_CSV format download is a subset of the DWCA download, as per the GBIF download FAQ entry:
... then it would be good to have a utility that could create a SIMPLE_CSV format file from the larger DWCA.
Rationale: a user develops a script that needs only minimal data and is therefore designed to operate on SIMPLE_CSV input. Another user of the script has a pre-existing DWCA download and wants to use it as input to the script (without having to create another download), so they need a way to slim the DWCA down to the SIMPLE_CSV set of fields.
Is there a GBIF metadata service that returns the field names used in each of these download formats and that could help here? If so, pygbif could provide access to it, plus a column rename mapping (if required).
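A minimal sketch of what such a utility could look like, assuming the DWCA core file is `occurrence.txt`, UTF-8, tab-delimited, unquoted, and starts with a header row of term names. `slim_dwca_to_simple` and `EXAMPLE_SIMPLE_FIELDS` are hypothetical names, not part of pygbif, and the field tuple is only an illustrative subset, not GBIF's authoritative SIMPLE_CSV column list.

```python
import io
import zipfile

# Hypothetical, illustrative subset of SIMPLE_CSV columns. The authoritative
# list should come from GBIF rather than from this hard-coded tuple.
EXAMPLE_SIMPLE_FIELDS = (
    "gbifID", "datasetKey", "occurrenceID", "scientificName",
    "decimalLatitude", "decimalLongitude", "eventDate", "basisOfRecord",
)


def slim_dwca_to_simple(dwca_path, out_path, fields=EXAMPLE_SIMPLE_FIELDS,
                        core_file="occurrence.txt"):
    """Copy selected columns of the DWCA core file into a tab-delimited file.

    Assumes (not verified against GBIF docs) that the core file is UTF-8,
    tab-delimited, unquoted and has a header row of term names.
    Returns the number of data rows written.
    """
    with zipfile.ZipFile(dwca_path) as zf, zf.open(core_file) as raw:
        lines = io.TextIOWrapper(raw, encoding="utf-8")
        header = next(lines).rstrip("\n").split("\t")
        missing = [f for f in fields if f not in header]
        if missing:
            raise KeyError(f"columns not found in {core_file}: {missing}")
        # Column positions of the requested fields in the core file.
        idx = [header.index(f) for f in fields]
        count = 0
        with open(out_path, "w", encoding="utf-8", newline="") as out:
            out.write("\t".join(fields) + "\n")
            for line in lines:
                row = line.rstrip("\n")
                if not row:
                    continue  # skip blank lines
                values = row.split("\t")
                # Pad short rows with empty strings instead of failing.
                out.write("\t".join(values[i] if i < len(values) else ""
                                    for i in idx) + "\n")
                count += 1
    return count
```

A sturdier version would read the core file name and delimiter from the archive's `meta.xml` (or lean on python-dwca-reader) instead of assuming them.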
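If a field list (and, where names differ, a rename mapping) were available, applying it could be as simple as the sketch below. `plan_columns` and the `rename` dict shape `{simple_name: dwca_name}` are hypothetical; fields without an entry are assumed to carry the same name in both formats, and fields that cannot be found are reported rather than silently dropped.

```python
def plan_columns(dwca_header, simple_fields, rename=None):
    """Map each SIMPLE_CSV field to a column index in a DWCA header.

    `rename` is a hypothetical {simple_name: dwca_name} mapping for fields
    whose names differ between formats; other names are assumed identical.
    Returns (plan, missing): plan is a list of (output_name, column_index).
    """
    rename = rename or {}
    positions = {name: i for i, name in enumerate(dwca_header)}
    plan, missing = [], []
    for field in simple_fields:
        source = rename.get(field, field)
        if source in positions:
            plan.append((field, positions[source]))
        else:
            missing.append(field)
    return plan, missing


# Usage with a made-up DWCA column name "dwc_lat" to show the rename path.
plan, missing = plan_columns(
    ["gbifID", "dwc_lat", "eventDate"],
    ["gbifID", "decimalLatitude", "depth"],
    rename={"decimalLatitude": "dwc_lat"},
)
```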