Datatypes changed when using .nc download #89

Open
callumrollo opened this issue Feb 22, 2023 · 5 comments

@callumrollo

It appears that variables stored in ERDDAP as integers of various sizes are converted to float32 when exported as netCDF.

Take this example dataset:

https://erddap.observations.voiceoftheocean.org/erddap/tabledap/nrt_SEA068_M27.html

The variable conductivity_qc is a QC variable that can only take integer values between 1 and 9. As such, we have specified it within ERDDAP as a byte, an 8-bit integer:

  conductivity_qc {
    Byte _FillValue 127;
    String _Unsigned "false";
    Byte actual_range 1, 4;
    Float64 colorBarMaximum 10.0;
    Float64 colorBarMinimum 0.0;
    String comment "Quality control flags from IOOS QC QARTOD https://github.com/ioos/ioos_qc Version: 2.1.0. Using config: [<Call stream_id=conductivity function=qartod.gross_range_test(suspect_span=[6, 42], fail_span=[3, 45])>, <Call stream_id=conductivity function=qartod.location_test(bbox=[10, 50, 25, 60])>].  Threshold values from EuroGOOS DATA-MEQ Working Group (2010) Recommendations for in-situ data Near Real Time Quality Control [Version 1.2]. EuroGOOS, 23pp. DOI https://dx.doi.org/10.25607/OBP-214.";
    String flag_meanings "GOOD, UNKNOWN, SUSPECT, FAIL, MISSING";
    Float64 flag_values 1, 2, 3, 4, 9;
    String ioos_category "Quality";
    String ioos_qc_module "qartod";
    String long_name "quality control flags for water conductivity";
    String quality_control_conventions "IOOS QARTOD standard flags";
    Float64 quality_control_set 1;
    String standard_name "sea_water_electrical_conductivity_flag";
    Byte valid_max 9;
    Byte valid_min 1;
  }

However, when this dataset is downloaded as netCDF, this variable, and all others, appear to have been converted to float32. I believe this produces a substantial and avoidable increase in download size.
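To put a rough number on the size concern, here is a minimal sketch comparing the in-memory footprint of the same flag values held as 8-bit integers and as float32. The row count is a hypothetical figure, and the comparison covers only uncompressed array storage, not the actual bytes sent over the network.

```python
import numpy as np

# Hypothetical number of rows; not taken from the dataset above.
n_rows = 100_000

# QC flags stored as signed 8-bit integers (1 byte per value).
flags_int8 = np.ones(n_rows, dtype=np.int8)

# The same flags after conversion to 32-bit floats (4 bytes per value).
flags_float32 = flags_int8.astype(np.float32)

print("int8 bytes:   ", flags_int8.nbytes)
print("float32 bytes:", flags_float32.nbytes)
print("ratio:        ", flags_float32.nbytes / flags_int8.nbytes)
```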

This issue does not appear to occur with export as .csv, as the integers are exported as such, not as floating points.

@BobSimons
Collaborator

That shouldn't happen. But as you'll see below, I can't reproduce the problem.

I think the first step to solve this problem is to see what data type is used for that variable in that dataset.
If I go to https://erddap.observations.voiceoftheocean.org/erddap/tabledap/nrt_SEA068_M27.html and hover over the (?) icon by conductivity_qc, I see that the data type is indeed "Byte". Good.

So then I made a request for a .nc file with this URL
https://erddap.observations.voiceoftheocean.org/erddap/tabledap/nrt_SEA068_M27.nc?latitude%2Clongitude%2Ctime%2Cconductivity%2Cconductivity_qc&time%3E=2022-07-31T00%3A00%3A00Z&time%3C=2022-07-31T03%3A51%3A42Z
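For anyone wanting to reproduce that request from a script, here is a minimal sketch of how such a tabledap URL can be assembled with the Python standard library. The helper name `tabledap_url` and its parameters are made up for illustration; only the percent-encoding pattern is taken from the URL above.

```python
from urllib.parse import quote


def tabledap_url(server, dataset_id, variables, constraints, file_type="nc"):
    """Build a tabledap request URL (hypothetical helper, not an ERDDAP API)."""
    # Variable names are comma-separated; the comma is percent-encoded as %2C.
    query = quote(",".join(variables), safe="")
    # Each constraint is appended with '&'; '=' is left as-is while
    # '<', '>' and ':' are percent-encoded.
    for constraint in constraints:
        query += "&" + quote(constraint, safe="=")
    return f"{server}/tabledap/{dataset_id}.{file_type}?{query}"


url = tabledap_url(
    "https://erddap.observations.voiceoftheocean.org/erddap",
    "nrt_SEA068_M27",
    ["latitude", "longitude", "time", "conductivity", "conductivity_qc"],
    ["time>=2022-07-31T00:00:00Z", "time<=2022-07-31T03:51:42Z"],
)
print(url)
```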

I downloaded that file and renamed it voto.nc.

I then used ncdump -h to see what is in the file. It showed (just the part for conductivity_qc):
  byte conductivity_qc(row=243);
    :_FillValue = 127B; // byte
    :actual_range = 1B, 1B; // byte
    :colorBarMaximum = 10.0; // double
    :colorBarMinimum = 0.0; // double
    :comment = "Quality control flags from IOOS QC QARTOD https://github.com/ioos/ioos_qc Version: 2.1.0. Using config: [<Call stream_id=conductivity function=qartod.gross_range_test(suspect_span=[6, 42], fail_span=[3, 45])>, <Call stream_id=conductivity function=qartod.location_test(bbox=[10, 50, 25, 60])>]. Threshold values from EuroGOOS DATA-MEQ Working Group (2010) Recommendations for in-situ data Near Real Time Quality Control [Version 1.2]. EuroGOOS, 23pp. DOI https://dx.doi.org/10.25607/OBP-214.";
    :flag_meanings = "GOOD, UNKNOWN, SUSPECT, FAIL, MISSING";
    :flag_values = 1.0, 2.0, 3.0, 4.0, 9.0; // double
    :ioos_category = "Quality";
    :ioos_qc_module = "qartod";
    :long_name = "quality control flags for water conductivity";
    :quality_control_conventions = "IOOS QARTOD standard flags";
    :quality_control_set = 1.0; // double
    :standard_name = "sea_water_electrical_conductivity_flag";
    :valid_max = 9B; // byte
    :valid_min = 1B; // byte
So the variable is still stored as a byte in the .nc file that I downloaded.

So I have no explanation for why you see float32.
Are you perhaps looking at the conductivity (not qc) variable?
Did you use ncdump to determine the data type for the variable in the .nc file? (If not, why do you think it is a float32?)
Could you please tell me the exact URL you used to download the data (so I can reproduce the problem)?

Best wishes.

@callumrollo
Author

Hi Bob,

Thanks for checking this. You're right, the issue lies in Python xarray's treatment of integers and fill values, not in ERDDAP. I'll use ncdump to check downloaded files in future. I will migrate this issue to erddapy, as the default behavior of xarray is converting integer arrays to float32 at read.
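For reference, a minimal sketch of the behavior, assuming it comes from xarray's CF decoding step (the `mask_and_scale` option), which replaces `_FillValue` entries with NaN and therefore has to promote integers to a floating-point type. The in-memory dataset built by `make_raw` is a hypothetical stand-in for the downloaded file, with made-up flag values.

```python
import numpy as np
import xarray as xr


def make_raw():
    """Build a small stand-in for the byte QC variable with its fill value."""
    values = np.array([1, 1, 4, 127], dtype=np.int8)  # 127 is the fill value
    attrs = {"_FillValue": np.int8(127)}
    return xr.Dataset({"conductivity_qc": ("row", values, attrs)})


# Default CF decoding: fill values become NaN, so the dtype becomes a float.
decoded = xr.decode_cf(make_raw())

# With masking and scaling switched off, the stored integer dtype is kept.
undecoded = xr.decode_cf(make_raw(), mask_and_scale=False)

print("decoded dtype:  ", decoded["conductivity_qc"].dtype)
print("undecoded dtype:", undecoded["conductivity_qc"].dtype)
```

When reading a file directly, the same switch can be passed as `xr.open_dataset(path, mask_and_scale=False)`, at the cost of having to handle the fill value (127 here) yourself.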

@callumrollo
Author

@BobSimons I'm getting a security policy rejection when emailing your NOAA address: "550 5.7.1 unrecognized address". It looks like I may need to be allowlisted. You can contact me at c.rollo@outlook.com

@BobSimons BobSimons reopened this Feb 23, 2023
@ocefpaf

ocefpaf commented Feb 23, 2023

> I will migrate this issue to erddapy, as the default behavior of xarray is converting integer arrays to float32 at read.

I saw that problem a few years back but it was gone when the data provider updated their ERDDAP server. I guess that this is a new problem. However, the issue is with xarray and/or maybe the libnetcdf version. So there isn't much we can do in erddapy b/c we just read the downloaded file directly with NetCDF4DataStore.

@BobSimons
Collaborator

@ocefpaf, can you report the bug to the maintainers of NetCDF4DataStore?
