difficulty reading CF compliant files #135

Open
gaelforget opened this issue Feb 28, 2020 · 14 comments

Comments

@gaelforget
Member

After loading one of my files via Panoply to verify that there was nothing wrong with it (see below), I tried the `model = load(gcm_files, "tasmax", poly=poly_reg)` example and got `ERROR: Manually verify x/lat dimension name`.

Taking a look at the code, I see that getdim_lat relies on a list of hard-coded names. I thought that the more general approach was to rely on long_name + units. Not sure what to suggest -- adding to the hard-coded list would be a short-term fix just for me...

  lon_c   (720)
    Datatype:    Float64
    Dimensions:  lon_c
    Attributes:
     units                = degrees_east
     long_name            = longitude

[Screenshot: Screen Shot 2020-02-28 at 4 03 53 PM]

@gaelforget
Member Author

Also, the next file I am planning to present to climatetools is also CF-compliant but not on a regular lat-lon grid (see below). But I am going to wait a bit before I try that.

[Screenshot: Screen Shot 2020-02-28 at 4 18 42 PM]

@Balinus
Member

Balinus commented Feb 28, 2020

Thanks for the input! Indeed, this is certainly not an elegant function. From memory, this was coded for a project that involved regional climate models (your second case).

Not sure if extracting lon_c based on long_name is robust though. It seems more robust to go with the detected dimensions. For instance, in a regional climate model file, the native dimension will not be longitude. The file will still contain a longitude grid, with long_name set to longitude. If I rely on detecting, say, longitude, we would extract the longitude grid and not the native dimension, which could be in meters, degrees on a stereographic grid, etc.

Open to suggestions though as hardcoding this is not a robust solution either.

@gaelforget
Member Author

gaelforget commented Mar 4, 2020

> Open to suggestions though as hardcoding this is not a robust solution either.

Cool. Will take a deeper look and might send a PR later if I find a way to improve the code.

> regional climate models (your second case)

Just to clarify, I use sets of these files that collectively add up to global model variables

@Balinus
Member

Balinus commented Mar 9, 2020

> Just to clarify, I use sets of these files that collectively add up to global model variables

You mean like "tiles"?

@lmilechin

Just for reference: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#latitude-coordinate

From what I've seen with other tools, they detect dimensions using the units, which is what the CF Conventions seem to imply as well.
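
To make the units-based approach concrete, here is a minimal Python sketch (not code from ClimateTools). It classifies a variable from its units attribute alone, using the unit spellings that the CF Conventions list as acceptable for latitude and longitude. The function name `classify_by_units` and the plain-dict representation of attributes are assumptions made for illustration.

```python
# Unit strings the CF Conventions accept for latitude and longitude coordinates.
LAT_UNITS = {"degrees_north", "degree_north", "degree_N",
             "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E",
             "degrees_E", "degreeE", "degreesE"}


def classify_by_units(attrs):
    """Return "lat", "lon", or None using only the units attribute.

    `attrs` is a plain dict of variable attributes (hypothetical stand-in
    for whatever a NetCDF reader returns).
    """
    units = attrs.get("units", "").strip()
    if units in LAT_UNITS:
        return "lat"
    if units in LON_UNITS:
        return "lon"
    return None


# Attributes of lon_c as reported at the top of this thread.
lon_c = {"units": "degrees_east", "long_name": "longitude"}
print(classify_by_units(lon_c))
```

With this check, the variable name (`lon_c` here) plays no role, so no hard-coded name list is needed. Plain `degrees`, as used on rotated-pole axes, is deliberately not matched.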

@Balinus
Member

Balinus commented Mar 10, 2020

Thanks! I've seen that in RCMs, the latitude and longitude grids also have an official standard_name. Hence, it should be possible to discern dimensions and coordinates adequately.

I'm gonna rework this extraction part asap.

@gaelforget
Member Author

gaelforget commented Mar 10, 2020

> Thanks! I've seen that in RCMs, the latitude and longitude grids also have an official standard_name. Hence, it should be possible to discern dimensions and coordinates adequately.

As highlighted by @lmilechin, it is the units attribute that should be used to identify coordinates per the CF guidelines -- as opposed to standard_name, which is only optional and, e.g., does not distinguish between different longitude conventions.

> I'm gonna rework this extraction part asap.

Great! Thanks

@Balinus
Member

Balinus commented Mar 10, 2020

To effectively tackle this issue, having access to some problematic datasets would be welcome.

@gaelforget
Member Author

gaelforget commented Mar 10, 2020

> To effectively tackle this issue, having access to some problematic datasets would be welcome.

How about using the files I mentioned at the top of this thread?

These get generated by running 04_netcdf.ipynb from GlobalOceanNotebooks:

outputs/nctiles-newfiles/interp/ETAN.nc
outputs/nctiles-newfiles/tiled/ETAN/ETAN.*.nc

ps. I just reran the notebook in binder & regenerated these without problem

@gaelforget
Member Author

> > Just to clarify, I use sets of these files that collectively add up to global model variables
>
> You mean like "tiles"?

Yes -- one tile = 1 file in this example

@Balinus
Member

Balinus commented Mar 11, 2020

> > To effectively tackle this issue, having access to some problematic datasets would be welcome.
>
> How about using the files I mentioned at the top of this thread?
>
> These get generated by running 04_netcdf.ipynb from GlobalOceanNotebooks:
>
> outputs/nctiles-newfiles/interp/ETAN.nc
> outputs/nctiles-newfiles/tiled/ETAN/ETAN.*.nc
>
> ps. I just reran the notebook in binder & regenerated these without problem

Thanks, I was able to produce the files at home.

@Balinus
Member

Balinus commented Mar 11, 2020

Also, I re-read the thread and wanted to clarify: when I spoke about "dimension", I was mostly referring to the dimensions of the dataset, not the units/measure of the variable itself. Hence the need, for projected grids, to distinguish between a rotated-latitude "dimension" and the latitude grid (a variable in the dataset, not a dimension).

Anyway, I'll be forced to think about a more general solution to this!

Edit: for example, this dataset has rlat and rlon dimensions.

Dimensions
   rlat = 412
   rlon = 424
   time = 2920
   bnds = 2

Variables
  lat   (424 × 412)
    Datatype:    Float64
    Dimensions:  rlon × rlat
    Attributes:
     standard_name        = latitude
     long_name            = latitude
     units                = degrees_north

  lon   (424 × 412)
    Datatype:    Float64
    Dimensions:  rlon × rlat
    Attributes:
     standard_name        = longitude
     long_name            = longitude
     units                = degrees_east

  pr   (424 × 412 × 2920)
    Datatype:    Float32
    Dimensions:  rlon × rlat × time
    Attributes:
     grid_mapping         = rotated_pole
     _FillValue           = 1.0e20
     missing_value        = 1.0e20
     standard_name        = precipitation_flux
     long_name            = Precipitation
     units                = kg m-2 s-1
     coordinates          = lon lat
     cell_methods         = time: mean

  rlat   (412)
    Datatype:    Float64
    Dimensions:  rlat
    Attributes:
     standard_name        = grid_latitude
     long_name            = latitude in rotated pole grid
     units                = degrees
     axis                 = Y

  rlon   (424)
    Datatype:    Float64
    Dimensions:  rlon
    Attributes:
     standard_name        = grid_longitude
     long_name            = longitude in rotated pole grid
     units                = degrees
     axis                 = X
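
The distinction drawn above can be sketched in Python (an illustration only, not ClimateTools code). The dump is transcribed into a plain dict; a 1-D variable that shares its name with a dimension is treated as the native coordinate of that dimension, while the 2-D lat/lon grids are found through the `coordinates` attribute of the data variable. The dict layout and the helper names `dimension_coords` and `auxiliary_coords` are assumptions for this sketch.

```python
# Metadata transcribed from the dump above (only the attributes needed here).
dataset = {
    "dims": {"rlat": 412, "rlon": 424, "time": 2920, "bnds": 2},
    "vars": {
        "lat": {"dims": ("rlon", "rlat"),
                "attrs": {"standard_name": "latitude", "units": "degrees_north"}},
        "lon": {"dims": ("rlon", "rlat"),
                "attrs": {"standard_name": "longitude", "units": "degrees_east"}},
        "pr": {"dims": ("rlon", "rlat", "time"),
               "attrs": {"grid_mapping": "rotated_pole", "coordinates": "lon lat"}},
        "rlat": {"dims": ("rlat",),
                 "attrs": {"standard_name": "grid_latitude", "units": "degrees", "axis": "Y"}},
        "rlon": {"dims": ("rlon",),
                 "attrs": {"standard_name": "grid_longitude", "units": "degrees", "axis": "X"}},
    },
}


def dimension_coords(ds):
    """Names of 1-D variables that share their name with a dimension."""
    return sorted(name for name, var in ds["vars"].items()
                  if var["dims"] == (name,) and name in ds["dims"])


def auxiliary_coords(ds, varname):
    """Variables listed in the `coordinates` attribute of a data variable."""
    listed = ds["vars"][varname]["attrs"].get("coordinates", "").split()
    return [name for name in listed if name in ds["vars"]]


print(dimension_coords(dataset))        # native axes of the rotated grid
print(auxiliary_coords(dataset, "pr"))  # 2-D geographic grids
```

Here `rlat` and `rlon` come out as the native dimensions, and `lon` and `lat` come out as the auxiliary grids attached to `pr`, which is the separation a reader needs before deciding which one to subset or plot against.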

@Balinus
Member

Balinus commented Mar 13, 2020

I've sketched some code in #137

It's pretty rough right now but so far it works. Just not sure about the robustness though. Haven't had the time to test your files @gaelforget but I'm pretty sure it does not work. I'm currently testing for axis (optional attribute in CF files) and standard_name attributes of the dimensions. Will add long_name later.
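
One possible shape for such a check, as a hedged Python sketch (this is not the code in #137, which is not shown in this thread): test the 1-D coordinate variables for an `axis` attribute first, then `standard_name`, then fall back to CF units. The function names, the order of the checks, and the decision to raise on ambiguity are all assumptions made for illustration.

```python
# CF standard names and units that indicate an X (longitude-like) axis.
X_STANDARD_NAMES = {"longitude", "grid_longitude", "projection_x_coordinate"}
X_UNITS = {"degrees_east", "degree_east", "degree_E",
           "degrees_E", "degreeE", "degreesE"}


def is_x_coordinate(attrs):
    """Check axis, then standard_name, then units (assumed precedence)."""
    if attrs.get("axis") == "X":
        return True
    if attrs.get("standard_name") in X_STANDARD_NAMES:
        return True
    return attrs.get("units") in X_UNITS


def find_x_dimension(coord_vars):
    """Return the single X dimension among 1-D coordinate variables.

    `coord_vars` maps variable names to attribute dicts (hypothetical
    stand-in for what a NetCDF reader would provide).
    """
    matches = [name for name, attrs in coord_vars.items()
               if is_x_coordinate(attrs)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one X dimension, found {matches}")
    return matches[0]


# Rotated-pole file from earlier in the thread: detected through `axis`.
rotated = {
    "rlat": {"standard_name": "grid_latitude", "units": "degrees", "axis": "Y"},
    "rlon": {"standard_name": "grid_longitude", "units": "degrees", "axis": "X"},
}
# File from the top of the thread: only units and long_name are available.
regular = {"lon_c": {"units": "degrees_east", "long_name": "longitude"}}

print(find_x_dimension(rotated))
print(find_x_dimension(regular))
```

Raising when zero or several candidates match keeps a misdetected axis from propagating silently; a Y-axis counterpart would mirror this with the latitude names and units.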

@Balinus
Member

Balinus commented Mar 14, 2020

@gaelforget In the files produced by the notebook, both lat_c and lon_c have longitude as their long_name attribute.

3 participants