Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve netcdf multi-group reading and handle metadata QC issue #516

Open
wants to merge 23 commits into
base: development
Choose a base branch
from

Conversation

JessicaS11
Copy link
Member

@JessicaS11 JessicaS11 commented Feb 29, 2024

Per this discussion, ATL11 v006, ATL14 v003, and ATL15 v003 have metadata issues that cause icepyx to fail when it tries to get the version number from the granule file. This PR issues a temporary fix that instead gets the most recent version number from CMR to use instead.

It also adds handling reading in multiple groups of a netcdf.

Copy link

github-actions bot commented Feb 29, 2024

Binder 👈 Launch a binder notebook on this branch for commit 864ebb2

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit 6c22caa

Binder 👈 Launch a binder notebook on this branch for commit 714642f

Binder 👈 Launch a binder notebook on this branch for commit 3fcddad

Binder 👈 Launch a binder notebook on this branch for commit ec3680b

Binder 👈 Launch a binder notebook on this branch for commit 8ec1f55

Binder 👈 Launch a binder notebook on this branch for commit c9d64fb

Binder 👈 Launch a binder notebook on this branch for commit 3e17251

Binder 👈 Launch a binder notebook on this branch for commit 4efbaeb

Binder 👈 Launch a binder notebook on this branch for commit 7e1f506

Binder 👈 Launch a binder notebook on this branch for commit 23ff7ad

Binder 👈 Launch a binder notebook on this branch for commit 87ee63a

Binder 👈 Launch a binder notebook on this branch for commit b777a7f

Binder 👈 Launch a binder notebook on this branch for commit 1111e80

Binder 👈 Launch a binder notebook on this branch for commit ce54757

Binder 👈 Launch a binder notebook on this branch for commit d23d1a0

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@JessicaS11
Copy link
Member Author

JessicaS11 commented Mar 5, 2024

@rwegener2 @betolink @andypbarrett @scottyhq @wsauthoff @tsutterley @jpswinski
Before reporting to xarray, I'm hoping one of you can check I'm not doing something silly. Per this toy example (also in the metadata branch that this PR is for, but easier to view not as a GitHub diff), when I read an ATL15 file in using Xarray (xr_open_dataset()) in the cloud versus locally, the resulting structure of the dataset is different. Namely, in the cloud the variables include the time dimension, and locally they do not. I initially noted it via the icepyx.Read functionality (bottom half of the notebook), but reproduced it with xarray directly (top half of the notebook). I've tried controlling for versions and defaults, but not yet with a fine tooth comb. I know suspect the number of x and y values differ due to the granule being ordered with subsetting (I thought I set it to false, but...), but that shouldn't make a difference anyway. Have any of you seen this?

@tsutterley
Copy link
Member

tsutterley commented Mar 5, 2024

hey @JessicaS11, is ATL15_GL_0318_01km_003_01_HEGOUT.nc a renamed version of the ATL15 granule? Or is a subset file from the NSIDC API? I think NSIDC is still working on 3D raster subsetting capability.

@JessicaS11
Copy link
Member Author

hey @JessicaS11, is ATL15_GL_0318_01km_003_01_HEGOUT.nc a renamed version of the ATL15 granule? Or is a subset file from the NSIDC API? I think NSIDC is still working on 3D raster subsetting capability.

That's what came back when I ordered and downloaded from NSIDC:

region = ipx.Query(
         product="ATL15",
         spatial_extent=[-17.25, 80.7, -16.0, 81.0],  # minlon, minlat, maxlon, maxlat
         date_range=['2018-09-15', '2023-03-02']
     )

     region.download_granules(path="./doc/source/example_notebooks/atl15",
                              subset=False)

I think NSIDC is still working on 3D raster subsetting capability.

That's also what I thought... my naive (and "potentially an issue" assumption) was that even if the file were being subset spatially, it shouldn't be affecting the structure and thus how xarray reads in a specific group.

@betolink
Copy link
Contributor

betolink commented Mar 6, 2024

just to confirm, @JessicaS11 the local granule was downloaded using icepyx?

if it was downloaded using the on-prem subsetting service maybe there is a bug in EGI (or an undocumented behavior) that removed the time dimension even if we are not subsetting the granule.

@JessicaS11
Copy link
Member Author

just to confirm, @JessicaS11 the local granule was downloaded using icepyx?

Yes - with subsetting applied. (So subsetting on rasters is available now, @tsutterley!)

Submitting to EGI without subsetting params returned ftp downloadUrls (which icepyx isn't set up to handle, since it looks for the order ID). To initiate the unprocessed granule download (via a url request in the browser), I had to set agent=NO and mode=sync, at which point I assume it's not going through EGI. Opening the full granule with xarray behaved just as on the cloud, with the variables correctly formatted with x, y and time dimensions.

if it was downloaded using the on-prem subsetting service maybe there is a bug in EGI (or an undocumented behavior) that removed the time dimension even if we are not subsetting the granule.

@mikala-nsidc, @betolink suggested you might know what's going on (or someone who does) such that somehow EGI is restructuring the data when it's subsetted to turn the time dimension into bands (as interpreted by xarray).

@mikala-nsidc
Copy link

mikala-nsidc commented Mar 7, 2024

@JessicaS11 @betolink @tsutterley The way that the on prem subsetter handles ATL15 parameters that are 3 dimensional arrays is by "flattening" those bands. This means, for instance, that if you select a spatial subset of an ATL15 file, the time dimension in dh/dt will be separated into 18 bands. Attaching a screenshot to illustrate.
Screenshot 2024-03-07 at 10 25 34 AM
Spatial subsetting will be handled differently once those services are available in the cloud. The code will be developed to better handle 3D arrays.
The flattening method was our only option for the on prem subsetter.

@JessicaS11
Copy link
Member Author

Thanks, @mikala-nsidc! It's good to know that's intended behavior. Is there a verifiable mapping of bands to timestamps? I don't see anything in the metadata for a given band to indicate which timestamp it corresponds to.

@JessicaS11 JessicaS11 changed the title improve ATL15 parsing and handle metadata QC issue improve netcdf multi-group reading and handle metadata QC issue Mar 15, 2024
@JessicaS11
Copy link
Member Author

@wsauthoff @mikala-nsidc Are either of you available to review this PR?

@JessicaS11 JessicaS11 marked this pull request as ready for review March 15, 2024 20:41
Copy link

codecov bot commented Mar 15, 2024

Codecov Report

Attention: Patch coverage is 13.63636% with 19 lines in your changes are missing coverage. Please review.

Project coverage is 66.27%. Comparing base (cc02758) to head (d23d1a0).

Files Patch % Lines
icepyx/core/read.py 0.00% 15 Missing ⚠️
icepyx/core/is2ref.py 20.00% 4 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff               @@
##           development     #516      +/-   ##
===============================================
- Coverage        66.59%   66.27%   -0.32%     
===============================================
  Files               36       36              
  Lines             3059     3072      +13     
  Branches           534      537       +3     
===============================================
- Hits              2037     2036       -1     
- Misses             934      948      +14     
  Partials            88       88              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

# see: https://forum.earthdata.nasa.gov/viewtopic.php?t=5154
# (v006, v003, v003, respectively)
# Note that this results in a login being required even for a local file
# because otherwise Variables tries to get the version from the file (ln99).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 99 doesn't seem to do this, unless this is a reference to a different notebook or py script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 99-103 of the Variables module are:

            # Check for valid version string
            # If version is not specified by the user assume the most recent version
            self._version = val.prod_version(
                is2ref.latest_version(self._product), version
            )

I'll update the comment to make it clearer that it's a different file.

icepyx/core/read.py Outdated Show resolved Hide resolved
@JessicaS11 JessicaS11 requested a review from wsauthoff May 29, 2024 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants