Rewrite cell/swath xarray readers as MultiFileHandlers #64

claytharrison · 2024-04-24T12:25:36Z

This pull request aims to reimplement the reading and merging logic for swath and cell files on top of the structure established by MultiFileHandler, ChronFiles, etc. in the file_handling module.

As of this commit, readers for cell files are implemented (RaggedArray and OrthoMulti). The most basic usage goes something like:

from ascat.read_native.cell_collection import RaggedArrayFiles, OrthoMultiArrayFiles
contiguous_ra_source = "/path/to/contiguous/sig0_12.5/metop_a"
indexed_ra_source = "/path/to/indexed/sig0_12.5/metop_a"
multisat_ra_source = "/path/to/indexed/sig0_12.5/"
orthomulti_source = "/path/to/era5_land_2023/"
orthomulti_grid = "/path/to/era5_land_2023/grid.nc"

# amazon chunk
# you can also query by list of location_id, cell number, or lon/lat coords
bbox = (-7, -4, -69, -65)

contiguous_ra_files = RaggedArrayFiles(contiguous_ra_source, product_id="sig0_12.5") 
indexed_ra_files = RaggedArrayFiles(indexed_ra_source, product_id="sig0_12.5")

# right now we just use the "all_sats" parameter to indicate if the files are nested within metop_a/metop_b/metop_c directories underneath
# the root dir. This is of course not general or ideal.
multisat_ra_files = RaggedArrayFiles(multisat_ra_source, product_id="sig0_12.5", all_sats=True)

# for orthomulti right now you just pass the grid file path as an argument and it will generate a pygeogrids object from that.
# the product_id doesn't do anything in this case.
orthomulti_files = OrthoMultiArrayFiles(orthomulti_source, product_id="this_doesnt_matter_in_this_case", grid=orthomulti_grid)

# extract the data

contiguous_ra_ds = contiguous_ra_files.extract(bbox=bbox)
indexed_ra_ds = indexed_ra_files.extract(bbox=bbox)
# ^ these two should be the same, since contiguous RAs are converted to indexed before merging

multisat_ra_ds = multisat_ra_files.extract(bbox=bbox)

orthomulti_ds = orthomulti_files.extract(bbox=bbox)
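
The comment in the example above notes that contiguous ragged arrays are converted to indexed ones before merging. As a rough illustration of what that conversion means in CF-conventions terms (this is not the code in this PR; `contiguous_to_indexed` and `row_size` are names chosen for the sketch), the per-location observation counts are expanded into one location index per observation:

```python
import numpy as np


def contiguous_to_indexed(row_size):
    """Expand per-location observation counts into a per-observation location index.

    row_size[i] is the number of consecutive observations belonging to
    location i (contiguous ragged layout). The result has one entry per
    observation holding the index of its location (indexed ragged layout).
    """
    row_size = np.asarray(row_size, dtype=np.int64)
    return np.repeat(np.arange(row_size.size), row_size)


# Three locations holding 2, 0 and 3 observations respectively.
location_index = contiguous_to_indexed([2, 0, 3])
print(location_index)
```

Once every file is in the indexed layout, observations from different files can be concatenated along the observation dimension without keeping each location's observations adjacent, which is presumably why the conversion happens before the merge.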

To do:

- ~~Add swath file reader~~ Finish swath file reader
- Find a robust method of handling product-specific information (grids, etc.), including a way for users to provide it themselves. For the cell reader we only really need to pass the grid, but for the swath reader this will get more complicated.
- Add the ability to write out according to a different cell scheme (any cell scheme)
- Try integration with the regrid applications and make sure that still works nicely
- Rename things better
- Add whatever else is missing compared to the old version
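
One possible shape for the product-specific information item is a small registry that ships with built-in entries and lets users add their own. This is only a sketch to make the idea concrete: `ProductInfo`, `register_product` and `get_product` are hypothetical names, and the fields are guesses at what a swath reader would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductInfo:
    """Hypothetical container for per-product settings."""
    fn_templ: str              # filename template with {placeholders}
    grid_sampling_km: float    # sampling of the product grid
    date_field_fmt: str = "%Y%m%d%H%M%S"


# Registry keyed by lower-cased product id.
_PRODUCTS = {}


def register_product(product_id, info):
    """Add or override a product definition (built-in or user-provided)."""
    _PRODUCTS[product_id.lower()] = info


def get_product(product_id):
    """Look up a product definition, with a readable error for unknown ids."""
    try:
        return _PRODUCTS[product_id.lower()]
    except KeyError:
        raise KeyError(
            f"Unknown product_id {product_id!r}; register it first"
        ) from None


# Example registration with made-up values.
register_product(
    "my_product",
    ProductInfo(fn_templ="my_product_{date}.nc", grid_sampling_km=12.5),
)
print(get_product("MY_PRODUCT").grid_sampling_km)
```

With something like this, a reader could accept either a `product_id` string or a `ProductInfo` instance directly, so users are not limited to the built-in products.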

claytharrison · 2024-04-25T13:18:33Z

I added a basic swath reader, but nothing for handling specific products yet. For now you can steal the information for a given product from xarray_io.py.

It applies a spatial filter to the results of the time-based file search, so that unnecessary swath files are excluded relatively quickly, before reading and merging. The concept was graciously stolen from a script of Pavan's. It seems to work, but I haven't tested it properly yet.
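
To make the idea of the spatial filter concrete, here is a minimal sketch of one way such a pre-filter could work. It is not the implementation in this PR: `SwathExtent`, `intersects` and `filter_files` are hypothetical, the bbox order is assumed to be (latmin, latmax, lonmin, lonmax), and antimeridian crossings are ignored.

```python
from dataclasses import dataclass


@dataclass
class SwathExtent:
    """Hypothetical per-file summary: bounding extent of a swath file."""
    filename: str
    latmin: float
    latmax: float
    lonmin: float
    lonmax: float


def intersects(extent, bbox):
    """True if the file extent overlaps bbox = (latmin, latmax, lonmin, lonmax)."""
    latmin, latmax, lonmin, lonmax = bbox
    return (
        extent.latmin <= latmax
        and extent.latmax >= latmin
        and extent.lonmin <= lonmax
        and extent.lonmax >= lonmin
    )


def filter_files(extents, bbox):
    """Keep only filenames whose extent overlaps the requested bbox."""
    return [e.filename for e in extents if intersects(e, bbox)]


# Made-up extents for two files returned by a time-based search.
extents = [
    SwathExtent("a.nc", -10.0, 5.0, -80.0, -60.0),
    SwathExtent("b.nc", 40.0, 60.0, 0.0, 30.0),
]
kept = filter_files(extents, (-7, -4, -69, -65))
print(kept)
```

A bounding extent is a coarse test, so it can keep files that contain no points inside the bbox; the point is only to cheaply discard files that cannot contribute before they are opened.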

Using the swath reader should go something like:

from datetime import datetime

from ascat.read_native.swath_collection import SwathFile, SwathGridFiles
from fibgrid.realization import FibGrid

swath_path = "tests/ascat_test_data/hsaf/h129/swaths"
grid = FibGrid(6.25)
sf = SwathGridFiles(
    swath_path,
    cls=SwathFile,
    fn_templ="W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOP{sat}-6.25-H129_C_LIIB_{date}_{placeholder}_{placeholder1}____.nc",
    sf_templ={"year_folder": "{year}"},
    grid=grid,
    fn_read_fmt=lambda timestamp: {
        "date": timestamp.strftime("%Y%m%d*"),
        "sat": "[ABC]",
        "placeholder": "*",
        "placeholder1": "*",
    },
    sf_read_fmt=lambda timestamp: {
        "year_folder": {
            "year": f"{timestamp.year}",
        },
    },
)
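# list the swath files within the time period
files = sf.search_period(
    datetime(2021, 1, 15),
    datetime(2021, 1, 30),
    date_field_fmt="%Y%m%d%H%M%S",
)

# bounding box to extract
bbox = (-90, -4, -70, 20)

# read and merge the data for the same period, restricted to the bbox
merged_ds = sf.extract(
    datetime(2021, 1, 15),
    datetime(2021, 1, 30),
    bbox=bbox,
    date_field_fmt="%Y%m%d%H%M%S",
)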
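
The `fn_templ` and `fn_read_fmt` arguments in the example above work together: the template names the fields, and the function fills them with glob wildcards for a given timestamp. The sketch below shows how that combination could resolve to a search pattern. It assumes plain `str.format` plus glob-style matching, which may differ from what the file handler does internally, and the filename being matched is made up for illustration.

```python
from datetime import datetime
from fnmatch import fnmatchcase

fn_templ = (
    "W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOP{sat}-6.25-H129_C_LIIB_"
    "{date}_{placeholder}_{placeholder1}____.nc"
)


def fn_read_fmt(timestamp):
    """Fill the template fields with glob wildcards for one day."""
    return {
        "date": timestamp.strftime("%Y%m%d*"),
        "sat": "[ABC]",
        "placeholder": "*",
        "placeholder1": "*",
    }


# Build the glob pattern for 15 January 2021.
pattern = fn_templ.format(**fn_read_fmt(datetime(2021, 1, 15)))
print(pattern)

# Hypothetical filename, invented to show the matching.
example = (
    "W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOPB-6.25-H129_C_LIIB_"
    "20210115120000_000_000____.nc"
)
print(fnmatchcase(example, pattern))
```

The trailing `*` in the `date` format lets a day-level pattern match filenames whose date field also carries hours, minutes and seconds.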

Clay Harrison added 6 commits May 11, 2024 15:05

Add MultiFileHandler cell file readers

15ce19c

Fix sneaky syntax error

d6f0859

Add util for getting various gpi-sets from grid

abe356b

Rename cell read method to extract

73381e7

Reformat test data generator and add swaths

7d8df97

Add ChronFiles-based swath reader + tests

c16a68d

sebhahn force-pushed the xarr_refactor branch from 8b4ad31 to c16a68d · May 11, 2024 13:06

sebhahn and others added 13 commits May 11, 2024 15:10

add pyresample to dependencies

26618ae

add numba

5e4f1ed

Remove numba

87dfb08

add extra options to get_grid_gpis

646464f

add grid cache and ASCAT products to product_info

1deaa1f

Add spatial functionality to swath_collection

185b450

adapt aggregators to use new swath_collection

96c0ba8

ignore profiling

e1e5fac

Start adding spatial funcs to cell_collection

26fddba

Start on swath/cell xarray accessors

aedfdf2

Fix orthomulti tests

342eca9

Fix swath tests

342c1a5

Resolve conflict with master

5d4c583
