Regression with Zarr: ReadOnlyError #135

Open
rabernat opened this issue Feb 20, 2023 · 14 comments · Fixed by #136

@rabernat
Member

Tests with the latest dev environment are failing with errors like this:


tmp_path = PosixPath('/private/var/folders/kl/7rfdrpx96bb0rhbnl5l2dnkw0000gn/T/pytest-of-rabernat/pytest-69/test_rechunk_group_mapper_temp7')
executor = 'python', source_store = 'mapper.source.zarr', target_store = <fsspec.mapping.FSMap object at 0x1174e3520>
temp_store = <fsspec.mapping.FSMap object at 0x1174e3400>

    @pytest.mark.parametrize(
        "executor",
        [
            "dask",
            "python",
            requires_beam("beam"),
            requires_prefect("prefect"),
        ],
    )
    @pytest.mark.parametrize("source_store", ["source.zarr", "mapper.source.zarr"])
    @pytest.mark.parametrize("target_store", ["target.zarr", "mapper.target.zarr"])
    @pytest.mark.parametrize("temp_store", ["temp.zarr", "mapper.temp.zarr"])
    def test_rechunk_group(tmp_path, executor, source_store, target_store, temp_store):
        if source_store.startswith("mapper"):
            fsspec = pytest.importorskip("fsspec")
            store_source = fsspec.get_mapper(str(tmp_path) + source_store)
            target_store = fsspec.get_mapper(str(tmp_path) + target_store)
            temp_store = fsspec.get_mapper(str(tmp_path) + temp_store)
        else:
            store_source = str(tmp_path / source_store)
            target_store = str(tmp_path / target_store)
            temp_store = str(tmp_path / temp_store)
    
>       group = zarr.group(store_source)

tests/test_rechunk.py:457: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../mambaforge/envs/rechunker/lib/python3.9/site-packages/zarr/hierarchy.py:1355: in group
    init_group(store, overwrite=overwrite, chunk_store=chunk_store,
../../../mambaforge/envs/rechunker/lib/python3.9/site-packages/zarr/storage.py:648: in init_group
    _init_group_metadata(store=store, overwrite=overwrite, path=path,
../../../mambaforge/envs/rechunker/lib/python3.9/site-packages/zarr/storage.py:711: in _init_group_metadata
    store[key] = store._metadata_class.encode_group_metadata(meta)  # type: ignore
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <zarr.storage.FSStore object at 0x1174e34c0>, key = '.zgroup', value = b'{\n    "zarr_format": 2\n}'

    def __setitem__(self, key, value):
        if self.mode == 'r':
>           raise ReadOnlyError()
E           zarr.errors.ReadOnlyError: object is read-only

../../../mambaforge/envs/rechunker/lib/python3.9/site-packages/zarr/storage.py:1410: ReadOnlyError

This is the cause of the test failures in #134.
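
For reference, a minimal sketch of the failing call pattern (the path is illustrative); on an affected zarr release this raises the same ReadOnlyError, because the fsspec mapper ends up wrapped in an FSStore whose mode is 'r':

import fsspec
import zarr

# Wrap a local path in an fsspec mapper, as the parametrized test does
store = fsspec.get_mapper("/tmp/source.zarr")

# On the affected zarr releases this raises zarr.errors.ReadOnlyError,
# because the mapper is normalized to a read-only FSStore internally
group = zarr.group(store)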

@rsignell-usgs
Member

Shoot, I'm still getting the read_only errors with 0.5.1:
https://nbviewer.org/gist/85a34aed6e432d0d8502841076bbab92

@rabernat
Member Author

rabernat commented Mar 14, 2023

I think you may be hitting a version of zarr-developers/zarr-python#1353 because you are calling

m = fs.get_mapper("")

Try updating to the latest zarr version, or else creating an FSStore instead.
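
A sketch of that workaround, assuming a zarr version that provides zarr.storage.FSStore (the path is illustrative; in the notebook above it would be whatever fs.get_mapper("") points at):

import zarr
from zarr.storage import FSStore

# Create the store explicitly in write mode rather than relying on
# zarr to convert an fsspec mapper, which can come out read-only
store = FSStore("target.zarr", mode="w")
group = zarr.group(store)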

@rsignell-usgs
Member

Okay, will do!

@rabernat
Member Author

It would be helpful to confirm which Zarr version you had installed.

@rsignell-usgs
Member

rsignell-usgs commented Mar 14, 2023

Hmm, zarr=2.13.6, the latest from conda-forge. I see that zarr=2.14.2 has been released though. I'll try pip installing that.

@rsignell-usgs
Member

rsignell-usgs commented Mar 15, 2023

Okay, with the latest zarr=2.14.2, I don't get the read_only errors.

But the workflow fails near the end of the rechunking process:


KilledWorker: Attempted to run task ('copy_intermediate_to_write-bca90f45d4dc080cca14b54ce5a10d1f', 2) on 3 different workers, but all those workers died while running it. The last worker that attempt to run the task was tls://10.10.105.181:35291. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.

The logs from those workers are not available on the dashboard, I guess because the workers died, right?

This rechunker workflow was working in December. Should I revert to the zarr and rechunker versions from that era?

@rabernat
Member Author

rabernat commented Mar 15, 2023

Ideally you would figure out what is going wrong and help us fix it, rather than rolling back to an earlier version. After all, you're a rechunker maintainer now! 😉

Are you sure that all your package versions match on your workers?
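
One way to check that (a sketch using dask.distributed's version check; the scheduler address is illustrative):

from distributed import Client

client = Client("tls://scheduler-address:8786")  # illustrative address

# With check=True this raises if the client, scheduler, and workers
# disagree on key package versions (python, dask, distributed, ...)
versions = client.get_versions(check=True)
print(list(versions["workers"]))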

@rsignell-usgs
Member

rsignell-usgs commented Mar 15, 2023

I'm certainly willing to try to help debug it, but don't really know where to start. If you have ideas, I'm game to try them.

One of the nice things about nebari/conda-store is that the notebook and the workers see the same environment (accessed from the conda-store pod), so the versions always match.

I added you to the ESIP Nebari deployment if you are interested in checking it out.

https://nebari.esipfed.org/hub/user-redirect/lab/tree/shared/users/Welcome.ipynb

https://nebari.esipfed.org/hub/user-redirect/lab/tree/shared/users/rsignell/notebooks/NWM/rechunk_grid/03_rechunk.ipynb

@rabernat
Member Author

I won't be able to log into the ESIP cluster to debug your failing computation. If you think there has been a regression in rechunker in the new release, I strongly encourage you to develop a minimal reproducible example and share it via the issue tracker.

If you have ideas, I'm game to try them.

My first idea would be to freeze every package version except rechunker in your environment, and then run the exact same workflow with only the rechunker version changing (say 0.5.0 vs. 0.5.1). Your example has a million moving pieces: Dask, Zarr, kerchunk, xarray, and so on. It's impossible to say whether your problem is caused by a change in rechunker unless you can isolate it. There have been very few changes to rechunker over the past year, and nothing that obviously would cause your dask workers to start running out of memory.
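
One way to make that comparison concrete (a small helper, purely illustrative) is to snapshot the installed versions in each environment and diff the two files; only the rechunker line should differ:

from importlib import metadata

def snapshot_environment(path):
    # Write "name==version" for every installed distribution, one per line
    lines = sorted(
        f"{dist.metadata['Name']}=={dist.version}" for dist in metadata.distributions()
    )
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Run once in the 0.5.0 environment and once in the 0.5.1 environment
snapshot_environment("env-rechunker-0.5.1.txt")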

@rsignell-usgs
Member

I've confirmed that my rechunking workflow runs successfully if I pin zarr=2.13.3:

cf_xarray                 0.8.0              pyhd8ed1ab_0    conda-forge
dask                      2023.3.1           pyhd8ed1ab_0    conda-forge
dask-core                 2023.3.1           pyhd8ed1ab_0    conda-forge
dask-gateway              2022.4.0           pyh8af1aa0_0    conda-forge
dask-geopandas            0.3.0              pyhd8ed1ab_0    conda-forge
dask-image                2022.9.0           pyhd8ed1ab_0    conda-forge
fsspec                    2023.3.0+5.gbac7529          pypi_0    pypi
intake-xarray             0.6.1              pyhd8ed1ab_0    conda-forge
jupyter_server_xarray_leaflet 0.2.3              pyhd8ed1ab_0    conda-forge
numcodecs                 0.11.0          py310heca2aa9_1    conda-forge
pint-xarray               0.3                pyhd8ed1ab_0    conda-forge
rechunker                 0.5.1                    pypi_0    pypi
rioxarray                 0.13.4             pyhd8ed1ab_0    conda-forge
s3fs                      2022.11.0       py310h06a4308_0  
xarray                    2023.2.0           pyhd8ed1ab_0    conda-forge
xarray-datatree           0.0.12             pyhd8ed1ab_0    conda-forge
xarray-spatial            0.3.5              pyhd8ed1ab_0    conda-forge
xarray_leaflet            0.2.3              pyhd8ed1ab_0    conda-forge
zarr                      2.13.3             pyhd8ed1ab_0    conda-forge
• If I change to zarr=2.13.6, I get the ReadOnlyError: object is read-only error.
• If I change to zarr=2.14.2, the dask workers die.

@rsignell-usgs
Member

rsignell-usgs commented Mar 15, 2023

@gzt5142 has a minimal reproducible example he will post shortly. But should this be raised as a zarr issue?

@rabernat
Member Author

Thanks a lot for looking into this Rich!

But should this be raised as a zarr issue?

How minimal is it? Can you decouple it from the dask and rechunker issues? Can you say more about what you think the root problem is?

@rsignell-usgs
Member

rsignell-usgs commented Mar 22, 2023

Unfortunately, it turns out the minimal example we created works fine -- it does not trigger the problem described here. :(

@rabernat
Member Author

I'm going to reopen this issue.

If there is a bug somewhere in our stack that is preventing rechunker from working properly, we really need to get to the bottom of it.

@rabernat reopened this Mar 31, 2023