Reading local Zarr files into stars #663

Open
oshuwilson opened this issue Jan 30, 2024 · 13 comments

@oshuwilson

oshuwilson commented Jan 30, 2024

Hi,

After looking at the vignette for reading Zarr files in stars, I am unsure how to read local Zarr directories into R. I have been trying to work with satellite imagery for the Southern Ocean downloaded from Copernicus' Marine Data Client.

Here is my attempt:

`library(stars)

dsn <- 'ZARR:"sic_daily_samples.zarr/"'

read_mdim(dsn)`

This gives the error message:

Error in CPL_read_mdim(file, array_name, options, offset, count, step, :
  CHAR() can only be applied to a 'CHARSXP', not a 'NULL'
In addition: Warning messages:
1: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
2: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
3: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
4: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled

I've uploaded a subset of the data for ease but I can't figure out how to read it as a zipped or unzipped file, so any help with this would be appreciated!

Thanks,
Josh

sic_daily_samples.zarr.zip
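
The warning names the blosc decompressor, so one quick check is which compressor the store itself declares. The following is a minimal sketch, assuming a Zarr v2 layout in which each array directory holds a `.zarray` JSON file with a `compressor` entry. The helper name `zarr_compressor_ids()` is hypothetical, and the regular expression is a shortcut rather than a real JSON parser.

```r
# Hypothetical helper: report the compressor id declared by each array in a
# Zarr v2 store by scanning its .zarray metadata files (base R only).
zarr_compressor_ids <- function(store) {
  meta <- list.files(store, pattern = "^\\.zarray$", all.files = TRUE,
                     recursive = TRUE, full.names = TRUE)
  ids <- vapply(meta, function(f) {
    txt <- paste(readLines(f, warn = FALSE), collapse = " ")
    # Look for "compressor": { ... "id": "<name>" ... };
    # yields NA when the compressor is null or absent.
    pat <- '"compressor"[[:space:]]*:[[:space:]]*\\{[^}]*"id"[[:space:]]*:[[:space:]]*"([^"]+)"'
    m <- regmatches(txt, regexec(pat, txt))[[1]]
    if (length(m) == 2) m[2] else NA_character_
  }, character(1))
  # Label each result with the name of the array directory it came from.
  names(ids) <- basename(dirname(meta))
  ids
}

# Usage on the unzipped store from this issue (path assumed to be relative
# to the working directory):
# zarr_compressor_ids("sic_daily_samples.zarr")
```

If the arrays report `blosc`, the data can only be decoded by a GDAL build that includes that decompressor, which would be consistent with the warnings above.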

@edzer
Member

edzer commented Jan 30, 2024

I get

> read_mdim("sic_daily_sample.zarr/")
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
           Min. 1st Qu. Median Mean 3rd Qu. Max.  NA's
siconc [1]   NA      NA     NA  NaN      NA   NA 1e+05
dimension(s):
          from   to  refsys point
longitude    1 4320  WGS 84    NA
latitude     1  961  WGS 84    NA
time         1    1 POSIXct  TRUE
                                                      values x/y
longitude       [-180.0417,-179.9583),...,[179.875,179.9583) [x]
latitude  [-80.04167,-79.95833),...,[-0.04166667,0.04166667) [y]
time                                          2021-01-09 UTC    

What is your sessionInfo() and sf_extSoftVersion() output, after loading stars?
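
For anyone unsure how to collect these two outputs, a minimal sketch follows. The `requireNamespace()` guard and the `diagnostics` list are only there so the snippet runs anywhere; they are not part of stars or sf.

```r
# Collect the version information requested above into one list.
diagnostics <- list()
if (requireNamespace("stars", quietly = TRUE)) {
  library(stars)  # also attaches sf and abind
  # Versions of the external libraries (GEOS, GDAL, PROJ) that sf links to
  diagnostics$external <- sf::sf_extSoftVersion()
}
# R version, platform, and attached/loaded package versions
diagnostics$session <- sessionInfo()
print(diagnostics)
```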

@oshuwilson
Author

Thanks Edzer, I tried the same code and got the same error message.

My sessionInfo() gives

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stars_0.6-4 sf_1.0-14   abind_1.4-5

loaded via a namespace (and not attached):
 [1] utf8_1.2.4         R6_2.5.1           tidyselect_1.2.0   e1071_1.7-13       magrittr_2.0.3    
 [6] glue_1.6.2         tibble_3.2.1       KernSmooth_2.23-22 parallel_4.3.2     pkgconfig_2.0.3   
[11] generics_0.1.3     dplyr_1.1.3        lifecycle_1.0.4    classInt_0.4-10    cli_3.6.1         
[16] fansi_1.0.5        vctrs_0.6.4        grid_4.3.2         DBI_1.2.1          proxy_0.4-27      
[21] class_7.3-22       compiler_4.3.2     rstudioapi_0.15.0  tools_4.3.2        pillar_1.9.0      
[26] Rcpp_1.0.11        rlang_1.1.2        units_0.8-4       

And my sf_extSoftVersion() prints

   GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.11.2"        "3.7.2"        "9.3.0"         "true"         "true"        "9.3.0" 

@edzer
Member

edzer commented Jan 30, 2024

Please update sf to 1.0-15, and try again.

@oshuwilson
Author

That still printed the same error message as previously. I haven't yet downloaded the latest version of RStudio but I don't imagine that would cause this error?

@edzer
Member

edzer commented Jan 30, 2024

See also #566 (comment)

@oshuwilson
Author

Apologies, I'm not yet proficient with R. How do I install that patch? I tried using remotes::install_github("rspatial/sf") but I'm still seeing the same error code.

@edzer
Member

edzer commented Jan 30, 2024

No need for you to install that patch.

@oshuwilson
Author

Sorry, I'm a bit lost as to which steps from the other issue I can take to fix mine.

@edzer
Member

edzer commented Jan 30, 2024

I'm just cross linking them; I can reproduce the error on GitHub actions here: https://github.com/r-spatial/stars/actions/runs/7712573313/job/21020420577#step:6:297

@pepijn-devries

@oshuwilson,

It seems that this issue is specific to the Windows binary release. Note that you can use CopernicusMarine for subsetting Copernicus Marine data as well. However, it does not yet support ZARR data because of the issue reported here and #566 (comment)

@oshuwilson
Author

Thanks @pepijn-devries - I'll look at doing that to download as a netCDF if the Zarr format remains unusable for my setup. My main issue is that the full data I need is massive (~1.3TB as a netCDF but only ~250GB as Zarr), so Zarr would be preferable if it can work! But if not, I'll get a new hard drive and put my computer to the test.

@edzer
Member

edzer commented Jan 30, 2024

> It seems that this issue is specific to the Windows binary release.

Windows and macOS binary releases; we added blosc, at least to the Windows binary builds, but this suggests it's not working.

@pepijn-devries

pepijn-devries commented Mar 11, 2024

Hi @edzer,

Is there any news on the Windows build and blosc decompression of ZARR files? Thanks for your work on the package!

By the way, I did some additional testing. The issue does not only occur on Windows, but also on a Linux Fedora (virtual) machine I have set up:

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.12.1, GDAL 3.7.3, PROJ 9.2.1; sf_use_s2() is TRUE
dsn <- 'ZARR:"/vsicurl/https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr"'
bounds <- c(longitude = "lon_bounds", latitude = "lat_bounds")
r <- read_mdim(dsn, bounds = bounds)
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Error in CPL_read_mdim(file, array_name, options, offset, count, step, : CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

Created on 2024-03-11 with reprex v2.1.0

With sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 39 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=nl_NL.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8        LC_COLLATE=nl_NL.UTF-8    
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=nl_NL.UTF-8    LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Amsterdam
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] gtable_0.3.4       dplyr_1.1.4        compiler_4.3.2     tidyselect_1.2.0   reprex_2.1.0       Rcpp_1.0.12       
 [7] clipr_0.8.0        callr_3.7.5        scales_1.3.0       yaml_2.3.8         fastmap_1.1.1      ggplot2_3.5.0     
[13] R6_2.5.1           generics_0.1.3     classInt_0.4-10    sf_1.0-15          knitr_1.45         tibble_3.2.1      
[19] units_0.8-5        munsell_0.5.0      DBI_1.2.2          pillar_1.9.0       rlang_1.1.3        utf8_1.2.4        
[25] xfun_0.42          fs_1.6.3           cli_3.6.2          withr_3.0.0        magrittr_2.0.3     ps_1.7.6          
[31] class_7.3-22       processx_3.8.3     digest_0.6.34      grid_4.3.2         rstudioapi_0.15.0  lifecycle_1.0.4   
[37] vctrs_0.6.5        KernSmooth_2.23-22 proxy_0.4-27       evaluate_0.23      glue_1.7.0         fansi_1.0.6       
[43] e1071_1.7-14       colorspace_2.1-0   rmarkdown_2.26     tools_4.3.2        pkgconfig_2.0.3    htmltools_0.5.7   
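
The repeated warnings in the reprex above can also be captured programmatically, which makes it easier to compare platforms or builds. This is a minimal sketch in base R; `collect_conditions()` and `missing_blosc()` are hypothetical helper names, and the matched string is simply the warning text shown in this thread.

```r
# Evaluate an expression, collecting warning messages and returning any
# error as a condition object instead of aborting.
collect_conditions <- function(expr) {
  warnings <- character(0)
  result <- withCallingHandlers(
    tryCatch(expr, error = function(e) e),
    warning = function(w) {
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")  # continue without printing the warning
    }
  )
  list(result = result, warnings = warnings)
}

# TRUE if any collected warning is GDAL's missing-decompressor message.
missing_blosc <- function(warnings) {
  any(grepl("Decompressor blosc not handled", warnings, fixed = TRUE))
}

# Usage with the call from the reprex (requires stars and network access):
# out <- collect_conditions(stars::read_mdim(dsn, bounds = bounds))
# missing_blosc(out$warnings)
```

When `missing_blosc()` returns `TRUE` and `out$result` is an error, the failure matches the one reported in this issue rather than, say, a wrong path or connection string.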
