This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

CCI-Tools/zarr-cache

zarr-cache

An experimental cache for multiple Zarr datasets.

Idea

Suppose we read Zarr array chunks from some slow data API, with access implemented as a store of type collections.abc.MutableMapping. Suppose further that we have much faster storage available, for example a high-performance S3-compatible object store. It then makes sense to use the object store as a cache for the array chunks retrieved through the slow API store.
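
The read-through idea can be sketched with plain mappings. This is only an illustration under stated assumptions, not the library's implementation: the class name `ReadThroughStore` is hypothetical, and ordinary dicts stand in for the slow API store and the fast object store.

```python
from collections.abc import MutableMapping
from typing import Iterator


class ReadThroughStore(MutableMapping):
    """Hypothetical wrapper (not part of zarr_cache): serve keys from
    `cache` when possible and fall back to the slow `source` otherwise."""

    def __init__(self, source: MutableMapping, cache: MutableMapping):
        self._source = source
        self._cache = cache

    def __getitem__(self, key: str) -> bytes:
        try:
            return self._cache[key]      # fast path: chunk already cached
        except KeyError:
            value = self._source[key]    # slow path; KeyError if absent
            self._cache[key] = value     # remember the chunk for next time
            return value

    def __setitem__(self, key: str, value: bytes) -> None:
        # Write to both stores so the cache never holds stale data.
        self._source[key] = value
        self._cache[key] = value

    def __delitem__(self, key: str) -> None:
        del self._source[key]
        self._cache.pop(key, None)

    def __iter__(self) -> Iterator[str]:
        # The source store is the authority on which keys exist.
        return iter(self._source)

    def __len__(self) -> int:
        return len(self._source)


# Plain dicts stand in for the slow API store and the fast object store.
slow = {"temp/0.0": b"\x00\x01"}
fast = {}
store = ReadThroughStore(slow, fast)
chunk = store["temp/0.0"]  # first read goes to `slow` and fills `fast`
```

Note that this sketch never evicts anything, so the cache grows without bound; limiting its size is what makes key management the hard part.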

However, caching requires efficient management of a possibly very large number of keys in the cache. Enumerating, sorting, pushing, and popping keys should all be very fast operations. We would also like to associate cache-specific values with each key, e.g. access frequency, access duration, and chunk size. Object storage may not be well suited to providing these capabilities.

The design used in this library therefore splits the cache storage into two parts:

  1. an index of cached keys, e.g. a Redis database;
  2. a storage for the cached array chunks, e.g. S3.
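
A minimal sketch of this split, assuming a least-recently-used eviction policy and a byte-size limit. Both are assumptions made for illustration; the source does not say which policy the library uses. `LruKeyIndex` and `cached_get` are hypothetical names, an in-memory `OrderedDict` stands in for the Redis index, and a plain dict stands in for the S3 chunk storage.

```python
from collections import OrderedDict
from collections.abc import MutableMapping
from typing import List, Tuple

Entry = Tuple[str, str]  # (store_id, key)


class LruKeyIndex:
    """Hypothetical index: remembers the size of each cached chunk and
    keeps entries ordered from least to most recently used."""

    def __init__(self, max_size: int):
        self.max_size = max_size      # capacity in bytes
        self.current_size = 0         # bytes currently accounted for
        self._entries: "OrderedDict[Entry, int]" = OrderedDict()

    def touch(self, store_id: str, key: str) -> bool:
        """Mark an entry as recently used; return False if it is unknown."""
        entry = (store_id, key)
        if entry not in self._entries:
            return False
        self._entries.move_to_end(entry)
        return True

    def push(self, store_id: str, key: str, size: int) -> List[Entry]:
        """Register an entry and return the entries evicted to make room."""
        entry = (store_id, key)
        self.current_size -= self._entries.pop(entry, 0)
        self._entries[entry] = size
        self.current_size += size
        evicted = []
        # Evict oldest entries first, but never the one just added.
        while self.current_size > self.max_size and len(self._entries) > 1:
            old_entry, old_size = self._entries.popitem(last=False)
            self.current_size -= old_size
            evicted.append(old_entry)
        return evicted


def cached_get(index: LruKeyIndex, chunks: MutableMapping,
               source: MutableMapping, store_id: str, key: str) -> bytes:
    """Read a chunk via the index; only the index decides what is cached."""
    entry = (store_id, key)
    if index.touch(store_id, key):
        return chunks[entry]
    value = source[key]
    chunks[entry] = value
    for evicted in index.push(store_id, key, len(value)):
        del chunks[evicted]           # drop evicted chunks from storage
    return value
```

The chunk storage is only ever asked to get, put, or delete a single key. All enumeration and ordering happens in the index, which is the part that benefits from a database such as Redis.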

Usage

Programming model:

    from collections.abc import MutableMapping

    import xarray

    from zarr_cache import CachedStore
    from zarr_cache import S3StoreOpener
    from zarr_cache import MemoryStoreIndex
    from zarr_cache import IndexedCacheStorage

    def open_my_slow_store(store_id: str, *args, **kwargs) -> MutableMapping:
        # Placeholder: open and return the slow, API-backed store here.
        return ...

    def wrap_store(original_store: MutableMapping, store_id: str) -> MutableMapping:
        # The index keeps track of which keys are cached.
        # Coming soon:
        # store_index = RedisStoreIndex(...)
        store_index = MemoryStoreIndex()
        # The opener provides the fast storage that holds the cached chunks.
        store_opener = S3StoreOpener('s3://my_bucket/{store_id}.zarr', ...)
        cache_storage = IndexedCacheStorage(store_index, store_opener)
        return CachedStore(original_store, store_id, cache_storage)

    my_store_id = "..."
    my_slow_store = open_my_slow_store(my_store_id, ...)
    my_faster_store = wrap_store(my_slow_store, my_store_id)

    # The wrapped store is used like any other Zarr store.
    dataset = xarray.open_zarr(my_faster_store)

Installation

Get code

$ git clone https://github.com/CCI-Tools/zarr-cache.git
$ cd zarr-cache

Install Python environment

If you already have an environment that is supposed to use zarr-cache:

$ conda env update

If you don't have an environment yet:

$ conda env create
$ conda activate zarr-cache

Install package

$ python setup.py install 
