Skip to content

xtensor-stack/xtensor-zarr

Repository files navigation

xtensor-zarr

Azure Pipelines ReadTheDocs Binder Join the Gitter Chat

Implementation of the Zarr core protocol (version 2 and 3) based on xtensor

Introduction

xtensor-zarr offers an API to create and access a Zarr (v2 or v3) hierarchy in a store (locally or in the cloud), read and write arrays (in various formats) and groups in the hierarchy, and explore the hierarchy.

Installation

xtensor-zarr comes with a libxtensor-zarr-gdal shared library for accessing stores through GDAL's Virtual File System or the local file system. For all other stores, xtensor-zarr is a header-only library.

We provide a package for the mamba (or conda) package manager:

mamba install xtensor-zarr -c conda-forge
  • xtensor-zarr depends on xtensor ^0.23.8, xtensor-io ^0.12.7, zarray ^0.0.7, nlohmann_json ^3.9.1, Blosc ^1.21.0, zlib ^1.2.11, gdal ^3.0.0 and cpp-filesystem ^1.3.0.

  • google-cloud-cpp and aws-sdk-cpp are optional dependencies to xtensor-zarr.

    • google-cloud-cpp is required to access a store in Google Cloud Storage.
    • aws-sdk-cpp is required to access a store in AWS S3.

You can also install xtensor-zarr from source:

mkdir build
cd build
cmake ..
make
sudo make install

Trying it online

To try out xtensor-zarr interactively in your web browser, just click on the binder link:

Binder

Documentation

To get started with using xtensor-zarr, check out the full documentation:

http://xtensor-zarr.readthedocs.io/

Usage

// create a hierarchy on the local file system
xt::xzarr_file_system_store store("test.zr3");
auto h = xt::create_zarr_hierarchy(store);

// create an array in the hierarchy
nlohmann::json attrs = {{"question", "life"}, {"answer", 42}};
using S = std::vector<std::size_t>;
S shape = {4, 4};
S chunk_shape = {2, 2};
xt::zarray a = h.create_array(
    "/arthur/dent",  // path to the array in the store
    shape,  // array shape
    chunk_shape,  // chunk shape
    "<f8",  // data type, here little-endian 64-bit floating point
    'C',  // memory layout
    '/',  // chunk identifier separator
    xt::xio_binary_config(),  // compressor (here, raw binary)
    attrs,  // attributes
    10,  // chunk pool size
    0.  // fill value
);

// write array data
a(2, 1) = 3.;

// read array data
std::cout << a(2, 1) << std::endl;
// prints `3.`
std::cout << a(2, 2) << std::endl;
// prints `0.` (fill value)
std::cout << a.attrs() << std::endl;
// prints `{"answer":42,"question":"life"}`

// create a group
auto g = h.create_group("/tricia/mcmillan", attrs);

// explore the hierarchy
// get children at a point in the hierarchy
std::string children = h.get_children("/").dump();
std::cout << children << std::endl;
// prints `{"arthur":"implicit_group","marvin":"explicit_group","tricia":"implicit_group"}`
// view the whole hierarchy
std::string nodes = h.get_nodes().dump();
std::cout << nodes << std::endl;
// prints `{"arthur":"implicit_group","arthur/dent":"array","tricia":"implicit_group","tricia/mcmillan":"explicit_group"}`

// use cloud storage
// create an anonymous Google Cloud Storage client
gcs::Client client((gcs::ClientOptions(gcs::oauth2::CreateAnonymousCredentials())));
xzarr_gcs_store s1("zarr-demo/v3/test.zr3", client);
// list keys under prefix
auto keys1 = s1.list_prefix("data/root/arthur/dent/");
for (const auto& key: keys1)
{
    std::cout << key << std::endl;
}
// prints:
// data/root/arthur/dent/c0/0
// data/root/arthur/dent/c0/1
// data/root/arthur/dent/c1/0
// data/root/arthur/dent/c1/1
// data/root/arthur/dent/c2/0
// data/root/arthur/dent/c2/1

xzarr_gcs_store s2("zarr-demo/v3/test.zr3/meta/root/marvin", client);
// list all keys
auto keys2 = s2.list();
for (const auto& key: keys2)
{
    std::cout << key << std::endl;
}
// prints:
// android.array.json
// paranoid.group.json

License

We use a shared copyright model that enables all contributors to maintain the copyright on their contributions.

This software is licensed under the BSD-3-Clause license. See the LICENSE file for details.