HDF5Zarr

HDF5Zarr reads HDF5 files with Zarr, building upon Cloud-Performant NetCDF4/HDF5 Reading with the Zarr Library. This allows HDF5 files stored remotely to be read efficiently and integrates them with Zarr-based computation tools.


Installation:

Requires the latest development installation of h5py, built against HDF5 >= 1.10.5.

Install HDF5

Check the available HDF5 version:

$ h5cc -showconfig

Conda:

$ conda install "hdf5>=1.10.5"

Source installation:

Download and install HDF5, then, for example:

$ cd hdf5*/bin
$ ./h5redeploy

Install h5py

Follow h5py instructions for custom installation
For example:

Conda:

$ HDF5_DIR=$CONDA_PREFIX pip install --no-binary=h5py h5py

Source installation:

$ HDF5_DIR=/path/to/hdf5 pip install --no-binary=h5py h5py
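After installation, you can confirm that h5py is linked against a sufficiently recent HDF5. A quick sanity check, not part of the original instructions:

import h5py

# Print the HDF5 version h5py was built against; it should be >= 1.10.5
print(h5py.version.hdf5_version)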

Install HDF5Zarr

$ pip install git+https://github.com/catalystneuro/HDF5Zarr.git

Usage:

Reading local data

HDF5Zarr can be used to read a local HDF5 file, with the datasets read through the Zarr library. Download the example dataset from https://girder.dandiarchive.org/api/v1/item/5eda859399f25d97bd27985d/download:

import requests
import os.path as op

import zarr
from hdf5zarr import HDF5Zarr

file_name = 'sub-699733573_ses-715093703.nwb'

# Download the example NWB file if it is not already present
if not op.exists(file_name):
    response = requests.get("https://girder.dandiarchive.org/api/v1/item/5eda859399f25d97bd27985d/download")
    with open(file_name, mode='wb') as localfile:
        localfile.write(response.content)

# Translate the HDF5 metadata into a Zarr hierarchy
hdf5_zarr = HDF5Zarr(filename=file_name, store_mode='w', max_chunksize=2*2**20)
zgroup = hdf5_zarr.consolidate_metadata(metadata_key='.zmetadata')

If no Zarr store is specified, HDF5Zarr uses the default zarr.MemoryStore. Alternatively, pass a Zarr store such as:

store = zarr.DirectoryStore('storezarr')
hdf5_zarr = HDF5Zarr(file_name, store=store, store_mode='w')
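A persisted store like this can later be reopened without re-parsing the HDF5 metadata. A minimal sketch, assuming the 'storezarr' directory written above already exists:

# Reopen the previously written DirectoryStore in read-only mode
store = zarr.DirectoryStore('storezarr')
hdf5_zarr = HDF5Zarr(file_name, store=store, store_mode='r')
zgroup = hdf5_zarr.zgroup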

Examine the structure of the file using Zarr tools:

# print the hierarchy of groups and datasets
zgroup.tree()
# read a slice of a dataset through Zarr
arr = zgroup['units/spike_times']
val = arr[0:1000]
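Other standard Zarr introspection works on the same objects. A small illustrative snippet using the dataset opened above:

# Array shape, dtype and chunking as exposed by Zarr
print(arr.shape, arr.dtype, arr.chunks)
# HDF5 attributes are mapped to Zarr attributes
print(dict(zgroup.attrs))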

Once you have a zgroup object, it can be read by PyNWB using

from hdf5zarr import NWBZARRHDF5IO
io = NWBZARRHDF5IO(mode='r+', file=zgroup)
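From here, reading proceeds as with any PyNWB IO object (the same call used in the remote workflow at the end of this README):

nwb = io.read()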

Export metadata from the Zarr store to a single JSON file:

import json

metadata_file = 'metadata'
# Dump the in-memory metadata store to a JSON file for later reuse
with open(metadata_file, 'w') as mfile:
    json.dump(zgroup.store.meta_store, mfile)

Open an NWB file on a remote S3 store.

This requires a local metadata_file, constructed in the previous steps.

import json

import s3fs
from hdf5zarr import HDF5Zarr, NWBZARRHDF5IO

# import the consolidated metadata from the json file written above
metadata_file = 'metadata'
with open(metadata_file, 'r') as mfile:
    store = json.load(mfile)

# open the remote HDF5 file anonymously over S3
fs = s3fs.S3FileSystem(anon=True)
f = fs.open('dandiarchive/girder-assetstore/4f/5a/4f5a24f7608041e495c85329dba318b7', 'rb')

# read the remote file through Zarr, using the local metadata
hdf5_zarr = HDF5Zarr(f, store=store, store_mode='r')
zgroup = hdf5_zarr.zgroup
io = NWBZARRHDF5IO(mode='r', file=zgroup, load_namespaces=True)

Here is the entire workflow for opening a file remotely:

import zarr
import s3fs
from hdf5zarr import HDF5Zarr, NWBZARRHDF5IO

# build the Zarr metadata from a local copy of the file
file_name = 'sub-699733573_ses-715093703.nwb'
store = zarr.DirectoryStore('storezarr')
hdf5_zarr = HDF5Zarr(filename=file_name, store=store, store_mode='w', max_chunksize=2*2**20)
zgroup = hdf5_zarr.consolidate_metadata(metadata_key='.zmetadata')

# reuse that metadata to read the same file from S3
fs = s3fs.S3FileSystem(anon=True)
f = fs.open('dandiarchive/girder-assetstore/4f/5a/4f5a24f7608041e495c85329dba318b7', 'rb')
hdf5_zarr = HDF5Zarr(f, store=store, store_mode='r')
zgroup = hdf5_zarr.zgroup
io = NWBZARRHDF5IO(mode='r', file=zgroup, load_namespaces=True)
nwb = io.read()
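Once read, the NWBFile can be used as usual. For example, since this dataset exposes units/spike_times (see above), the units table can be pulled into a DataFrame; a brief illustration, not part of the original walkthrough:

# convert the units table to a pandas DataFrame
units = nwb.units.to_dataframe()
print(units.head())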
