seung-lab/chunkflow

Problem

  • Petabyte-scale 3D image processing is slow and computationally demanding;
  • Computation has to be distributed with linear scalability;
  • Local cluster and public cloud computing are not fully used at the same time;
  • Duplicated code across a variety of routine tasks is hard to maintain.

Features

  • Composable operators. Chunk operators can be composed on the command line for flexible usage.
  • Hybrid cloud. Computation is distributed across both local and cloud computers. The task-scheduling frontend and the computationally heavy backend are decoupled using AWS Simple Queue Service. The backend can be any computer with an internet connection and cloud authentication. Thanks to this robust design, cheap but unstable instances (preemptible instances in Google Cloud, spot instances in AWS) can be used to reduce cost by about threefold!
  • Petabyte scale. We have used chunkflow to produce over eighteen petabytes of output images, scaling up to 3600 nodes with NVIDIA GPUs across three regions in Google Cloud, and chunkflow remained reliable.
  • Operators work with 3D image volumes.
  • You can plug in your own code as an operator.
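
The queue-based decoupling described in the list above can be illustrated with a small, self-contained sketch. It is not chunkflow's implementation: it uses Python's in-memory `queue.Queue` as a stand-in for AWS SQS, and the names `run_worker` and `make_flaky_process` are hypothetical. The idea it shows is that a task is only removed for good once it has been processed, so a worker that is preempted midway does not lose work.

```python
import queue


def run_worker(tasks: queue.Queue, process, results: dict) -> None:
    """Process tasks until the queue is empty or this worker is preempted."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return  # nothing left to do
        try:
            results[task] = process(task)
        except InterruptedError:
            # Simulated preemption: the task was never completed, so it goes
            # back to the queue. (With SQS, an undeleted message reappears on
            # its own after the visibility timeout; the explicit put() here
            # is a simplification.)
            tasks.put(task)
            return  # this worker is gone


def make_flaky_process(fail_once_on):
    """Return a task function that is 'preempted' the first time it sees
    each task in fail_once_on (hypothetical helper for the demo)."""
    already_failed = set()

    def process(task):
        if task in fail_once_on and task not in already_failed:
            already_failed.add(task)
            raise InterruptedError
        return task * 2  # placeholder for the heavy computation

    return process


# Frontend: schedule six tasks.
tasks = queue.Queue()
for task_id in range(6):
    tasks.put(task_id)

# Backend: keep starting workers until every task has been completed.
results = {}
process = make_flaky_process({2, 4})
workers_started = 0
while not tasks.empty():
    run_worker(tasks, process, results)
    workers_started += 1

print(sorted(results), workers_started)
```

Because the frontend only talks to the queue, the workers in this sketch could run anywhere and come and go at any time, which is what makes interruptible instances usable.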

Check out the Documentation for installation and usage. Try it out by following the tutorial.

Image Segmentation Example

Perform convolutional net inference to segment a 3D image volume with a single command!

#!/bin/bash

chunkflow \
    load-tif --file-name path/of/image.tif -o image \
    inference --convnet-model path/of/model.py --convnet-weight-path path/of/weight.pt \
        --input-patch-size 20 256 256 --output-patch-overlap 4 64 64 --num-output-channels 3 \
        -f pytorch --batch-size 12 --mask-output-chunk -i image -o affs \
    plugin -f agglomerate --threshold 0.7 --aff-threshold-low 0.001 --aff-threshold-high 0.9999 -i affs -o seg \
    neuroglancer -i image,affs,seg -p 33333 -v 30 6 6

You can then view your 3D image and segmentation directly in Neuroglancer!
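
To give a feel for what the patch-size and patch-overlap options in the command above mean, here is a minimal sketch of a generic overlapping sliding-window grid. It is not taken from chunkflow's source. It assumes that the output patch has the same size as the input patch (20 × 256 × 256), so an overlap of 4 × 64 × 64 gives a stride of 16 × 192 × 192. The function `patch_starts` and the chunk size `(84, 1024, 1024)` are hypothetical.

```python
def patch_starts(length: int, patch: int, overlap: int) -> list:
    """Start offsets of overlapping patches along one axis."""
    if length < patch:
        raise ValueError("chunk is smaller than one patch")
    stride = patch - overlap
    starts = list(range(0, length - patch + 1, stride))
    # If the regular grid stops short of the end, add one last patch
    # that sits flush with the end of the chunk.
    if starts[-1] + patch < length:
        starts.append(length - patch)
    return starts


patch_size = (20, 256, 256)      # z, y, x, as in --input-patch-size
patch_overlap = (4, 64, 64)      # z, y, x, as in --output-patch-overlap
chunk_size = (84, 1024, 1024)    # hypothetical chunk size, not from the example

starts_per_axis = [
    patch_starts(length, patch, overlap)
    for length, patch, overlap in zip(chunk_size, patch_size, patch_overlap)
]

num_patches = 1
for starts in starts_per_axis:
    num_patches *= len(starts)

print(starts_per_axis[0])  # start offsets along z
print(num_patches)         # total number of patches covering the chunk
```

Under these assumptions each axis is covered by five patches, so the hypothetical chunk needs 125 forward passes, or 11 batches at `--batch-size 12`.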

Composable Operators

After installation, simply type chunkflow to list all the operators with their help messages. We keep adding new operators and will keep this list updated. For detailed usage, please check out our Documentation.

| Operator Name | Function |
| --- | --- |
| adjust-bbox | Adjust the corner offset of a bounding box |
| channel-voting | Vote across channels of a semantic map |
| cleanup | Remove empty files to clean up storage |
| connected-components | Threshold the boundary map to get a segmentation |
| copy-var | Copy a variable to a new name |
| create-chunk | Create a fake chunk for easy testing |
| create-info | Create the info file of a Neuroglancer Precomputed volume |
| crop-margin | Crop the margin of a chunk |
| debug | Add a breakpoint to debug the task content |
| delete-chunk | Delete a chunk in the task to reduce RAM requirement |
| delete-task-in-queue | Delete the task in the AWS SQS queue |
| downsample-upload | Downsample the chunk hierarchically and upload to volume |
| download-mesh | Download meshes from a Neuroglancer Precomputed volume |
| evaluate-segmentation | Compare segmentation chunks |
| fetch-task-from-file | Fetch a task from a file |
| fetch-task-from-sqs | Fetch tasks from the AWS SQS queue one by one |
| generate-tasks | Generate tasks one by one |
| gaussian-filter | 2D Gaussian blurring operated in-place |
| inference | Convolutional net inference |
| log-summary | Summary of logs |
| mark-complete | Mark task completion as an empty file |
| mask | Black out the chunk based on another mask chunk |
| mask-out-objects | Mask out selected or small objects |
| multiply | Multiply chunks with another chunk |
| mesh | Build 3D meshes from a segmentation chunk |
| mesh-manifest | Collect mesh fragments for an object |
| neuroglancer | Visualize chunks using Neuroglancer |
| normalize-contrast-nkem | Normalize image contrast using histograms |
| normalize-intensity | Normalize image intensity to the range -1 to 1 |
| normalize-section-shang | Normalization algorithm created by Shang |
| plugin | Import local code as a customized operator |
| quantize | Quantize the affinity map |
| load-h5 | Read HDF5 files |
| load-npy | Read NPY files |
| load-json | Read JSON files |
| load-pngs | Read PNG files |
| load-precomputed | Cut out a chunk from a local/cloud storage volume |
| load-tif | Read TIFF files |
| load-skeleton | Load skeletons |
| load-synapses | Load synapses from a file |
| load-zarr | Read Zarr files |
| setup-env | Prepare storage info files and produce tasks |
| skip-task-by-file | If a result/flag file already exists, skip this task |
| skip-task-by-blocks-in-volume | If all the blocks already exist in the volume, skip this task |
| skip-all-zero | If a chunk is all zeros, skip this task |
| skip-none | If an item in the task is None, skip this task |
| threshold | Use a threshold to segment the probability map |
| view | Another chunk viewer in the browser using CloudVolume |
| save-h5 | Save chunk as an HDF5 file |
| save-points | Save a point cloud as an HDF5 file |
| save-pngs | Save chunk as a series of PNG files |
| save-precomputed | Save chunk to a local/cloud storage volume |
| save-tif | Save chunk as a TIFF file |
| save-synapses | Save synapses as an HDF5 file |
| save-swc | Save skeletons as an SWC file |
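
The composition model behind these operators can be sketched in a few lines of plain Python. This is a conceptual illustration only, not chunkflow's code or API: it assumes that operators pass along a shared task dictionary and read and write named variables, in the spirit of the `-i` and `-o` options in the example command. The functions `create_chunk`, `threshold`, `plugin`, and `run_pipeline` below are hypothetical stand-ins that merely borrow operator names from the table, and the "chunk" is a plain list rather than a 3D volume.

```python
from typing import Callable, Dict, List, Optional

Task = Dict[str, object]
Operator = Callable[[Task], Task]


def create_chunk(output_name: str, size: int = 5) -> Operator:
    """Operator that writes a fake 1D 'chunk' with values from 0 to 1."""
    def op(task: Task) -> Task:
        task[output_name] = [i / (size - 1) for i in range(size)]
        return task
    return op


def threshold(input_name: str, output_name: str, value: float) -> Operator:
    """Operator that binarizes the named input variable."""
    def op(task: Task) -> Task:
        task[output_name] = [1 if v > value else 0 for v in task[input_name]]
        return task
    return op


def plugin(func: Callable, input_names: List[str], output_name: str) -> Operator:
    """Wrap user-supplied code as an operator."""
    def op(task: Task) -> Task:
        task[output_name] = func(*(task[name] for name in input_names))
        return task
    return op


def run_pipeline(operators: List[Operator], task: Optional[Task] = None) -> Task:
    """Apply the operators in order, threading the task through each one."""
    task = {} if task is None else task
    for op in operators:
        task = op(task)
    return task


def count_foreground(mask):
    """User code that a plugin operator could call."""
    return sum(mask)


task = run_pipeline([
    create_chunk(output_name="prob", size=5),
    threshold(input_name="prob", output_name="mask", value=0.5),
    plugin(count_foreground, input_names=["mask"], output_name="count"),
])
print(task["mask"], task["count"])
```

Because every operator has the same shape (task in, task out), new steps can be added or reordered without touching the others, which is the property that makes a single chained command line practical.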

Affiliation

This package is developed at Princeton University and Flatiron Institute.

Reference

We have a paper for this repo:

@article{wu_chunkflow_2021,
	title = {Chunkflow: hybrid cloud processing of large {3D} images by convolutional nets},
	issn = {1548-7105},
	shorttitle = {Chunkflow},
	url = {https://www.nature.com/articles/s41592-021-01088-5},
	doi = {10.1038/s41592-021-01088-5},
	journal = {Nature Methods},
	author = {Wu, Jingpeng and Silversmith, William M. and Lee, Kisuk and Seung, H. Sebastian},
	year = {2021},
	pages = {1--2}
}