The ultimate ambition is to enable folks to efficiently load and process large, multi-dimensional datasets as fast as modern CPUs & I/O subsystems will allow.
For now, this repo is just a place for me to tinker with ideas. This code won't do anything vaguely useful for months!
Under the hood, light-speed-io uses io_uring on Linux for local files, and will use object_store for all other data I/O.
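To make that split concrete, here is a minimal sketch of how a backend might be selected. It is not the actual light-speed-io code: the `Backend` enum, `choose_backend`, and `choose_backend_for_this_platform` are hypothetical names invented for illustration, and the rule "anything with a non-`file` URL scheme is remote" is an assumption.

```rust
/// Hypothetical backend identifiers (illustrative names, not the real API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// io_uring: only for local files on Linux.
    IoUring,
    /// object_store: everything else.
    ObjectStore,
}

/// Pick a backend for `location`, given whether we are running on Linux.
///
/// Assumption: a location with no `scheme://` prefix is a plain filesystem
/// path, and the only URL scheme treated as local is `file`.
pub fn choose_backend(location: &str, on_linux: bool) -> Backend {
    let is_local = match location.split_once("://") {
        Some((scheme, _rest)) => scheme.eq_ignore_ascii_case("file"),
        None => true,
    };
    if is_local && on_linux {
        Backend::IoUring
    } else {
        Backend::ObjectStore
    }
}

/// Same as `choose_backend`, but detects the platform at compile time.
pub fn choose_backend_for_this_platform(location: &str) -> Backend {
    choose_backend(location, cfg!(target_os = "linux"))
}
```

Passing `on_linux` explicitly keeps the selection logic testable on any platform; only the thin wrapper depends on the compile target.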
My first use-case for light-speed-io is to help to speed up reading Zarr. After that, I'm interested in helping to create fast readers for "native" geospatial file formats like GRIB2 and EUMETSAT native files. And, even further than that, I'm interested in efficient & fast computation on out-of-core, chunked, labelled, multi-dimensional data.
See planned_design.md for more info.
(This will almost certainly change!)
The list below is in (rough) chronological order. This roadmap is also represented in the GitHub milestones for this project, when sorted alphabetically.
- Implement minimal lsio_uring IO backend (for loading data from a local SSD)
- Benchmark lsio_uring backend
- Implement minimal object_store_bridge IO backend
- Compare benchmarks for lsio_uring vs object_store_bridge
- Improve usability and robustness
- Group operations
- Build a general-purpose work-stealing framework for applying arbitrary functions to chunks of data in parallel
- Wrap a few decompression algorithms
- MVP Zarr library (just for reading data), with Python API
- Benchmark lsio_zarr vs zarr-python v3
- Optimise (merge and split) IO operations
- Implement writing using lsio_uring
- Implement writing using lsio_object_store_bridge
- Re-use IO buffers
- Register buffers with io_uring
- Investigate how to integrate LSIO with xarray, such that chunkwise computation can be "pushed down" to LSIO
- Implement writing in lsio_zarr
- Implement simple GRIB reader
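
As an illustration of the "Optimise (merge and split) IO operations" item above, here is a rough sketch of what merging and splitting byte-range reads could look like. This is only a guess at the general idea, not the planned implementation: `merge_ranges`, `split_range`, `max_gap`, and `max_len` are hypothetical names and parameters.

```rust
use std::ops::Range;

/// Merge byte ranges that overlap, or whose gap is at most `max_gap` bytes,
/// so that many small reads become fewer, larger reads.
/// (Hypothetical helper; name and signature are assumptions.)
pub fn merge_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    // Drop empty ranges, then sort by start offset so one pass suffices.
    ranges.retain(|r| r.start < r.end);
    ranges.sort_by_key(|r| r.start);

    let mut merged: Vec<Range<u64>> = Vec::with_capacity(ranges.len());
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end.saturating_add(max_gap) {
                // Close enough to the previous range: extend it.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Split one large byte range into consecutive pieces of at most `max_len`
/// bytes, so that a single huge read can be issued as several operations.
/// (Hypothetical helper; name and signature are assumptions.)
pub fn split_range(range: Range<u64>, max_len: u64) -> Vec<Range<u64>> {
    assert!(max_len > 0, "max_len must be non-zero");
    let mut parts = Vec::new();
    let mut start = range.start;
    while start < range.end {
        let end = range.end.min(start.saturating_add(max_len));
        parts.push(start..end);
        start = end;
    }
    parts
}
```

For example, with `max_gap = 16` the ranges `0..50`, `40..60`, `100..200`, and `210..300` would be merged into `0..60` and `100..300`, and `split_range(0..10, 4)` would yield `0..4`, `4..8`, and `8..10`. Reading a few unwanted bytes in the gap is often cheaper than issuing an extra IO operation, but the right threshold depends on the storage medium and would need benchmarking.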
Light Speed IO is organised as a Cargo workspace with multiple (small) crates, arranged in a flat crate structure. The same flat structure is used by projects such as Ruff, Polars, and rust-analyzer.
LSIO crate names use snake_case, following in the footsteps of the Rust Book and Ruff. (The choice of snake_case versus hyphens is, as far as I can tell, entirely arbitrary: Polars and rust-analyzer both use hyphens. I just prefer the look of underscores!)