JackKelly/light-speed-io

Light Speed IO (LSIO)

The ultimate ambition is to enable folks to efficiently load and process large, multi-dimensional datasets as fast as modern CPUs & I/O subsystems will allow.

For now, this repo is just a place for me to tinker with ideas. This code won't do anything vaguely useful for months!

Under the hood, light-speed-io uses io_uring on Linux for local files, and will use object_store for all other data I/O.

My first use-case for light-speed-io is to help to speed up reading Zarr. After that, I'm interested in helping to create fast readers for "native" geospatial file formats like GRIB2 and EUMETSAT native files. And, even further than that, I'm interested in efficient & fast computation on out-of-core, chunked, labelled, multi-dimensional data.

See planned_design.md for more info.

Roadmap

(This will almost certainly change!)

The list below is in (rough) chronological order. This roadmap is also represented in the GitHub milestones for this project, when sorted alphabetically.

MVP IO backends

MVP Compute:

  • Build a general-purpose work-stealing framework for applying arbitrary functions to chunks of data in parallel
  • Wrap a few decompression algorithms
  • MVP Zarr library (just for reading data), with Python API
  • Benchmark lsio_zarr vs zarr-python v3
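To make the work-stealing idea concrete, here is a minimal sketch using only the Rust standard library. It is not LSIO's design or API: `process_chunks` is a hypothetical name, and the mutex-protected per-worker queues are a deliberately naive stand-in for the lock-free deques (such as those in the `crossbeam-deque` crate) that a real framework would more likely use. Each worker takes tasks from the front of its own queue and, when that runs dry, steals from the back of another worker's queue.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Apply `f` to every chunk in parallel and return the results in input order.
/// Hypothetical helper for illustration only; not part of LSIO.
pub fn process_chunks<T, R, F>(chunks: Vec<T>, n_workers: usize, f: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    assert!(n_workers > 0, "need at least one worker");
    let n_chunks = chunks.len();

    // Give each worker its own queue and deal the chunks out round-robin.
    let mut queues: Vec<VecDeque<(usize, T)>> =
        (0..n_workers).map(|_| VecDeque::new()).collect();
    for (i, chunk) in chunks.into_iter().enumerate() {
        queues[i % n_workers].push_back((i, chunk));
    }
    let queues: Vec<Mutex<VecDeque<(usize, T)>>> =
        queues.into_iter().map(Mutex::new).collect();

    // One result slot per chunk, so the output order matches the input order.
    let results: Mutex<Vec<Option<R>>> = Mutex::new((0..n_chunks).map(|_| None).collect());

    thread::scope(|s| {
        for me in 0..n_workers {
            let queues = &queues;
            let results = &results;
            let f = &f;
            s.spawn(move || loop {
                // Prefer our own queue (front)...
                let mut task = queues[me].lock().unwrap().pop_front();
                // ...and steal from the back of another queue if ours is empty.
                if task.is_none() {
                    for offset in 1..n_workers {
                        let victim = (me + offset) % n_workers;
                        task = queues[victim].lock().unwrap().pop_back();
                        if task.is_some() {
                            break;
                        }
                    }
                }
                match task {
                    Some((i, chunk)) => {
                        let out = f(chunk);
                        results.lock().unwrap()[i] = Some(out);
                    }
                    // No new tasks are ever added, so empty queues mean we are done.
                    None => break,
                }
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every chunk is processed exactly once"))
        .collect()
}
```

For example, `process_chunks(vec![1u32, 2, 3], 2, |x| x * 2)` doubles each element across two worker threads. Stealing from the opposite end of the victim's queue is the conventional choice because it reduces contention with the queue's owner.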

Iterate on the IO backends:

  • Optimise (merge and split) IO operations
  • Implement writing using lsio_uring
  • Implement writing using lsio_object_store_bridge
  • Re-use IO buffers
  • Register buffers with io_uring
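As a rough illustration of what "merge and split" could mean for IO operations, the sketch below coalesces nearby byte ranges into fewer, larger reads, and splits an oversized range into bounded pieces. The function names and the `max_gap` and `max_len` parameters are assumptions made for this example; they are not taken from LSIO.

```rust
use std::ops::Range;

/// Merge byte ranges that overlap or are separated by at most `max_gap` bytes.
/// Returns the merged ranges sorted by start offset.
/// Hypothetical helper for illustration only; not part of LSIO.
pub fn merge_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::with_capacity(ranges.len());
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            // Close enough to the previous range: extend it instead of adding a new read.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Split one byte range into consecutive pieces of at most `max_len` bytes.
/// Hypothetical helper for illustration only; not part of LSIO.
pub fn split_range(range: Range<u64>, max_len: u64) -> Vec<Range<u64>> {
    assert!(max_len > 0, "max_len must be positive");
    let mut pieces = Vec::new();
    let mut start = range.start;
    while start < range.end {
        let end = range.end.min(start.saturating_add(max_len));
        pieces.push(start..end);
        start = end;
    }
    pieces
}
```

Merging trades a few wasted bytes (the gaps) for fewer IO requests, which tends to matter most on high-latency storage. Splitting goes the other way, letting one large read be serviced by several requests in parallel.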

Iterate on compute

  • Investigate how to integrate LSIO with xarray, such that chunkwise computation can be "pushed down" to LSIO

Iterate on file format libraries

  • Implement writing in lsio_zarr
  • Implement simple GRIB reader

Project structure

Light Speed IO is organised as a Cargo workspace with multiple (small) crates, arranged in a flat crate structure. The same flat structure is used by projects such as Ruff, Polars, and rust-analyzer.

LSIO crate names use snake_case, following in the footsteps of the Rust Book and Ruff. (The choice of underscores versus hyphens is, as far as I can tell, entirely arbitrary: Polars and rust-analyzer both use hyphens. I just prefer the look of underscores!)

About

Read many chunks of files at high speed
