Use sans-io principles to decouple IO from business logic #954

agoose77 · 2023-09-13T12:09:33Z

Note
This is a refactor request / place to put my thoughts after working on Emscripten support for uproot.
I don't necessarily think we have the human-power to do this.

One of the challenges of integrating uproot into environments such as PyScript is that it blocks the main thread while waiting for IO, because pyjs et al. can't create background threads using the built-in threading models. Web Workers can offload work into a separate context, but they need helpers (trampolines) and communicate asynchronously via message passing. Blocking IO has fewer consequences in a JupyterLite kernel, which already runs in a separate Web Worker!

Another restriction posed by our existing IO handling is that integrating uproot into other async environments is nontrivial: using it from asyncio requires wrapping each call in run_in_executor.
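For context, the current workaround looks roughly like the following sketch. `blocking_read` is a hypothetical stand-in for a blocking uproot call such as reading arrays from a tree. Passing `None` to `run_in_executor` selects the loop's default thread pool, so the event loop stays responsive only because the blocking work moves onto other threads.

```python
import asyncio
import time


def blocking_read(path: str) -> bytes:
    """Hypothetical stand-in for a blocking uproot call (e.g. reading arrays)."""
    time.sleep(0.01)  # simulate blocking IO
    return f"contents of {path}".encode()


async def read_in_executor(path: str) -> bytes:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor: the blocking call
    # runs on a worker thread while this coroutine awaits the result.
    return await loop.run_in_executor(None, blocking_read, path)


async def main() -> list[bytes]:
    # Two blocking reads overlap, each occupying a worker thread.
    return await asyncio.gather(
        read_in_executor("a.root"), read_in_executor("b.root")
    )


results = asyncio.run(main())
```

This works under CPython, but it depends on threads. That is exactly what is missing in the browser environments described above.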

It would be nice to provide an additional async implementation of uproot that awaits IO instead of blocking on it, e.g.

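```python
import asyncio

from uproot import async_ as up  # hypothetical module; `async` is a reserved keyword

async def main():
    async with up.open("../some/file.root") as f:
        data = await f['myTree'].arrays()

asyncio.run(main())
```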

Of course, providing an async API might sound like a lot of duplicated code. The trick is to follow sans-IO principles, which separate the business logic from the IO. The logic is then written once, and the blocking and non-blocking variants of the API become thin wrappers that differ only in how they perform the IO.
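A minimal sketch of that split follows. It assumes a made-up 8-byte header format (not ROOT's actual layout), and `parse_header`, `read_header_sync`, and `read_header_async` are hypothetical names. All the logic lives in `parse_header`, which never touches IO. The two front ends only decide how the bytes are obtained.

```python
import asyncio
import io
import struct

# Hypothetical 8-byte header: 4-byte magic tag + big-endian uint32 payload length.
HEADER = struct.Struct(">4sI")


def parse_header(buf: bytes) -> tuple[bytes, int]:
    """Business logic only: bytes in, values out, no IO."""
    magic, length = HEADER.unpack(buf)
    if magic != b"DEMO":
        raise ValueError(f"bad magic {magic!r}")
    return magic, length


def read_header_sync(f) -> tuple[bytes, int]:
    # Blocking front end: any file-like object with .read(n).
    return parse_header(f.read(HEADER.size))


async def read_header_async(reader: asyncio.StreamReader) -> tuple[bytes, int]:
    # Non-blocking front end: await the bytes, then reuse the same parser.
    return parse_header(await reader.readexactly(HEADER.size))


data = HEADER.pack(b"DEMO", 42)
sync_result = read_header_sync(io.BytesIO(data))


async def demo() -> tuple[bytes, int]:
    reader = asyncio.StreamReader()
    reader.feed_data(data)
    reader.feed_eof()
    return await read_header_async(reader)


async_result = asyncio.run(demo())
```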

For APIs that are not request-response, such as uproot's parsing routines (which branch heavily on buffer contents), the common model of

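```python
# Pseudocode: `sansio_api` and `session` are placeholders, not real uproot objects.

# Build query
request = sansio_api.build_request()

# Submit query, receive response (the only step that performs IO)
response = session.request(request)

# Process response
result = sansio_api.handle_response(response)
```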

doesn't really apply. Instead, one can implement a state machine that consumes and produces events, driven by a separate, simple event-loop executor; see e.g. https://gist.github.com/agoose77/0e9b12a1c04afe61bce1c1c96ec5e3a1
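The sketch below shows one way to build such a state machine. It is not the code from the linked gist, and the record format, `Read`, `parse_record`, and `run_sync` are all hypothetical. A Python generator acts as the state machine: it yields `Read` events describing the bytes it needs next, receives those bytes via `send()`, and decides what to request next based on their contents. The executor is a tiny loop that services each event with blocking `seek()`/`read()` calls.

```python
import io
import struct
from dataclasses import dataclass
from typing import Generator


@dataclass
class Read:
    """Event emitted by the state machine: 'give me `size` bytes at `offset`'."""
    offset: int
    size: int


def parse_record(offset: int = 0) -> Generator[Read, bytes, dict]:
    """Sans-IO parser for a hypothetical record format.

    A 1-byte tag selects the layout: tag 0 -> one big-endian uint32 follows;
    tag 1 -> a uint16 length followed by that many bytes of UTF-8 text.
    """
    (tag,) = struct.unpack(">B", (yield Read(offset, 1)))
    if tag == 0:
        (value,) = struct.unpack(">I", (yield Read(offset + 1, 4)))
        return {"tag": tag, "value": value}
    if tag == 1:
        # Branch on buffer contents: the decoded length determines the next request.
        (n,) = struct.unpack(">H", (yield Read(offset + 1, 2)))
        text = (yield Read(offset + 3, n)).decode("utf-8")
        return {"tag": tag, "value": text}
    raise ValueError(f"unknown tag {tag}")


def run_sync(machine: Generator[Read, bytes, dict], f) -> dict:
    """Minimal blocking event loop: service each Read with seek() + read()."""
    try:
        request = next(machine)
        while True:
            f.seek(request.offset)
            request = machine.send(f.read(request.size))
    except StopIteration as stop:
        # The generator's return value is the parsed result.
        return stop.value


text_record = io.BytesIO(bytes([1]) + struct.pack(">H", 5) + b"hello")
int_record = io.BytesIO(bytes([0]) + struct.pack(">I", 7))
text_result = run_sync(parse_record(), text_record)  # {'tag': 1, 'value': 'hello'}
int_result = run_sync(parse_record(), int_record)    # {'tag': 0, 'value': 7}
```

The parser contains no IO at all, so it can be tested by feeding it bytes directly. Only `run_sync` knows that the data comes from a seekable file.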

Given that event loops are ultimately what power asyncio et al., it might seem that what we're really saying here is "write async and call it synchronously". However, it is important to distinguish between an async implementation and the concept of an event loop: there are multiple async frameworks, and their compatibility story is not perfect. Writing the logic as a state machine with a separate event loop makes it straightforward to port to new event loops and frameworks.
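To illustrate the portability point, here is a sketch of a second executor for the same kind of generator-based state machine, this time driven by asyncio. The format, `Read`, `parse_string`, `run_async`, and `fake_async_read` are hypothetical, and they are redefined here so the block runs on its own. The parser is untouched by the choice of framework. Presumably an executor for another framework (e.g. trio) would look much the same, with only the awaited read call changing.

```python
import asyncio
import struct
from dataclasses import dataclass
from typing import Awaitable, Callable, Generator


@dataclass
class Read:
    offset: int
    size: int


def parse_string(offset: int = 0) -> Generator[Read, bytes, str]:
    """Sans-IO parser (hypothetical format): uint16 length, then UTF-8 bytes."""
    (n,) = struct.unpack(">H", (yield Read(offset, 2)))
    return (yield Read(offset + 2, n)).decode("utf-8")


async def run_async(
    machine: Generator[Read, bytes, str],
    read: Callable[[int, int], Awaitable[bytes]],
) -> str:
    """asyncio executor: same state machine, but each Read is awaited."""
    try:
        request = next(machine)
        while True:
            request = machine.send(await read(request.offset, request.size))
    except StopIteration as stop:
        return stop.value


DATA = struct.pack(">H", 5) + b"hello"


async def fake_async_read(offset: int, size: int) -> bytes:
    # Hypothetical async source (e.g. a remote range request); here it just yields control.
    await asyncio.sleep(0)
    return DATA[offset : offset + size]


result = asyncio.run(run_async(parse_string(), fake_async_read))
```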


agoose77 added the feature New feature or request label Sep 13, 2023

jpivarski added the big-project This will take some time, perhaps as a Fellowship or GSoC project label Jan 30, 2024
