Use sans-io principles to decouple IO from business logic #954

Open
agoose77 opened this issue Sep 13, 2023 · 0 comments
Labels
big-project This will take some time, perhaps as a Fellowship or GSoC project feature New feature or request

Comments

@agoose77
Collaborator

Note
This is a refactor request / place to put my thoughts after working on emscripten support for uproot.
I don't necessarily think we have the human-power to do this.

One challenge in integrating uproot into e.g. pyscript is that uproot blocks the main thread while performing IO, because pyjs et al. can't create background threads using Python's built-in threading models. Web workers can offload work into a separate context, but they need helpers (trampolines) and communicate asynchronously via message passing. Blocking IO has fewer consequences in a jupyterlite kernel, which already runs in a separate web worker.

Another restriction posed by our existing IO handling is that integrating uproot into other async environments is nontrivial. Using uproot from asyncio, for example, requires invoking it via run_in_executor so that its blocking calls don't stall the event loop.
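As a rough illustration of the run_in_executor workaround, here is a minimal sketch. `blocking_read` is a hypothetical stand-in for a blocking uproot call (it is not part of uproot); only the asyncio calls are real.

```python
import asyncio
import time


def blocking_read(path: str) -> bytes:
    """Hypothetical stand-in for a blocking uproot call (not a real uproot API)."""
    time.sleep(0.01)  # simulate blocking IO
    return f"contents of {path}".encode()


async def read_without_stalling_loop(path: str) -> bytes:
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool executor, so the
    # blocking call runs in a worker thread while the event loop stays free.
    return await loop.run_in_executor(None, blocking_read, path)


result = asyncio.run(read_without_stalling_loop("some/file.root"))
```

This relies on background threads being available, which is exactly what is missing in the pyjs case described above.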

It would be nice if we could provide an additional async implementation of uproot that safely waits for IO to complete, e.g.

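# Hypothetical module name: "async" is a reserved keyword, so it can't be imported under that name.
from uproot import aio as up

async def main():
    async with up.open("../some/file.root") as f:
        data = await f["myTree"].arrays()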

Of course, providing an async API might sound like a lot of duplication. The trick is to employ sans-IO principles that separate the business logic from the IO, which makes it straightforward to write both blocking and non-blocking variants of an API.

For non-request-response APIs, such as uproot's parsing routines (which branch heavily on buffer contents), the common model of

# Build query
request = sansio_api.build_request()

# Submit query, receive response
response = session.request(request)

# Process response
result = sansio_api.handle_response(response)

doesn't really apply. Instead, one can implement a state machine that produces and consumes events, driven by a separate, simple event-loop executor, e.g. https://gist.github.com/agoose77/0e9b12a1c04afe61bce1c1c96ec5e3a1
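To make the state-machine idea concrete, here is a minimal sketch; it is not the code from the gist, and it doesn't parse ROOT files. It assumes a toy format (a 4-byte big-endian length followed by a payload), and all names (`ReadRequest`, `parse_record`, `run_blocking`, `run_async`) are hypothetical. The parser is a generator that yields read requests and receives bytes, so the same logic can be driven by a blocking executor or an async one.

```python
import asyncio
import io
import struct
from dataclasses import dataclass
from typing import Generator


@dataclass
class ReadRequest:
    """Event emitted by the parser: 'I need these bytes before I can continue'."""
    offset: int
    length: int


def parse_record() -> Generator[ReadRequest, bytes, bytes]:
    """Pure parsing logic with no IO; the second read depends on the first."""
    header = yield ReadRequest(offset=0, length=4)
    (size,) = struct.unpack(">I", header)
    payload = yield ReadRequest(offset=4, length=size)
    return payload


def run_blocking(machine, file) -> bytes:
    """Blocking executor: satisfies each request with seek/read on a file object."""
    try:
        request = next(machine)
        while True:
            file.seek(request.offset)
            request = machine.send(file.read(request.length))
    except StopIteration as stop:
        return stop.value  # the generator's return value


async def run_async(machine, read) -> bytes:
    """Async executor: same loop, but each read is awaited."""
    try:
        request = next(machine)
        while True:
            data = await read(request.offset, request.length)
            request = machine.send(data)
    except StopIteration as stop:
        return stop.value


raw = struct.pack(">I", 5) + b"hello"


async def fake_async_read(offset: int, length: int) -> bytes:
    await asyncio.sleep(0)  # stand-in for a real non-blocking read
    return raw[offset:offset + length]


sync_result = run_blocking(parse_record(), io.BytesIO(raw))
async_result = asyncio.run(run_async(parse_record(), fake_async_read))
```

Only the two small executors differ between the blocking and async variants; supporting another async framework would mean writing one more executor rather than a second parser.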

Given that event loops are ultimately what powers asyncio et al., it might seem that what we're really saying here is "write async and call it synchronously". However, it is important to distinguish between an async implementation and the concept of an event loop: there are multiple async frameworks, and the compatibility story between them is not perfect. Writing a state machine and an associated event loop can make it straightforward to port to new event loops and frameworks.

@agoose77 agoose77 added the feature New feature or request label Sep 13, 2023
@jpivarski jpivarski added the big-project This will take some time, perhaps as a Fellowship or GSoC project label Jan 30, 2024