My current plan is to break this huge task into four large stages:
Stage 1 (Before July 2024)
In Stage 1 we aggressively use dummy data (raw bytes copied from real ROOT files) and try to figure out the broad-stroke steps needed when writing RNTuple to disk. Roughly they should be:
Write TFile header, RNTuple anchor, RNTuple header
Update RNTuple anchor location
Write out pages
Write RNTuple footer, TDictionary?
Update locations in TFile header and TDictionary (RNTuple anchor)
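The "write a placeholder, then go back and patch it" pattern runs through several of the steps above (the anchor location and the TFile header locations are only known after later bytes are written). A minimal sketch of that pattern, with an illustrative layout that is *not* the actual ROOT on-disk format:

```python
import struct

def write_with_backpatch(path, payload: bytes):
    """Write a file with placeholder offsets, then seek back and patch them.

    The 'anchor' here is a hypothetical (offset, length) pair; real TFile /
    RNTuple anchors carry more fields, but the backpatching idea is the same.
    """
    with open(path, "wb") as f:
        f.write(b"root")                    # magic bytes
        anchor_pos = f.tell()
        f.write(struct.pack(">Q", 0))       # placeholder: payload offset
        f.write(struct.pack(">Q", 0))       # placeholder: payload length
        payload_pos = f.tell()
        f.write(payload)                    # e.g. pages + footer bytes
        # Now that the real offsets are known, patch the placeholders.
        f.seek(anchor_pos)
        f.write(struct.pack(">QQ", payload_pos, len(payload)))

write_with_backpatch("demo.bin", b"\x01\x02\x03")
with open("demo.bin", "rb") as f:
    data = f.read()
offset, length = struct.unpack_from(">QQ", data, 4)
```

The Stage 1 "hard-code dummy and modify" approach is essentially this, but with the placeholder regions filled from bytes copied out of a real file.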
Usability after Stage 1:
Bare minimal: users can write out primitive fields that are not too long.
Stage 2 (Before Oct 2024)
In Stage 2 we try to peel off the dummy stuff in two ways:
Reduce the amount of "hard-code dummy bytes and modify" code smell
Allow larger files by handling cluster-group-related bookkeeping correctly
At this stage we will still be very rigid regarding schema complexity, but closer to production quality in the "file size" aspect.
Usability after Stage 2:
Basically functional for very simple data schemas up to any size, as long as the data fits in RAM (one-shot writing)
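The cluster-group bookkeeping that Stage 2 targets boils down to splitting the full entry range into bounded cluster spans instead of writing one giant cluster. A simplified sketch (real RNTuple clusters also carry per-column page lists and compression info):

```python
def split_into_clusters(n_entries: int, entries_per_cluster: int):
    """Split an entry range into (first_entry, n_entries) cluster spans.

    Hypothetical helper: illustrates why arbitrary file sizes need
    cluster-group handling rather than a single fixed-size cluster.
    """
    clusters = []
    first = 0
    while first < n_entries:
        n = min(entries_per_cluster, n_entries - first)
        clusters.append((first, n))
        first += n
    return clusters

spans = split_into_clusters(10, 4)
```

Each span would then be serialized as its own set of pages, with the cluster group summary written into the footer.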
Stage 3 (Before June 2025)
In Stage 3 we will expand the level of completeness in two critical ways:
Improve schema support, in particular offset-vector fields; consider the possibility of switching to AwkwardArray.jl at this stage
Allow appending/streaming of data to disk
Usability after Stage 3:
This will be analysis-production ready: users can write nanoAOD-style files of any size and append to existing datasets. This is the mid-term milestone for a production-level, useful RNTuple writer.
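Appending in Stage 3 implies re-opening an existing file, overwriting its old footer with new cluster bytes, and writing a fresh footer at the new end. A hypothetical sketch of that mechanic, using a toy footer format (a `FOOT` tag plus an 8-byte footer offset trailer); the real RNTuple case additionally requires updating the anchor and cluster group summaries:

```python
import os
import struct

def create(path, cluster: bytes):
    # Toy layout: [cluster bytes][b"FOOT"][8-byte big-endian footer offset]
    with open(path, "wb") as f:
        f.write(cluster)
        footer_pos = f.tell()
        f.write(b"FOOT")
        f.write(struct.pack(">Q", footer_pos))

def append(path, cluster: bytes):
    with open(path, "r+b") as f:
        f.seek(-8, os.SEEK_END)
        (footer_pos,) = struct.unpack(">Q", f.read(8))
        f.seek(footer_pos)          # position over the old footer ...
        f.write(cluster)            # ... and stream the new cluster in
        new_footer_pos = f.tell()
        f.write(b"FOOT")            # then write a fresh footer
        f.write(struct.pack(">Q", new_footer_pos))
        f.truncate()

create("stream.bin", b"AAAA")
append("stream.bin", b"BBBB")
with open("stream.bin", "rb") as f:
    data = f.read()
```

Because the footer is rewritten last, a reader that finds a valid footer always sees a consistent file, which is what makes incremental/streaming writes safe.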
Stage 4 (unknown time)
In Stage 4 we try to complete whatever is still missing; possible items:
Vastly improve schema support; we need a good design for this to be maintainable in the long term
Introduce other RNTuple features and provide APIs for them, such as alias columns
Moelf changed the title from "[RNTuple] Roadmap to writing RNTuple to disk" to "[RNTuple] Roadmap of writing RNTuple to disk" on May 4, 2024.