Simple slicing for chunked data #1617

Open
pgrete opened this issue Apr 29, 2024 · 1 comment

pgrete commented Apr 29, 2024

I just learned the hard way that my naive approach to slicing data (or even reading full datasets), like

mydata = it.meshes['cons_cons_density_lvl1'][opmd.Record_Component.SCALAR][:,:,10]
series.flush()
do_work_with_mydata()

does not play nicely with chunked data (ADIOS2/bp5 output).

I saw that openpmd-viewer introduces quite a bit of logic to load sliced data: https://github.com/openPMD/openPMD-viewer/blob/6eccb608893d2c9b8d158d950c3f0451898a80f6/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py#L88
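
As far as I can tell, the essential part of that logic is intersecting the requested region with each block reported by available_chunks(), loading only the overlaps, and reassembling them. A rough, untested sketch of that idea (the helper name load_region is mine, not part of any API):

import numpy as np

def load_region(record_component, series, lo, hi):
    # Output array for the requested global region [lo, hi).
    out = np.empty([h - l for l, h in zip(lo, hi)], dtype=record_component.dtype)
    pending = []  # (destination slices, source buffer) to copy after the flush
    for chunk in record_component.available_chunks():
        # Intersection of the request with this chunk, in global indices.
        c_lo = [max(l, o) for l, o in zip(lo, chunk.offset)]
        c_hi = [min(h, o + e) for h, o, e in zip(hi, chunk.offset, chunk.extent)]
        if any(a >= b for a, b in zip(c_lo, c_hi)):
            continue  # chunk does not overlap the requested region
        extent = [b - a for a, b in zip(c_lo, c_hi)]
        buf = record_component.load_chunk(c_lo, extent)  # queued, valid after flush
        dest = tuple(slice(a - l, b - l) for a, b, l in zip(c_lo, c_hi, lo))
        pending.append((dest, buf))
    series.flush()  # a single flush executes all queued loads
    for dest, buf in pending:
        out[dest] = buf
    return out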

Is there a simpler approach?
I naively would have expected that there's something like load_chunks (potentially hidden inside the load_chunk call) that would load the data from all chunks that are part of the dataset.

A related question about the approach above pertains to performance: the viewer example loads each chunk individually, issuing many flushes. Is there a better (transparent) way around that (again ideally hidden behind a load_chunks call)?

franzpoeschel (Contributor) commented

Hello Philipp,
the code that you found seems a bit more involved than necessary. Do I understand correctly that you want to inspect the n-dimensional blocks as they are stored in the bp5 files and load those slices as-is, instead of selecting slices yourself?
In that case, I suggest doing:

record_component = it.meshes['cons_cons_density_lvl1'][opmd.Record_Component.SCALAR]
available_chunks = record_component.available_chunks()
loaded_chunks = []
# Queue one load per block that is physically present in the file.
for chunk in available_chunks:
    loaded_chunks.append(record_component.load_chunk(chunk.offset, chunk.extent))
series.flush()  # all queued loads are executed by this single flush
print("Loaded:")
for chunk in loaded_chunks:
    print("\t{}".format(chunk))

A load_chunks() call could theoretically be introduced, but it would be pure syntactic sugar, since the single load_chunk() operations are all executed at once during series.flush().
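
To illustrate, such sugar could be a one-liner on the user side (load_chunks below is a hypothetical helper, not part of the openPMD-api API):

def load_chunks(record_component):
    # Queue a load for every block present in the dataset; the returned
    # arrays become valid only after the next series.flush().
    return [record_component.load_chunk(c.offset, c.extent)
            for c in record_component.available_chunks()]

loaded_chunks = load_chunks(record_component)
series.flush()  # one flush executes all queued loads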

Otherwise, if you want to actually load everything, slicing should not be necessary; instead, total_chunk = record_component.load_chunk() should be efficient enough in ADIOS2. But I'm not sure that's what you are looking for.
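
For completeness, that full-load pattern looks like this (reusing record_component and series from above):

total_chunk = record_component.load_chunk()  # no arguments: request the full extent
series.flush()  # the array contents are only valid after the flush
print(total_chunk.shape)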

Are you running in parallel? I have a WIP branch with chunk distribution algorithms for parallel setups. If you're interested in that, we can also try something there.
