Simple slicing for chunked data #1617

Open
pgrete opened this issue Apr 29, 2024 · 1 comment

pgrete commented Apr 29, 2024

I just learned the hard way that my naive approach to slicing data (or even reading full datasets), like

mydata = it.meshes['cons_cons_density_lvl1'][opmd.Record_Component.SCALAR][:,:,10]
series.flush()
do_work_with_mydata()

does not play nicely with chunked data (ADIOS2/bp5 output).

I saw that openpmd-viewer introduces quite a bit of logic to load sliced data: https://github.com/openPMD/openPMD-viewer/blob/6eccb608893d2c9b8d158d950c3f0451898a80f6/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py#L88
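
As far as I can tell, the essential part of that logic is intersecting the requested region with each block reported by available_chunks(), loading only the overlaps, and reassembling them. A rough, untested sketch of that idea (the helper name load_region is mine, not part of any API):

import numpy as np

def load_region(record_component, series, lo, hi):
    # Output array for the requested global region [lo, hi).
    out = np.empty([h - l for l, h in zip(lo, hi)], dtype=record_component.dtype)
    pending = []  # (destination slices, source buffer) to copy after the flush
    for chunk in record_component.available_chunks():
        # Intersection of the request with this chunk, in global indices.
        c_lo = [max(l, o) for l, o in zip(lo, chunk.offset)]
        c_hi = [min(h, o + e) for h, o, e in zip(hi, chunk.offset, chunk.extent)]
        if any(a >= b for a, b in zip(c_lo, c_hi)):
            continue  # chunk does not overlap the requested region
        extent = [b - a for a, b in zip(c_lo, c_hi)]
        buf = record_component.load_chunk(c_lo, extent)  # queued, valid after flush
        dest = tuple(slice(a - l, b - l) for a, b, l in zip(c_lo, c_hi, lo))
        pending.append((dest, buf))
    series.flush()  # a single flush executes all queued loads
    for dest, buf in pending:
        out[dest] = buf
    return out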

Is there a simpler approach?
I naively would have expected that there's something like load_chunks (potentially hidden inside the load_chunk call) that would load the data from all chunks that are part of the dataset.

A related question about the approach above pertains to performance: the viewer example loads each chunk individually, issuing many flushes. Is there a better (transparent) way around that (again ideally hidden behind a load_chunks call)?

franzpoeschel (Contributor) commented

Hello Philipp,
the code that you found seems a bit more involved than necessary. Do I understand correctly that you want to inspect the n-dimensional blocks as they are stored in the bp5 files and load those slices as-is, instead of selecting slices yourself?
In that case, I suggest doing:

record_component = it.meshes['cons_cons_density_lvl1'][opmd.Record_Component.SCALAR]
available_chunks = record_component.available_chunks()
loaded_chunks = []
# Queue one load per block that is physically present in the file.
for chunk in available_chunks:
    loaded_chunks.append(record_component.load_chunk(chunk.offset, chunk.extent))
series.flush()  # all queued loads are executed by this single flush
print("Loaded:")
for chunk in loaded_chunks:
    print("\t{}".format(chunk))

A load_chunks() call could theoretically be introduced, but it would be pure syntactic sugar, since the single load_chunk() operations are all executed at once during series.flush().
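
To illustrate, such sugar could be a one-liner on the user side (load_chunks below is a hypothetical helper, not part of the openPMD-api API):

def load_chunks(record_component):
    # Queue a load for every block present in the dataset; the returned
    # arrays become valid only after the next series.flush().
    return [record_component.load_chunk(c.offset, c.extent)
            for c in record_component.available_chunks()]

loaded_chunks = load_chunks(record_component)
series.flush()  # one flush executes all queued loads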

Otherwise, if you want to actually load everything, slicing should not be necessary; instead, total_chunk = record_component.load_chunk() should be efficient enough in ADIOS2. But I'm not sure that's what you are looking for.
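
For completeness, that full-load pattern looks like this (reusing record_component and series from above):

total_chunk = record_component.load_chunk()  # no arguments: request the full extent
series.flush()  # the array contents are only valid after the flush
print(total_chunk.shape)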

Are you running in parallel? I have a WIP branch with chunk distribution algorithms for parallel setups. If you're interested in that, we can also try something there.
