imageset: should do nothing but get the raw data #1023

graeme-winter · 2019-11-19T16:06:47Z

graeme-winter
Nov 19, 2019
Maintainer

At the moment the image set and experiments both keep track of the same models, or multiple copies of the same model, which is at best a source of complete confusion.

I propose we refactor things such that the imageset does exactly one thing: give pixel data. The experiment models handle everything else.

I welcome other viewpoints.

This is currently holding back development.

ndevenish · 2019-11-22T22:20:19Z

ndevenish
Nov 22, 2019
Maintainer

As I see it, the intention and the reality have probably diverged as the models evolved over time.

My interpretation is that ImageSet (ImageSetData, but for the purposes of this discussion the two are now in a 1:1 relationship) does have access to, and store, a set of models, but these are supposed to be the models from the file; everything else should be accessed via the experiment. Storing the models on ImageData looks like it is supposed to be caching, to avoid problems with repeat access.

What actually happens is that, on ExperimentList load, the models from each experiment are set on the imageset as a way to avoid loading them from file. Then, when the imageset models are accessed, those models are used instead of re-reading the file metadata to recreate the model information.

The fact that it tries to re-read the file for empty models is a bug, and the root cause of a lot of the stills performance issues; it was hacked around with the Lazy* mirror branches. As far as I can tell, there should be zero reason for the metadata models to be read from the image files after import, because a) the information is already in the Experiment and b) the user could have changed/remapped/corrected them anyway. After import, we always have a superior source of truth.

Clearly the Experiment object should be the sole source of model truth except for when building the initial model.
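To make the "read once at import, then trust the Experiment" idea concrete, here is a minimal sketch. Every name in it (MetadataFile, PixelOnlyImageSet, import_experiment, and this toy Experiment) is hypothetical and is not the dxtbx API; a single float stands in for a full model.

```python
class MetadataFile:
    """Hypothetical file reader that counts how often metadata is parsed."""

    def __init__(self, detector_distance):
        self._distance = detector_distance
        self.read_count = 0

    def read_detector_distance(self):
        self.read_count += 1  # stands in for an expensive file access
        return self._distance


class PixelOnlyImageSet:
    """Hypothetical imageset: keeps the file handle, stores no models."""

    def __init__(self, source):
        self.source = source


class Experiment:
    """Toy experiment: the only holder of the model after import."""

    def __init__(self, imageset, detector_distance):
        self.imageset = imageset
        self.detector_distance = detector_distance


def import_experiment(source):
    """The only place in this sketch where file metadata is read."""
    return Experiment(PixelOnlyImageSet(source), source.read_detector_distance())


source = MetadataFile(detector_distance=150.0)
experiment = import_experiment(source)

# A user correction after import lives on the experiment only.
experiment.detector_distance = 151.2

# Rebuilding the experiment (loosely analogous to a reload) copies the model
# from the experiment and never goes back to the file.
reloaded = Experiment(experiment.imageset, experiment.detector_distance)
```

In this sketch the imageset has nowhere to cache a model, so there is no second copy that can drift out of sync with the corrected value.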

As for the right design, I think you are right: the ImageSet should care only about processing pixels (masks, gains and images). We do, however, need an interface to the file, one that maps to multiple Collection[2D] | 3D sets of data. For example, imagine a NeXus file that contains two separate sweeps of data. Each needs a separate goniometer model. Logically, this is two separate imagesets, each of which accesses the initial metadata for its section of the file separately. How do you access this? Either you need to have fileinstance.get_goniometer(imageset_index, image_index) (bad, I think) or you need some sort of separate object, fileinstance.get_logical_data_block(imageset_index).get_goniometer(image_index). This looks a bit like what the current imageset does, albeit without the mixing of roles that ImageSet has at the moment.
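A rough sketch of the second option, for illustration only. The method names get_logical_data_block and get_goniometer come from the paragraph above; the classes FileInstance, LogicalDataBlock and Goniometer are hypothetical stand-ins, not existing dxtbx types.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Goniometer:
    """Hypothetical stand-in for a goniometer model: just a rotation axis."""

    rotation_axis: Tuple[float, float, float]


@dataclass
class LogicalDataBlock:
    """Hypothetical view onto one sweep (one imageset's worth) of a file."""

    goniometers: List[Goniometer]  # one entry per image in this block

    def get_goniometer(self, image_index: int) -> Goniometer:
        return self.goniometers[image_index]


@dataclass
class FileInstance:
    """Hypothetical file handle holding several independent sweeps."""

    blocks: List[LogicalDataBlock] = field(default_factory=list)

    def get_logical_data_block(self, imageset_index: int) -> LogicalDataBlock:
        return self.blocks[imageset_index]


# Two sweeps in one file, each with its own goniometer.
sweep_a = LogicalDataBlock([Goniometer((1.0, 0.0, 0.0))] * 3)
sweep_b = LogicalDataBlock([Goniometer((0.0, 1.0, 0.0))] * 2)
fileinstance = FileInstance([sweep_a, sweep_b])

# Image indices are local to the block, so callers never juggle two indices
# against the file object itself.
axis = fileinstance.get_logical_data_block(1).get_goniometer(0).rotation_axis
```

The block object carries only initial metadata for its section of the file; pixel access would stay with the imageset.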

There's no reason, even with weird XFEL data, to read the metadata from a file after import, right? I think stills_process does, but IIRC that's a workaround to effectively do a two-stage import with the second, more expensive stage on worker nodes (and after which the metadata is no longer accessed from the object stream).


phyy-nx · 2019-11-22T23:03:34Z

phyy-nx
Nov 22, 2019
Maintainer

FWIW, the intent of stills process is to read the metadata only once. It takes advantage of LazyImageSet to avoid reading the models during import. It only reads them once experiments have been distributed to worker nodes. This part of it is highly performant. Of course, dials.import with stills can't use this feature, which slows it down a lot since it can only run as a single process. I know there is work on this in other pull requests. I haven't gotten to it yet, but I'm very excited about it :)

I generally agree though. I wanted to add an 'initialized' flag of some sort to Experiment to prevent multiple hits on the raw data, but I think it didn't make sense at the time. With that flag, though, the cached models on ImageSet could be dropped.
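One way such a flag could look, purely as a sketch: the Experiment class below, its models property and the load_from_raw_data callable are all hypothetical and do not reflect how dxtbx implements Experiment.

```python
class Experiment:
    """Hypothetical experiment that reads models from raw data at most once."""

    def __init__(self, load_models):
        self._load_models = load_models  # callable that hits the raw data
        self._models = None
        self.initialized = False

    @property
    def models(self):
        # First access reads the raw data; later accesses reuse the result.
        if not self.initialized:
            self._models = self._load_models()
            self.initialized = True
        return self._models


hits = []  # records every access to the (pretend) raw data


def load_from_raw_data():
    hits.append(1)
    return {"beam": "beam-model", "detector": "detector-model"}


experiment = Experiment(load_from_raw_data)
first = experiment.models   # triggers the single read
second = experiment.models  # served from the experiment, no file access
```

With the flag on the experiment, the imageset would not need its own cached copy of the models.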


2020-08-03T11:47:04Z

stale[bot]
bot Aug 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. The label will be removed automatically if any activity occurs. Thank you for your contributions.
