[feature] : Metadata in Outputs #797

Open
2 tasks
MaxGamill-Sheffield opened this issue Feb 7, 2024 · 4 comments
Labels
enhancement New feature or request Images Issues pertaining to the Images class IO Input and Output

Comments

@MaxGamill-Sheffield
Collaborator

MaxGamill-Sheffield commented Feb 7, 2024

Is your feature request related to a problem?

Metadata such as the real-world dimensions of an image is not currently saved in the TIFF files. This adds an extra step to downstream analysis in other software such as ImageJ, where I have to enter these parameters manually.

Describe the solution you would like.

TIFF images opened in ImageJ should report their dimensions in real units, not just in pixels.

Describe the alternatives you have considered.

Another file format compatible with ImageJ

Issues

@MaxGamill-Sheffield
Collaborator Author

MaxGamill-Sheffield commented Feb 7, 2024

Moving chats from #778 to here:

@derollins

I noticed while testing PR #777 that tif was not available as an option for saving images. Although this may be due to the inability of matplotlib to embed metadata into these files (matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html), tif files are a standard for a lot of image analysis programmes that users may use in conjunction with TopoStats, as well as being the preferred format for many journals.

For these reasons I thought I would add tif to validation.py and the default_config.yaml file and update the documentation to reflect this change.


@ns-rse

Thanks for this PR @derollins .

It looks like you've used the ns-rse/776-config-jigging branch as a basis for your derollins/tiff_validation, which means this Pull Request includes all of the changes that are out for review in #777.

Normally, although not exclusively, new branches should be made from main (after a git pull, which ensures the local copy is up-to-date).

We can work round this though, perhaps most simply by waiting for #777 to be more widely tested, reviewed and merged first.

One question I would have, which might influence the implementation of supporting the TIFF format is whether the loss of metadata is likely to be problematic further down the line?

There are other ways of writing such files some of which may include metadata.

A quick scan also suggests that TIFFs can hold multiple images. Is this likely to be required, for example when working with .asd files, which contain multiple images that essentially constitute a video?


@derollins

Thanks for the feedback @ns-rse. I assumed the #777 PR would probably be merged before this so worked off that but I'll go back to main in the future.

I have never looked at the image output metadata but, having quickly opened it, there doesn't look to be much there at the moment: both the .png and .tiff files generated by topostats have only dimensions in their metadata, so I don't think this will be an issue. If it does become an issue in the future, it looks like it should be possible to save TIFF files with metadata using the PIL library, but I don't think it is worth adding this now.

There are a lot of advantages to the TIFF image format. When I first started using 'old' topostats it was the default image format I used (generated through the Gwyddion module) until I swapped to .png because you can't use .tiff images in Google Docs. The ability to contain multiple images is certainly something that could be useful in SPM image processing: although high-speed .asd files might be better processed as videos (I expect they could be saved as a TIFF too), most SPM 'images' contain multiple channels (i.e. height, phase, adhesion, charge etc.) which could potentially be conveniently combined.

I'm no optical microscopy expert but I know TIFF 'stacks' of many images either in a time or depth series are generated by other microscopy techniques and can be opened and processed in software such as ImageJ.
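
As a rough illustration of the multi-image idea discussed above, the sketch below writes several channels into one multi-page TIFF with Pillow and counts the pages on reading it back. The function names, channel names and image sizes are hypothetical and chosen only for the example; this is not how TopoStats currently saves images.

```python
import tempfile
from pathlib import Path

from PIL import Image


def save_channels_as_stack(channels: dict, path: Path) -> None:
    """Write each channel image as one page of a multi-page TIFF."""
    first, *rest = channels.values()
    # save_all=True tells Pillow to write every frame, not only the first.
    first.save(path, format="TIFF", save_all=True, append_images=rest)


def count_pages(path: Path) -> int:
    """Return the number of pages (frames) stored in a TIFF file."""
    with Image.open(path) as img:
        return img.n_frames


# Hypothetical channels: blank 32-bit float images standing in for real data.
channels = {name: Image.new("F", (8, 4)) for name in ("Height", "Phase", "Adhesion")}

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "channels.tif"
    save_channels_as_stack(channels, out)
    n_pages = count_pages(out)
```

Note that a plain stack like this does not record which page holds which channel; that would need per-page metadata or an agreed ordering.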


@ns-rse

I think we should ultimately be looking at using PIL to save metadata to the images, as it's fairly key information.
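
A minimal sketch of what the PIL route might look like, assuming the scale is stored in the baseline TIFF resolution tags and the remaining fields go into the ImageDescription tag as JSON. The function names, the JSON layout and the example values are assumptions made for illustration, and `px2nm` is taken here to mean nanometres per pixel. Baseline TIFF only defines inch and centimetre as resolution units, so the scale is converted to pixels per centimetre; whether ImageJ picks this up as intended would need checking.

```python
import json
import tempfile
from pathlib import Path

from PIL import Image, TiffImagePlugin

# Baseline TIFF tag numbers.
IMAGE_DESCRIPTION = 270
X_RESOLUTION = 282
Y_RESOLUTION = 283
RESOLUTION_UNIT = 296
UNIT_CENTIMETRE = 3  # TIFF defines 1 (none), 2 (inch) and 3 (centimetre)


def save_tiff_with_scale(image: Image.Image, path: Path, px2nm: float, channel: str) -> None:
    """Save ``image`` as a TIFF whose tags record the physical pixel size."""
    pixels_per_cm = 1e7 / px2nm  # 1 cm = 1e7 nm
    ifd = TiffImagePlugin.ImageFileDirectory_v2()
    ifd[X_RESOLUTION] = pixels_per_cm
    ifd[Y_RESOLUTION] = pixels_per_cm
    ifd[RESOLUTION_UNIT] = UNIT_CENTIMETRE
    # Hypothetical layout: extra fields serialised as JSON in the description.
    ifd[IMAGE_DESCRIPTION] = json.dumps({"px2nm": px2nm, "channel": channel})
    image.save(path, format="TIFF", tiffinfo=ifd)


def read_scale(path: Path) -> dict:
    """Read the resolution tags and the JSON description back from a TIFF."""
    with Image.open(path) as img:
        tags = img.tag_v2
        return {
            "pixels_per_cm": float(tags[X_RESOLUTION]),
            "unit": int(tags[RESOLUTION_UNIT]),
            "description": json.loads(tags[IMAGE_DESCRIPTION]),
        }


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "example.tif"
    # A blank 32-bit float image stands in for a real height map.
    save_tiff_with_scale(Image.new("F", (8, 4)), out, px2nm=0.5, channel="Height")
    recovered = read_scale(out)
```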

I'll try and get round to looking at this in the coming weeks.

In the meantime there is a tpyo that is causing the pre-commit checks to fail.

@ns-rse
Collaborator

ns-rse commented Feb 7, 2024

@MaxGamill-Sheffield

The need for metadata in the Tiff files isn't a big priority and I think it should be a new issue.

I feel it is something that should be prioritised, as we should strive to adhere to the FAIR Principles, which cover metadata describing not just the dataset but also the data within it. Specifically, principle R1 (my emphasis):

To make this decision, the data publisher should provide not just metadata that allows discovery, but also metadata that richly describes the context under which the data was generated. This may include the experimental protocols, the manufacturer and brand of the machine or sensor that created the data, the species used, the drug regime, etc. Moreover, R1 states that the data publisher should not attempt to predict the data consumer’s identity and needs.

Much of this metadata is in the files we read (.spm and .asd; not sure whether .gwy retains it, though) and we should include it not just in the .tif[f] output but also in the HDF5 output files.

MaxGamill-Sheffield changed the title from "[feature] : Metadata in Tiff file formats" to "[feature] : Metadata in Outputs" on Feb 12, 2024
@MaxGamill-Sheffield
Collaborator Author

Renamed the issue more suitably to outputting metadata.

From the meeting we want to put more metadata into the .topostats (hdf5) file with structure below:
Screenshot 2024-02-12 at 11 08 34

I suggest we make a "metadata" section with the following contained within it:

  • Image path
  • Channel
  • px2nm scaling
  • X, Y image length in pixels
  • X, Y image length in real units
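
One way the fields listed above could be represented before they are written out is sketched here. The class name, field names and example values are hypothetical, and `px2nm` is assumed to mean nanometres per pixel, so the real lengths are derived rather than stored separately.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ImageMetadata:
    """Hypothetical container for the proposed "metadata" section."""

    image_path: str
    channel: str
    px2nm: float  # assumed to be nanometres per pixel
    x_pixels: int
    y_pixels: int

    @property
    def x_nm(self) -> float:
        """Image width in nanometres."""
        return self.x_pixels * self.px2nm

    @property
    def y_nm(self) -> float:
        """Image height in nanometres."""
        return self.y_pixels * self.px2nm

    def to_dict(self) -> dict:
        """Flatten to a plain dict, including the derived real lengths."""
        fields = asdict(self)
        fields["x_nm"] = self.x_nm
        fields["y_nm"] = self.y_nm
        return fields


# Example values are made up for illustration.
example = ImageMetadata("data/example.spm", "Height", 0.5, 1024, 512)
metadata = example.to_dict()
```

A flat dict of strings and numbers like this could then be stored as attributes on a "metadata" group in the HDF5 file, though the exact layout is still to be decided.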

Then we can add other microscope metadata found within the spm/ibw/gwy files if needed, although they can be found within the original files themselves and are not used within the TopoStats software - maybe limiting metadata to what the application uses makes more sense?

Then we can deal with putting the metadata into TIFF images to be loaded by external analysis software, e.g. ImageJ.

@ns-rse
Collaborator

ns-rse commented Mar 5, 2024

Given that the issue as reported is that the metadata is not saved in the TIFF files, it would, to my mind, be logical to save it in the TIFF file so it is available for downstream processing in other software.

Of course, saving it in the HDF5 would also be desirable. We have already gone round in circles here with "adding TIFF output" and a sub-optimal quick fix that didn't include the PIL solution, which allows inclusion of metadata. To avoid repeating that, I think it would be sensible to put all available metadata into both the TIFF and HDF5 outputs. That avoids having to revisit the issue and add another field when a user decides they want it available in .topostats rather than having to go back to the original file.

The ability to use PIL as an output for image formats is independent of what metadata to include so I'll split this into two issues.

ns-rse added the Images (Issues pertaining to the Images class) and IO (Input and Output) labels on Mar 25, 2024