Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ome_metadata as provided by Tifffile is a string. Fixes #953 #998

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mlt
Copy link

@mlt mlt commented May 30, 2023

Closes: #953

Previously, metadata handling expected either a tuple or presumed a dict, whereas Tifffile supplies ome metadata as a string (containing XML). Let's pass it along verbatim.

ome_metadata provided by Tifffile is a String
Copy link
Contributor

@FirefoxMetzger FirefoxMetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @mlt 🚀 I have two code-specific comments in this first round as well as a some conceptual questions:

  1. It sounds like this closes an open issue, is this PR related to one? If so, could you link it here using closes: #<number>?
  2. If OME passes metadata as XML, how difficult would it be to parse the XML into a flat dict? A XML string works fine for metadata, but it would be a nice service (I think) to parse this further if possible. Do you happen to have a good reference to the spec?

Edit: I just saw that you mentioned the issue that it closes in the commit message. I will update your comment and link the issue so that it will close automatically. You can disregard my first conceptual question :)

metadata.update(flavor_metadata[0])
else:
flavor_metadata = flavor_metadata[0]
if isinstance(flavor_metadata, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isinstance(flavor_metadata, dict):
elif isinstance(flavor_metadata, dict):

I think this should be elif here, right? because we are deciding how to unpack the metadata depending on it's type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. I slightly adjusted the existing code to unpack a potential dict from the tuple's first entry. Previously, there were two direct update calls. Now, if it is a tuple, we get the first entry and replace flavour_metadata with it, then actually do check if it is a dict before updating.

Comment on lines +308 to +310
else:
# ome only?
metadata.update({flag: flavor_metadata})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain your logic here so I can follow along?

If the else is only relevant for OME then I think we should switch it around and do something like:

if flag == 'ome':
    ...  # ome-specific metadata loading
elif isinstance(flavor_metadata, tuple):
    ...  # tuple-specific metadata loading
else:
    ...  # normal metadata loading

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it akin to it originally. What did bother me was the magic "ome"... what if it changes to something. So I wanted to avoid any literal. I am not familiar with much other formats so I can't tell if "ome" is a "unicorn" or if there are others that would be non-tuple & non-dict.

Once again, if you truly believe "ome" is the only one and do not mind a literal, I do not insist on the way I wrote it. I just wanted it to be more generic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I even considered ignoring non-dict entries. But it makes it slightly non-trivial to extract XML because file-level "description" is missing somehow, so one would have to write iio.immeta(filename, index=0) to get it.

@mlt
Copy link
Author

mlt commented May 31, 2023

how difficult would it be to parse the XML into a flat dict

There are 2 ways to parse. We can rely on Tifffile and it's xml2dict. However, I ended up using https://github.com/tlambert03/ome-types without thinking twice. I think it is better and updateable approach. But... do you want to pull another dependency? Perhaps, it can be optional, i.e. parse with ome-types if installed otherwise pass it as is.

Edit I'm unsure about flat [dict] part as it is heavily nested.

@FirefoxMetzger
Copy link
Contributor

I did some digging as to when tifffile returns a dict, tuple, or str to make a good decision regarding the review comments.

For context: All flavor-specific metadata is handled by flavor-specific properties on the Tifffile object, hence the use of getattr(self._fh, flag + "_metadata") in ImageIO (where "flag" is the name of the flavor used by tifffile). These functions decide how the metadata for that flavor looks like, and most of them get their data from a flavor-specific tag. The exact logic is defined here (and following).

It is tuple only for shaped flavor, which is TIFFfile's custom TIFF format. I also realized that the tuple can have multiple elements, but we are currently only extracting the first and ignoring the rest. This is a bug that we can either fix in this PR or I can pick it up in a future one.

It is dict when TIFFfile has logic in place to convert the flavor-specific metadata into a dict. I found the following flavors for which this appears to be the case: lsm, stk, imagej, fluoview, sis, mdgel, metaseries, pilatus, micromanager, gdal_structural, scanimage, geotiff, astrotiff_description, streak_description.

It is str when read from a tag and not parsed further. I found the following flavors that will return str: ome, scn, philips, nih, sem, andor, epics, tvips, gdal, eer. (Note: This is based on the assumption that the metadata is stored as a string. It could also be stored as a struct, in which case [I think] we would get bytes. I don't have images to test all these flavors, but since I haven't heard of the result being bytes yet, I'm happy to go with the assumption that all are str for now until we get a report suggesting otherwise.)

Based on the above, I think the best solution is a combination of both of our approaches:

  • we have one if/elif/else block that checks for tuple, dict, or str.
  • tuple should have its own block, because it is different from receiving a dict or str so should be handled separately.
  • str should be the default/implied/else case, because this is what we will observe whenever TIFFfile doesn't do additional parsing.

Regarding the parsing of OME metadata from str to dict I think the best solution is to return str and add a note about OME-types into the docs.

I looked into depending on OME-types but decided that this is not the right way forward. The library looks pretty solid, is permissively licensed (I did not check transitive dependencies), and doesn't have many dependencies. The problem is that it returns a data-class style result that is indexed via dot notation. This means that we would access OME metadata via something like

meta = iio.immeta("ome.tiff")
desc = meta["ome"].images[0].description

instead of using (nested) dict, which is the common approach across formats so far. I.e., users would expect something like

meta = iio.immeta("ome.tiff")
desc = meta["ome"]["images"][0]["description"]

We could internally convert the result of OME-types into a dict, but that seems like going backward.

I also looked into your suggestion of re-using TIFFfile's xml2dict. I like the idea, but there is probably a reason why it is not already in use for OME considering that many other flavors are internally converted to dict before they are returned. It would also block users (like you) from parsing the metadata with something like OME-types, which has a pretty nice interface.

So in sum, I think the prudent thing to do is to return the metadata as str and then let users decide how to proceed. We can then use the docs on tifffile to add a note that points user towards OME-types as one way of further parsing the resulting XML.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v3 tiff handling is not OME metadata friendly
2 participants