ome_metadata as provided by Tifffile is a string. Fixes #953 #998

mlt · 2023-05-30T01:52:38Z

Closes: #953

Previously, metadata handling expected either a tuple or presumed a dict, whereas Tifffile supplies ome metadata as a string (containing XML). Let's pass it along verbatim.

ome_metadata provided by Tifffile is a String

FirefoxMetzger

Thanks for the PR @mlt 🚀 I have two code-specific comments in this first round as well as a some conceptual questions:

It sounds like this closes an open issue, is this PR related to one? If so, could you link it here using closes: #<number>?
If OME passes metadata as XML, how difficult would it be to parse the XML into a flat dict? A XML string works fine for metadata, but it would be a nice service (I think) to parse this further if possible. Do you happen to have a good reference to the spec?

Edit: I just saw that you mentioned the issue that it closes in the commit message. I will update your comment and link the issue so that it will close automatically. You can disregard my first conceptual question :)

FirefoxMetzger · 2023-05-31T18:33:59Z

imageio/plugins/tifffile_v3.py

-                        metadata.update(flavor_metadata[0])
-                    else:
+                        flavor_metadata = flavor_metadata[0]
+                    if isinstance(flavor_metadata, dict):


Suggested change

if isinstance(flavor_metadata, dict):

elif isinstance(flavor_metadata, dict):

I think this should be elif here, right? because we are deciding how to unpack the metadata depending on it's type.

Nope. I slightly adjusted the existing code to unpack a potential dict from the tuple's first entry. Previously, there were two direct update calls. Now, if it is a tuple, we get the first entry and replace flavour_metadata with it, then actually do check if it is a dict before updating.

FirefoxMetzger · 2023-05-31T18:45:55Z

imageio/plugins/tifffile_v3.py

+                    else:
+                        # ome only?
+                        metadata.update({flag: flavor_metadata})


Could you explain your logic here so I can follow along?

If the else is only relevant for OME then I think we should switch it around and do something like:

if flag == 'ome': ... # ome-specific metadata loading elif isinstance(flavor_metadata, tuple): ... # tuple-specific metadata loading else: ... # normal metadata loading

I had it akin to it originally. What did bother me was the magic "ome"... what if it changes to something. So I wanted to avoid any literal. I am not familiar with much other formats so I can't tell if "ome" is a "unicorn" or if there are others that would be non-tuple & non-dict.

Once again, if you truly believe "ome" is the only one and do not mind a literal, I do not insist on the way I wrote it. I just wanted it to be more generic.

I even considered ignoring non-dict entries. But it makes it slightly non-trivial to extract XML because file-level "description" is missing somehow, so one would have to write iio.immeta(filename, index=0) to get it.

mlt · 2023-05-31T20:55:53Z

how difficult would it be to parse the XML into a flat dict

There are 2 ways to parse. We can rely on Tifffile and it's xml2dict. However, I ended up using https://github.com/tlambert03/ome-types without thinking twice. I think it is better and updateable approach. But... do you want to pull another dependency? Perhaps, it can be optional, i.e. parse with ome-types if installed otherwise pass it as is.

Edit I'm unsure about flat [dict] part as it is heavily nested.

FirefoxMetzger · 2023-06-11T09:03:26Z

I did some digging as to when tifffile returns a dict, tuple, or str to make a good decision regarding the review comments.

For context: All flavor-specific metadata is handled by flavor-specific properties on the Tifffile object, hence the use of getattr(self._fh, flag + "_metadata") in ImageIO (where "flag" is the name of the flavor used by tifffile). These functions decide how the metadata for that flavor looks like, and most of them get their data from a flavor-specific tag. The exact logic is defined here (and following).

It is tuple only for shaped flavor, which is TIFFfile's custom TIFF format. I also realized that the tuple can have multiple elements, but we are currently only extracting the first and ignoring the rest. This is a bug that we can either fix in this PR or I can pick it up in a future one.

It is dict when TIFFfile has logic in place to convert the flavor-specific metadata into a dict. I found the following flavors for which this appears to be the case: lsm, stk, imagej, fluoview, sis, mdgel, metaseries, pilatus, micromanager, gdal_structural, scanimage, geotiff, astrotiff_description, streak_description.

It is str when read from a tag and not parsed further. I found the following flavors that will return str: ome, scn, philips, nih, sem, andor, epics, tvips, gdal, eer. (Note: This is based on the assumption that the metadata is stored as a string. It could also be stored as a struct, in which case [I think] we would get bytes. I don't have images to test all these flavors, but since I haven't heard of the result being bytes yet, I'm happy to go with the assumption that all are str for now until we get a report suggesting otherwise.)

Based on the above, I think the best solution is a combination of both of our approaches:

we have one if/elif/else block that checks for tuple, dict, or str.
tuple should have its own block, because it is different from receiving a dict or str so should be handled separately.
str should be the default/implied/else case, because this is what we will observe whenever TIFFfile doesn't do additional parsing.

Regarding the parsing of OME metadata from str to dict I think the best solution is to return str and add a note about OME-types into the docs.

I looked into depending on OME-types but decided that this is not the right way forward. The library looks pretty solid, is permissively licensed (I did not check transitive dependencies), and doesn't have many dependencies. The problem is that it returns a data-class style result that is indexed via dot notation. This means that we would access OME metadata via something like

meta = iio.immeta("ome.tiff")
desc = meta["ome"].images[0].description

instead of using (nested) dict, which is the common approach across formats so far. I.e., users would expect something like

meta = iio.immeta("ome.tiff")
desc = meta["ome"]["images"][0]["description"]

We could internally convert the result of OME-types into a dict, but that seems like going backward.

I also looked into your suggestion of re-using TIFFfile's xml2dict. I like the idea, but there is probably a reason why it is not already in use for OME considering that many other flavors are internally converted to dict before they are returned. It would also block users (like you) from parsing the metadata with something like OME-types, which has a pretty nice interface.

So in sum, I think the prudent thing to do is to return the metadata as str and then let users decide how to proceed. We can then use the docs on tifffile to add a note that points user towards OME-types as one way of further parsing the resulting XML.

This fixes imageio#953

ffade45

ome_metadata provided by Tifffile is a String

FirefoxMetzger reviewed May 31, 2023

View reviewed changes

aseltmann mentioned this pull request Jul 4, 2023

Fix tifffile OME metadata when imageio.v3 merges their fix aseltmann/colabs-for-fcs#6

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ome_metadata as provided by Tifffile is a string. Fixes #953 #998

ome_metadata as provided by Tifffile is a string. Fixes #953 #998

mlt commented May 30, 2023 •

edited by FirefoxMetzger

FirefoxMetzger left a comment •

edited

FirefoxMetzger May 31, 2023

mlt May 31, 2023

FirefoxMetzger May 31, 2023

mlt May 31, 2023

mlt May 31, 2023

mlt commented May 31, 2023 •

edited

FirefoxMetzger commented Jun 11, 2023

	if isinstance(flavor_metadata, dict):
	elif isinstance(flavor_metadata, dict):

ome_metadata as provided by Tifffile is a string. Fixes #953 #998

Are you sure you want to change the base?

ome_metadata as provided by Tifffile is a string. Fixes #953 #998

Conversation

mlt commented May 30, 2023 • edited by FirefoxMetzger

FirefoxMetzger left a comment • edited

Choose a reason for hiding this comment

FirefoxMetzger May 31, 2023

Choose a reason for hiding this comment

mlt May 31, 2023

Choose a reason for hiding this comment

FirefoxMetzger May 31, 2023

Choose a reason for hiding this comment

mlt May 31, 2023

Choose a reason for hiding this comment

mlt May 31, 2023

Choose a reason for hiding this comment

mlt commented May 31, 2023 • edited

FirefoxMetzger commented Jun 11, 2023

mlt commented May 30, 2023 •

edited by FirefoxMetzger

FirefoxMetzger left a comment •

edited

mlt commented May 31, 2023 •

edited