Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp color metadata #62

Open
mtbc opened this issue Aug 17, 2020 · 19 comments
Open

Revamp color metadata #62

mtbc opened this issue Aug 17, 2020 · 19 comments

Comments

@mtbc
Copy link
Member

mtbc commented Aug 17, 2020

Split masks

For split masks the microservice specifies "dtype":"|b1" and the Zarr spec defines a Boolean data type as,

"b": Boolean (integer type where all values are only True or False)

and describes how metadata is encoded using JSON. However, in JSON, Boolean types are all-lowercase, no initial capital letter as above.

The microservice takes the JSON Boolean idea in offering, e.g., "fill_value":false and "color":{"true":-2139062144}. However, it's possible that it's meant to do a more-Pythonic initial-capital thing, or just use integers? (With 0 as false and not-0 as true? #60 (comment) suggests 1.)

Such changes are trivial, it's just very unclear. If anybody arrives at certainty, happy to open a fixing PR.

@mtbc
Copy link
Member Author

mtbc commented Aug 17, 2020

(Basically, from the point of view of developing in Java, it's generally not fun to try to interpret the Zarr v2 spec. 😕 Compressors are another thorny bit.)

@manics
Copy link
Member

manics commented Aug 17, 2020

I'd have thought we'd want to stick to using JSON types where possible, so native booleans rather than strings representing booleans.
What does the "true" in "color":{"true":-2139062144} mean?

@mtbc
Copy link
Member Author

mtbc commented Aug 17, 2020

I was surprised by that too. I'm adapting it from https://github.com/ome/omero-ms-zarr/blob/6add6804/spec.md#user-content-color, don't know why it needs stringifying though.

@manics
Copy link
Member

manics commented Aug 17, 2020

Is it an opaque string, and the fact it happens to be a number is irrelevant? i.e. each object has a string label that is associated with a color?

@mtbc
Copy link
Member Author

mtbc commented Aug 17, 2020

I think the image's pixel values have to fit the dtype as at https://zarr.readthedocs.io/en/stable/spec/v2.html#data-type-encoding.

@manics
Copy link
Member

manics commented Aug 17, 2020

Must be because JSON only allows string keys

@manics
Copy link
Member

manics commented Aug 17, 2020

Though I'd argue true/false should be represented as 1/0.

Perhaps this should be an array where array-index = object id, and if an object is missing it's set to null ?

@mtbc
Copy link
Member Author

mtbc commented Aug 17, 2020

Aha, that'll be it, thank you. Could be we decide to opt for 1/0 for split masks, it's an easy change to make to the microservice. I wonder if that violates what Zarr readers expect for |b1 re. fill value, etc.? And all-bits-set is nice because it's more obvious in napari, I haven't figured the buttons to make it have 1/0 perceptible, sigh. Could make the not-0 value configurable.

@joshmoore joshmoore changed the title Figure out color metadata for split masks Revamp color metadata Aug 19, 2020
@joshmoore
Copy link
Member

cf:

@manics: can you outline the trade-offs between the two representations. The first proposal (#65) is to have the colors in an array the size of the labels, right?

  "color": [
    null,
    8388736,
    ...
  ]

with the downside that choosing large label values leads to large arrays. (One option would be to put this array in zarr rather than json?)

The other proposal (#68) is basically to make the color metadata more explicit, right?

  "color": [
    {
      "label": 1,
      "rgba": 8388736
    },
    {
      "label": 2,
      "rgba": 10101010
    },

and then allowing to potentially update the key in the object to be something other than "rgba", e.g. "hex" or whatever?

@manics
Copy link
Member

manics commented Aug 19, 2020

Yes, pretty much what you said. The other benefit of the second option is it's extensible to include other metadata about a label, for instance you could add a name/description to each label. In this case it might be better to rename color to meta or metadata.

I think we should also change the current default colour model from "an int in whatever format OMERO uses" to a more understandable 4-tuple: [0-255, 0-255, 0-255, 0-255], or a CSS style hex value aabbccdd if you're concerned about the size of the field.

@manics
Copy link
Member

manics commented Aug 19, 2020

Decisions from todays Zoom call:

Agreed:

  • Default to storing colour as an RGBA unsigned-int 4-tuple: [uint8, uint8, uint8, uint8], this does not preclude supporting other formats in future

Rejected:

  • The non-sparse array format

To be decided:

Keep the existing layout:

"color": {
  "1": [255, 255, 255, 0],
  "4": [0, 255, 255, 128],
  ...
}

Other metadata will be stored in a different top-level key, but with the same structure.

Sparse list of dicts:

"label-metadata": [
  {
    "label-value": 1,
    "rgba": [255, 255, 255, 0]
  },
  {
    "label-value": 4,
    "rgba": [0, 255, 255, 128]
  },
  ...
]

Sparse dict of dicts where key is the label-value converted to a string:

"label-metadata": {
  "1": {
    "rgba": [255, 255, 255, 0]
  },
  "4": {
    "rgba": [0, 255, 255, 128]
  },
  ...
}

@will-moore
Copy link
Member

I prefer the first of the 3 options (most concise), but I guess the complexity of the labels parsing we're doing now for ome-zarr-py won't be very much different for any of these alternatives:

https://github.com/ome/ome-zarr-py/blob/208952c770c5d3683a8a8339173c0bc6ec158e54/ome_zarr.py#L276-L284

@manics
Copy link
Member

manics commented Aug 19, 2020

Just noticed a bug in https://github.com/ome/ome-zarr-py/blob/208952c770c5d3683a8a8339173c0bc6ec158e54/ome_zarr.py#L280-L281 bool(non-empty-string) is always True, so bool("false") == True

@joshmoore
Copy link
Member

Fixed in ome/ome-zarr-py@db43af5

@k-dominik
Copy link

If I may add my two cents.

sparse list of dicts: it is very explicit, support for a different color model is trivial, label names can be different types than strings. But the question is whether this freedom is desired? If not, than the current layout is probably fine. What I like about the other two approaches is that they are more human readable. I understand them without reading the specs. Wouldn't be so sure about the current layout.

@joshmoore
Copy link
Member

Currently working on a set of PRs to test out the following

 24 ### "image-label" metadata
 23
 22 Groups containing the `image-label` dictionary represent an image segmentation
 21 in which each unique pixel value represents a separate segmented object.
 20 `image-label` groups MUST also contain `multiscales` metadata and the two
 19 "datasets" series MUST have the same number of entries.
 18
 17 The `colors` key defines a list of JSON objects describing the unique label
 16 values. Each entry in the list MUST contain the key "label-value" with the
 15 pixel value for that label. Additionally, the "rgba" key MAY be present, the
 14 value for which is an RGBA unsigned-int 4-tuple: `[uint8, uint8, uint8, uint8]`
 13 In the case that the same `label-value` is present multiple times in the list,
 12 the last value wins.
 11
 10 Some implementations may represent overlapping labels by using a specially assigned
  9 value, for example the highest integer available in the pixel range.
  8
  7 The `source` key is an optional dictionary which contains information on the
  6 image the label is associated with. If included it MAY include a key `image`
  5 whose value is the relative path to a Zarr image group. The default value is
  4 "../../" since most labels are stored under a subgroup named "labels/" (see
  3 above).
  2
  1
161 ```
  1 "image-label":
  2   {
  3     "version": "0.1",
  4     "colors": [
  5       {
  6         "label-value": 1,
  7         "rgba": [255, 255, 255, 0]
  8       },
  9       {
 10         "label-value": 4,
 11         "rgba": [0, 255, 255, 128]
 12       },
 13       ...
 14       ]
 15     },
 16     "source": {
 17       "image": "../../"
 18     }
 19 ]
 20 ```

@joshmoore
Copy link
Member

Reading through https://qupath.readthedocs.io/en/latest/docs/advanced/exporting_annotations.html#binary-labeled-images, one wonders if the label-values need to be per channel.

@joshmoore
Copy link
Member

image-label proposal is now opened in:

Separately, also considering an array (rather than JSON metadata) similar to this example:

import numpy as np
import zarr

t = [
    ("label-value", int),
    ("r", int),
    ("g", int),
    ("b", int),
    ("a", int),
    ("object-type", "U20"),
    ("object-id", int),
    ("description", "U200")]

data = list()
for x in range(100):
    data.append((1, 100, 100, 100, 100, "Mask", 123456, "some text here"))
    data.append((2, 200, 200, 200, 200, "Mask", 567896, "some more text"))
a = np.array(data, dtype=t)

z = zarr.open("s.zarr")
z.array(name="a", data=a, chunks=(10,))

@imagesc-bot
Copy link

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/multi-scale-image-labels-v0-1/43483/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants