quickhash implementation #85

Open
kaczmarj opened this issue Feb 22, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Contributor

kaczmarj commented Feb 22, 2024

hello, i noticed that tiffslide does not compute a quickhash for an image, whereas openslide does. i wrote a small quickhash implementation for one of my projects, though it does not follow openslide's implementation exactly. there are two differences. first, openslide hashes many of the slide properties along with the smallest level of the image pyramid, whereas my implementation hashes only two properties (the comment and the vendor) plus the smallest level. second, openslide uses sha256 and my implementation uses md5. that choice was arbitrary on my part; if tiffslide were to incorporate quickhash, sha256 would be the way to go.

please feel free to close this issue if it's noise!

"""Hash parts of a whole slide image.

This implementation is heavily inspired by OpenSlide's quickhash1:
https://github.com/openslide/openslide/blob/549e81b6662efe2b2285f11a5bcb31ccd7b95655/src/openslide-decode-tifflike.c#L996-L1143
"""

from __future__ import annotations

import hashlib

import tiffslide
from PIL import Image
from tiffslide.tiffslide import PROPERTY_NAME_COMMENT
from tiffslide.tiffslide import PROPERTY_NAME_VENDOR


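def _read_smallest_level(tslide: tiffslide.TiffSlide) -> Image.Image:
    """Read the entire lowest-resolution (last) pyramid level as an image."""
    # Levels are ordered from largest to smallest, so the last index is the smallest.
    smallest_level = tslide.level_count - 1
    size = tslide.level_dimensions[smallest_level]
    return tslide.read_region((0, 0), level=smallest_level, size=size)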


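def _hash_str_and_property(
    hasher: hashlib._Hash, tslide: tiffslide.TiffSlide, name: str
) -> None:
    """Feed a property's name and value into the hasher, if the property is set."""
    value = tslide.properties.get(name)
    # Missing properties are skipped entirely, so they do not affect the hash.
    if value is not None:
        hasher.update(name.encode())
        hasher.update(str(value).encode())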


def quickhash(tslide: tiffslide.TiffSlide) -> str:
    """Return a quick MD5 hash of a whole slide image."""
    m = hashlib.md5()
    _hash_str_and_property(m, tslide, PROPERTY_NAME_COMMENT)
    _hash_str_and_property(m, tslide, PROPERTY_NAME_VENDOR)
    smallest_level_bytes = _read_smallest_level(tslide).tobytes()
    m.update(smallest_level_bytes)
    return m.hexdigest()
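
for reference, here is a minimal sketch of what a sha256 variant might look like. it is not openslide's algorithm and not an existing tiffslide api: the function name `quickhash_sha256` and its parameters are hypothetical, and the length prefix in front of each string is an extra assumption added here so that a name/value pair like `"ab"` + `"c"` cannot collide with `"a"` + `"bc"`. it takes the properties mapping and the pixel bytes directly, so it can be tried without a slide file.

```python
from __future__ import annotations

import hashlib
from typing import Mapping, Sequence


def quickhash_sha256(
    properties: Mapping[str, object],
    pixel_bytes: bytes,
    property_names: Sequence[str],
) -> str:
    """Return a SHA-256 hex digest of selected properties plus pixel data.

    Hypothetical helper for illustration only; the hashing scheme is an
    assumption and does not reproduce OpenSlide's quickhash-1.
    """
    hasher = hashlib.sha256()
    for name in property_names:
        value = properties.get(name)
        if value is None:
            # Skip unset properties, as in the md5 version above.
            continue
        for part in (name, str(value)):
            data = part.encode("utf-8")
            # Assumption: prefix each string with its byte length so that
            # adjacent strings cannot be re-split into a different pair.
            hasher.update(len(data).to_bytes(8, "little"))
            hasher.update(data)
    # Finally, hash the raw bytes of the smallest pyramid level.
    hasher.update(pixel_bytes)
    return hasher.hexdigest()
```

with a real slide this could be called as `quickhash_sha256(tslide.properties, _read_smallest_level(tslide).tobytes(), (PROPERTY_NAME_COMMENT, PROPERTY_NAME_VENDOR))`, reusing the helper and constants from the code above.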

ap-- added the enhancement (New feature or request) label Feb 22, 2024
2 participants