Hash size doesn't match hash_size parameter for Daubechies wavelets hashing #149

Open
jonemo opened this issue Sep 27, 2021 · 4 comments

jonemo commented Sep 27, 2021

I am surprised that the size of the hash computed is not equal to the hash_size parameter available for all hashing methods. Specifically, imagehash.whash(img, hash_size=16, mode="db4") yields a hash of size 22 x 22.

While the readme does not make any explicit promises about the hash size, the naming of the parameter makes this outcome quite unexpected. Of course, my being surprised is not an issue in itself, and unless this is a bug, it would be unreasonable to break backward compatibility with a change in API or behavior. However, maybe it's worth clarifying in the documentation/readme that hash_size does not always match the size of the resulting hash?

The readme currently covers hash_size in this paragraph:

Each algorithm can also have its hash size adjusted (or in the case of colorhash, its binbits). Increasing the hash size allows an algorithm to store more detail in its hash, increasing its sensitivity to changes in detail.

Sample code:

    from PIL import Image
    import imagehash

    # `path` is the location of the example image on disk
    img = Image.open(path)
    # Each ImageHash stores its bits in a 2-D array, `.hash`; print rows x columns
    h = imagehash.average_hash(img, hash_size=16)
    print(f"average_hash: {len(h.hash)} x {len(h.hash[0])}")
    h = imagehash.dhash(img, hash_size=16)
    print(f"dhash: {len(h.hash)} x {len(h.hash[0])}")
    h = imagehash.phash(img, hash_size=16)
    print(f"phash: {len(h.hash)} x {len(h.hash[0])}")
    h = imagehash.whash(img, hash_size=16, mode="haar")
    print(f"whash haar: {len(h.hash)} x {len(h.hash[0])}")
    h = imagehash.whash(img, hash_size=16, mode="db4")
    print(f"whash db4: {len(h.hash)} x {len(h.hash[0])}")

Output:

average_hash: 16 x 16
dhash: 16 x 16
phash: 16 x 16
whash haar: 16 x 16
whash db4: 22 x 22

Example image:

tl-20210924-185242

@JohannesBuchner
Owner

Huh. Do you know why db4 does that?

@jonemo
Author

jonemo commented Sep 7, 2022

Sorry, I am the wrong person to ask this question. I used imagehash precisely because I have no clue about any of these algorithms. (And that was a year ago, now I know even less.)

@JohannesBuchner
Owner

In any case, given how differently the various methods work, no, hash_size does not necessarily have to have a consistent meaning across all methods.
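
A possible explanation for the 22 x 22 result, offered as a sketch rather than a confirmed diagnosis: if whash resizes the image to a power-of-two side length and then applies log2(image_scale) - log2(hash_size) levels of a 2-D discrete wavelet transform with padded (non-periodized) boundaries, each level maps a side of length n to floor((n + filter_len - 1) / 2). With the Haar filter (length 2) that is exact halving, but with db4 (length 8) every level leaves a few extra coefficients. The function names and the assumed whash internals below are illustrative, not taken from the library.

```python
import math

HAAR_FILTER_LEN = 2  # Haar decomposition filter has 2 taps
DB4_FILTER_LEN = 8   # Daubechies-4 decomposition filter has 8 taps


def dwt_output_len(n: int, filter_len: int) -> int:
    """Coefficient count after one DWT level with padded (non-periodized) boundaries."""
    return (n + filter_len - 1) // 2


def whash_side(image_scale: int, hash_size: int, filter_len: int) -> int:
    """Side length of the low-frequency band, assuming (hypothetically) that
    whash applies log2(image_scale) - log2(hash_size) decomposition levels."""
    levels = int(math.log2(image_scale)) - int(math.log2(hash_size))
    n = image_scale
    for _ in range(levels):
        n = dwt_output_len(n, filter_len)
    return n


for scale in (128, 256, 512, 1024):
    haar = whash_side(scale, 16, HAAR_FILTER_LEN)
    db4 = whash_side(scale, 16, DB4_FILTER_LEN)
    print(f"scale={scale}: haar {haar} x {haar}, db4 {db4} x {db4}")
```

Under these assumptions the Haar column is 16 for each scale and the db4 column is 22, which matches the output reported above. If that reading is right, hash_size sets the number of decomposition levels rather than the exact output size for wavelets with filters longer than Haar's.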
