
Study fingerprint hash collisions #48

Open
denis-stepanov opened this issue Oct 18, 2022 · 0 comments
Labels
enhancement New feature or request


@denis-stepanov
Owner

advent=> select count(hash) from fingerprints;
 count  
--------
 760255
(1 row)

advent=> select count(distinct(hash)) from fingerprints;
 count  
--------
 610168
(1 row)

That's about 20% of hash collisions: 760255 - 610168 = 150087 rows (19.7%) repeat a hash that is already present in the table. This looks high and might contribute to false positives. The task would be to study whether these are true collisions or induced ones, caused by Dejavu truncating the hashes before storing them. If the latter, the length of the stored hashes may need to be increased. Note that this would increase CPU consumption and would require re-fingerprinting the entire data set.
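One way to run this study is sketched below in Python, the language Dejavu is written in. It assumes a Dejavu-like scheme in which each hash is the SHA-1 of the string "freq1|freq2|delta_t" for a pair of spectrogram peaks, truncated to a fixed number of hex characters (20 here, i.e. 80 bits; this is believed to be Dejavu's default but should be checked against the version in use). All function names are hypothetical. `classify_collisions` separates true collisions (the same peak triple occurring more than once) from induced ones (distinct triples that only clash after truncation), and `expected_random_duplicates` gives the birthday-problem baseline: how many duplicates truncation alone would produce if the hashes were uniformly random.

```python
import hashlib
import math


def collision_fraction(count: int, count_distinct: int) -> float:
    """Share of rows repeating an earlier hash:
    (count(hash) - count(distinct hash)) / count(hash)."""
    return (count - count_distinct) / count


def peak_pair_hash(freq1: int, freq2: int, delta_t: int, reduction: int = 20) -> str:
    """ASSUMED Dejavu-like scheme (hypothetical helper): SHA-1 over
    "freq1|freq2|delta_t", hex digest truncated to `reduction` characters
    (4 bits per character)."""
    digest = hashlib.sha1(f"{freq1}|{freq2}|{delta_t}".encode("utf-8")).hexdigest()
    return digest[:reduction]


def duplicate_rows(values) -> int:
    """Python equivalent of count(x) - count(distinct x)."""
    values = list(values)
    return len(values) - len(set(values))


def classify_collisions(triples, reduction: int = 20) -> dict:
    """Split duplicate hashes into 'true' collisions (the same peak-pair
    triple occurs more than once) and 'induced' ones (distinct triples
    whose hashes only coincide after truncation)."""
    triples = list(triples)
    total = duplicate_rows(peak_pair_hash(*t, reduction=reduction) for t in triples)
    true_dups = duplicate_rows(triples)  # equal triples always give equal hashes
    return {"true": true_dups, "induced": total - true_dups}


def expected_random_duplicates(n: int, bits: int) -> float:
    """Expected count(x) - count(distinct x) for n uniformly random
    `bits`-bit values (bits >= 1). The (k+1)-th value repeats an earlier
    one with probability 1 - (1 - 2**-bits)**k; each term is computed with
    log1p/expm1 so the sum stays accurate even when 2**bits is huge."""
    log_q = math.log1p(-(2.0 ** -bits))
    return math.fsum(-math.expm1(k * log_q) for k in range(n))
```

With the counts above, `collision_fraction(760255, 610168)` is about 0.197, whereas `expected_random_duplicates(760255, 80)` is on the order of 1e-13. If the 80-bit assumption holds, random truncation clashes would account for essentially none of the roughly 150,000 duplicates; running `classify_collisions` on the raw peak triples (which would likely have to be captured during fingerprinting, since only the hashes are stored) would show how many come from repeated triples.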

@denis-stepanov denis-stepanov added the enhancement New feature or request label Oct 18, 2022