Implement Jaccard blocking #162

Open
nbgl opened this issue Sep 21, 2018 · 0 comments

nbgl commented Sep 21, 2018

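See [1], page 112. We already have Hamming distance-based blocking, implemented as anonlink.blocking.bit_blocking. Let's implement Jaccard index-based blocking as well, since it should (?) do a better job of finding records with high Dice coefficients: the Dice coefficient D is a monotonically increasing function of the Jaccard index J, D = 2J / (1 + J), so records that are close under one measure are close under the other.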
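To make the idea concrete, here is a rough sketch of one common way to do Jaccard-based blocking: MinHash signatures split into bands, with each band used as a block key. It is only an illustration under assumptions. It may differ from the construction in [1], it is not an existing anonlink API, and the names `jaccard_blocks`, `minhash_signature`, `bands` and `rows` are hypothetical. Records are represented as sets of set-bit indices rather than anonlink's bitarrays.

```python
import random
from collections import defaultdict


def jaccard(a, b):
    """Jaccard index of two sets of set-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    """Dice coefficient of two sets of set-bit indices."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def make_permutations(num_bits, num_hashes, seed=0):
    """Random permutations of the bit positions, one per MinHash function."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(num_bits))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def minhash_signature(set_bits, perms):
    """For each permutation, the smallest permuted rank among the set bits.

    Two records agree on one entry with probability equal to their
    Jaccard index.
    """
    return tuple(min(perm[i] for i in set_bits) for perm in perms)


def jaccard_blocks(records, num_bits, bands=8, rows=2, seed=0):
    """Hypothetical blocking function: group record indices by banded MinHash.

    records: sequence of sets of set-bit indices (an assumed representation).
    Returns a dict mapping (band index, band signature) to record indices.
    """
    perms = make_permutations(num_bits, bands * rows, seed)
    blocks = defaultdict(list)
    for rec_id, set_bits in enumerate(records):
        if not set_bits:
            continue  # MinHash is undefined for an all-zero filter
        sig = minhash_signature(set_bits, perms)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            blocks[key].append(rec_id)
    return blocks


records = [{1, 5, 9, 20}, {1, 5, 9, 20}, {40, 41, 42, 43}]
blocks = jaccard_blocks(records, num_bits=64)
candidate_pairs = {
    (i, j)
    for members in blocks.values()
    for i in members
    for j in members
    if i < j
}
```

Two records land in the same block when they agree on all `rows` entries of at least one band, which happens with probability 1 - (1 - J^rows)^bands for Jaccard index J. Tuning `bands` and `rows` trades recall against block size.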

This is also a good opportunity to rename anonlink.blocking.bit_blocking to something that makes it clear that it does Hamming distance-based approximate nearest neighbour (ANN) search, as opposed to using other metrics.

[1] Durham, Elizabeth Ashley. A framework for accurate, efficient private record linkage. Diss. Vanderbilt University, 2012. https://etd.library.vanderbilt.edu/available/etd-03262012-144837/unrestricted/dissertation.pdf

Aha! Link: https://csiro.aha.io/features/ANONLINK-59
