Entity resolution of PERSON entities with multiple addresses #2099

Answered by RobinL
JonnyNZCustoms asked this question in Q&A
If you store the addresses in an array column (i.e. a single column where each row holds a list of addresses), you can use an array intersection comparison:

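import splink.duckdb.comparison_library as cl
# Replace "first_name" with the name of your array column of addresses.
# Sizes [3, 1] give one level for 3 or more shared values and one for at least 1.
cl.array_intersect_at_sizes("first_name", [3, 1])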
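
To make the size thresholds concrete, here is a rough pure-Python sketch of the idea behind an array intersection comparison. It is not Splink's implementation; the function name `intersection_level` and the sample addresses are made up for illustration, and it assumes addresses only count as shared when the strings are exactly equal.

```python
def intersection_level(left, right, sizes=(3, 1)):
    """Return the largest size threshold met by the number of shared values.

    Returns 0 when no threshold is met (the "anything else" case).
    Hypothetical helper for illustration only, not part of Splink.
    """
    # Exact-match overlap between the two arrays of addresses
    shared = len(set(left) & set(right))
    # Check the strictest threshold first
    for size in sorted(sizes, reverse=True):
        if shared >= size:
            return size
    return 0


# Made-up example records: two people sharing one address
person_a = ["1 High St", "2 Low Rd", "3 Mid Ave"]
person_b = ["2 Low Rd", "9 Far Ln"]
level = intersection_level(person_a, person_b)
```

With these two records only the "at least 1 shared value" threshold is met, so they would fall into the weaker of the two match levels.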

If you also need the comparison to allow fuzzy matches, that is currently possible with the Spark linker; see this comment:
#1994 (comment)

More broadly, this issue contains some ideas, plus some sample code showing how a fuzzy array comparison could be implemented in DuckDB:
#1994
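
For a feel of what a fuzzy array comparison means in general, here is a minimal pure-Python sketch. It is not the code from that issue and not a Splink API: it swaps in `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, and the helper names and the 0.9 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def best_pairwise_similarity(left, right):
    """Highest similarity ratio between any value in left and any value in right.

    Returns 0.0 if either array is empty. Hypothetical helper, illustration only.
    """
    return max(
        (
            SequenceMatcher(None, a.lower(), b.lower()).ratio()
            for a, b in product(left, right)
        ),
        default=0.0,
    )


def fuzzy_array_match(left, right, threshold=0.9):
    """True if at least one pair of values is similar enough (assumed threshold)."""
    return best_pairwise_similarity(left, right) >= threshold


# Made-up example: the second record has a typo in the street name
is_match = fuzzy_array_match(["12 High Street", "5 Oak Rd"], ["12 High Stret"])
```

Comparing every pair of values is quadratic in the array lengths, which is usually fine for a handful of addresses per person but worth keeping in mind for long arrays.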

Answer selected by JonnyNZCustoms