Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Updated Jun 12, 2024 (Python)
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets on entities that may or may not share a common identifier (e.g., a database key, URI, or national identification number). The lack of a shared identifier may be due to differences in record shape, storage location, or curator style or preference.
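To make the definition concrete, here is a minimal Python sketch. It assumes two toy in-memory sources with hypothetical fields (`name`, `dob`, `zip`) and no shared identifier. The sketch blocks on ZIP code so that only plausible pairs are compared. It scores each candidate pair with the standard library's `difflib.SequenceMatcher` plus exact date-of-birth agreement, and keeps pairs above an arbitrary threshold. It illustrates the general idea only and is not the method of any tool listed here.

```python
from difflib import SequenceMatcher

# Two toy "data sources" with no shared identifier (hypothetical records).
source_a = [
    {"id": "a1", "name": "Jonathan Smith", "dob": "1980-03-14", "zip": "02139"},
    {"id": "a2", "name": "Maria Garcia", "dob": "1975-11-02", "zip": "10001"},
]
source_b = [
    {"id": "b1", "name": "Jon Smith", "dob": "1980-03-14", "zip": "02139"},
    {"id": "b2", "name": "Mary Garcia", "dob": "1975-11-20", "zip": "10001"},
    {"id": "b3", "name": "Ana Lopez", "dob": "1990-01-01", "zip": "94110"},
]


def similarity(x: str, y: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def link(a, b, threshold=0.8):
    """Return (id_a, id_b, score) for candidate pairs scoring >= threshold."""
    # Blocking: index source b by ZIP so we avoid comparing every pair.
    blocks = {}
    for rec in b:
        blocks.setdefault(rec["zip"], []).append(rec)

    links = []
    for ra in a:
        for rb in blocks.get(ra["zip"], []):
            # Score = mean of fuzzy name similarity and exact DOB agreement (0 or 1).
            score = (similarity(ra["name"], rb["name"]) + (ra["dob"] == rb["dob"])) / 2
            if score >= threshold:
                links.append((ra["id"], rb["id"], round(score, 3)))
    return links


print(link(source_a, source_b))  # [('a1', 'b1', 0.891)]
```

The pair `a2`/`b2` is not linked at the default threshold because the transposed day in the date of birth fails the exact comparison. This shows why real linkage systems usually use graded comparisons rather than exact equality for every field.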
A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
Link Discovery Framework for Metric Spaces.
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
The SQL/Ibis-powered sklearn of record linkage
Hierarchical record linkage at scale
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
CERTA - Computing Entity Resolution explanations with TriAngles
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbours algorithms.
Example scripts for generating data with Gecko
Backend (Docker & API) for matchID project
PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.
Spark RDD with Lucene's query and entity linkage capabilities
Supplementary code for "Class ratio and its implications for reproducibility and performance in record linkage" presented at The Pacific-Asia Conference on Knowledge Discovery and Data Mining 2024.
Interpretable metadata for the results of NHS England record linkage
An exploration of generalizable approaches to unsupervised entity matching for use in linking tabular public energy data sources.
🔎 Finds fuzzy matches between datasets
🔎 Finds fuzzy matches between CSV files
Fast, accurate, open-source geocoding in Python
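Several entries above describe probabilistic data linkage. One classic formulation is the Fellegi-Sunter model. For each compared field, it uses an m-probability (the field agrees, given a true match) and a u-probability (the field agrees, given a non-match), and it sums the log-likelihood ratios into a match weight. The sketch below assumes three hypothetical fields and invented m/u values purely for illustration. Real systems estimate these parameters from data, and this is not presented as the implementation of any listed tool.

```python
import math

# Hypothetical (m, u) per field, invented for illustration only:
#   m = P(field agrees | records refer to the same entity)
#   u = P(field agrees | records refer to different entities)
PARAMS = {
    "name": (0.95, 0.01),
    "dob": (0.98, 0.005),
    "zip": (0.90, 0.05),
}


def match_weight(agreements: dict) -> float:
    """Fellegi-Sunter match weight: sum of log2 likelihood ratios per field."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = PARAMS[field]
        if agrees:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def match_probability(weight: float, prior: float) -> float:
    """Convert a match weight to a posterior probability given a prior match rate."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)


w = match_weight({"name": True, "dob": True, "zip": False})
print(round(w, 2), round(match_probability(w, prior=0.001), 4))
```

Working in log2 keeps each field's contribution additive, so a reviewer can see how much each agreement or disagreement moved the score. The prior converts that weight into a probability that reflects how rare true matches are among all candidate pairs.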
Created by Halbert L. Dunn
Released 1946