🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Updated Mar 18, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets whose records may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
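As a rough illustration of the task described above, the following sketch links records from two sources that share no common identifier by comparing normalized names with `difflib.SequenceMatcher` from Python's standard library. The record layout, the `normalize` and `link_records` helpers, and the 0.85 threshold are assumptions made for this example. It does not reflect how any of the projects listed here work internally; production tools generally add steps such as blocking and probabilistic or learned scoring.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop simple punctuation, and collapse whitespace."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """For each left record, keep the best-scoring right record above threshold.

    Hypothetical helper: compares every pair (no blocking), so it only
    suits small inputs.
    """
    links = []
    for l_rec in left:
        best, best_score = None, 0.0
        for r_rec in right:
            score = similarity(l_rec["name"], r_rec["name"])
            if score > best_score:
                best, best_score = r_rec, score
        if best is not None and best_score >= threshold:
            links.append((l_rec["id"], best["id"], best_score))
    return links


# Two illustrative sources with different identifiers and spelling variants.
source_a = [
    {"id": "a1", "name": "Jonathan Smith"},
    {"id": "a2", "name": "Maria Garcia"},
    {"id": "a3", "name": "Robert Brown"},
]
source_b = [
    {"id": "b1", "name": "maria garcia."},
    {"id": "b2", "name": "Jonathon Smith"},
    {"id": "b3", "name": "Wei Chen"},
]

links = link_records(source_a, source_b)
for left_id, right_id, score in links:
    print(left_id, right_id, round(score, 2))
```

In this toy data the spelling variant and the differently cased, punctuated name are linked, while the record with no counterpart is left unmatched. Choosing the threshold is a trade-off between missed links and false matches, and it would need tuning on real data.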
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
🆔 Examples for using the dedupe library
A powerful and modular toolkit for record linkage and duplicate detection in Python
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
🆔 Command line tool for deduplicating CSV files
Link Discovery Framework for Metric Spaces.
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
A list of free data matching and record linkage software.
Spark RDD with Lucene's query and entity linkage capabilities
Record Linkage ToolKit (Find and link entities)
Fork of the Freely Extensible Biomedical Record Linkage program
🔎 Finds fuzzy matches between CSV files
Resources for tackling record linkage / deduplication / data matching problems
Backend (Docker & API) for matchID project
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Distributed Bayesian Entity Resolution in Apache Spark
CLK hash: hash PII for entity matching
Python package for deduplication/entity resolution using active learning
Link Wikidata items to large catalogs
Created by Halbert L. Dunn
Released 1946