🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Updated Mar 18, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining different datasets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
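As a rough illustration of the task described above, the following minimal sketch links records from two sources that share no common identifier by comparing normalized names. It uses only `difflib` from the Python standard library, not any of the projects listed on this page. The function names (`normalize`, `similarity`, `link_records`), the sample records, and the 0.85 threshold are all hypothetical choices made for the example. Real systems typically add blocking, multiple fields, and a trained or probabilistic scoring model.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """For each left record, keep its best-scoring right record if the
    score reaches the (assumed) threshold. Compares every pair, so this
    is only suitable for small inputs."""
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, best_score))
    return links


# Hypothetical records from two sources with unrelated keys.
crm = {"c1": "Jonathan Smith", "c2": "Acme Corp."}
billing = {"b7": "acme corp", "b9": "Jonathon Smith", "b3": "Zed Industries"}

links = link_records(crm, billing)
for left_id, right_id, score in links:
    print(left_id, "<->", right_id, f"{score:.2f}")
```

With this sample data, `c1` is linked to `b9` and `c2` to `b7`, while `b3` stays unmatched. The all-pairs comparison grows quadratically with the number of records, which is the cost that scalable tools aim to avoid.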
🆔 Examples for using the dedupe library
A powerful and modular toolkit for record linkage and duplicate detection in Python
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Insightful Tutorials and Papers about Knowledge Graphs
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
🆔 Command line tool for deduplicating CSV files
On-device Speech-to-Intent engine powered by deep learning
An open source, high scalability toolkit in Java for Entity Resolution.
A list of free data matching and record linkage software.
Rust crate for entity parsing
ReFinED is an efficient and accurate entity linking (EL) system.
Entity resolution for Elasticsearch.
Record Linkage ToolKit (Find and link entities)
ReCiter: an enterprise open source author disambiguation system for academic institutions
OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.
Fork of the Freely Extensible Biomedical Record Linkage program
🔎 Finds fuzzy matches between CSV files
Created by Halbert L. Dunn
Released 1946