Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Updated May 11, 2024 - Java
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining datasets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
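As a rough illustration of the task described above, the following sketch matches records that share no common identifier by comparing the word tokens of their field values with Jaccard similarity. The function names (`tokens`, `jaccard`, `find_matches`), the 0.5 threshold, and the sample records are all assumptions made up for this example; they are not taken from any of the projects listed here. Comparing every pair is quadratic in the number of records, so practical tools typically add a blocking step first.

```python
import re
from itertools import combinations


def tokens(record):
    """Lowercase all field values and split them into a set of word tokens."""
    text = " ".join(str(value) for value in record.values())
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_matches(records, threshold=0.5):
    """Return (i, j, score) for every record pair scoring at or above threshold.

    The threshold is an arbitrary illustrative value, not a recommended default.
    """
    token_sets = [tokens(r) for r in records]
    matches = []
    for i, j in combinations(range(len(records)), 2):
        score = jaccard(token_sets[i], token_sets[j])
        if score >= threshold:
            matches.append((i, j, score))
    return matches


# Hypothetical records from two sources with different field names
# and no shared key.
records = [
    {"name": "John A. Smith", "city": "New York"},
    {"full_name": "Smith, John A.", "location": "New York NY"},
    {"name": "Maria Garcia", "city": "Boston"},
]

for i, j, score in find_matches(records):
    print(f"records {i} and {j} likely refer to the same entity (score={score:.2f})")
```

Because the comparison ignores field names, the first two records match despite their different shapes, which is the situation the definition above describes.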
An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbours algorithms.
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
The SQL/Ibis powered sklearn of record linkage
Insightful Tutorials and Papers about Knowledge Graphs
Backend (Docker & API) for matchID project
AWS HealthLake patient matching with AWS Entity Resolution
Entity resolution for Elasticsearch.
One line of code for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
On-device Speech-to-Intent engine powered by deep learning
This repository contains code and extensive prompt examples to reproduce and extend the experiments in our papers "Using ChatGPT for Entity Matching" and "Entity Matching using Large Language Models".
An open-source library that leverages Python's data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Hubness reduced nearest neighbor search for entity alignment with knowledge graph embeddings
Addresses Entity Resolution challenges. Tasks include schema-agnostic blocking, pairwise comparisons, Meta-Blocking graph construction, and Jaccard similarity computation. Deliverables include source code, reports, and reproducibility guidelines in Python.
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
An open source, high scalability toolkit in Java for Entity Resolution.
Finds fuzzy matches between datasets
Finds fuzzy matches between CSV files
Small library to simplify collecting and loading of entity alignment benchmark datasets
Created by Halbert L. Dunn
Released 1946