An open source, high scalability toolkit in Java for Entity Resolution.
Updated Apr 12, 2024 - Java
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records that refer to the same real-world entity, whether within one dataset or across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets whose records may or may not share a common identifier (e.g., a database key, URI, or national identification number). Such identifiers are often missing or inconsistent because of differences in record shape, storage location, or curator style or preference.
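As a minimal illustration of the task described above, the following Python sketch links records by normalizing names, scoring every pair with the standard library's `difflib.SequenceMatcher`, and merging pairs above a threshold with a union-find structure. The record IDs, the helper names `normalize`, `similarity`, and `resolve`, and the 0.85 threshold are hypothetical choices made for illustration. None of the projects listed here necessarily works this way, and real toolkits typically add blocking, learned matchers, and evaluation.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    # Lowercase, drop punctuation, and collapse whitespace so that
    # formatting differences between sources do not block a match.
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    # Similarity ratio in [0, 1] computed on the normalized strings.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(records: list[tuple[str, str]], threshold: float = 0.85) -> list[set[str]]:
    """Group (record_id, name) pairs into clusters assumed to be one entity."""
    parent = {rid: rid for rid, _ in records}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Naive all-pairs comparison (O(n^2)); scalable tools use blocking instead.
    for (id_a, name_a), (id_b, name_b) in combinations(records, 2):
        if similarity(name_a, name_b) >= threshold:
            parent[find(id_a)] = find(id_b)

    # Collect record IDs by their cluster root.
    clusters: dict[str, set[str]] = {}
    for rid, _ in records:
        clusters.setdefault(find(rid), set()).add(rid)
    return list(clusters.values())
```

For example, `resolve([("crm:1", "Acme Corp."), ("erp:7", "ACME Corp"), ("web:3", "Globex Inc")])` groups `crm:1` and `erp:7` into one cluster (their normalized names are identical) and leaves `web:3` on its own. Transitive merging through union-find means that if A matches B and B matches C, all three end up in one cluster even when A and C score below the threshold.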
Entity resolution for Elasticsearch.
PyTorch library for transforming entities such as companies and products into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
CLK hash: hash PII for entity matching
MetaSRA: normalized sample-specific metadata for the Sequence Read Archive
Entity Matching Model solves the problem of matching company names between two possibly very large datasets.
Spark Search - high performance advanced search features based on Apache Lucene
Code for the paper "CollaborEM: A Self-supervised Entity Matching Framework Using Multi-features Collaboration". TKDE 2021.
Implementation of the paper "Deep Indexed Active Learning for Matching Heterogeneous Entity Representations"
Code for the paper "PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching". VLDB 2023.
Alignment, a collaborative, system-aided, user-driven ontology/vocabulary matching and validation platform.
Scalable record-level matching rules
A Winner-Take-All Hashing-Based Unsupervised Model for Entity Resolution Problems. [B. Sc. Thesis]
Data and code for the paper "Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction"
CERTA - Computing Entity Resolution explanations with TriAngles
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Created by Halbert L. Dunn
Released 1946