A short guide to approximate geocoding
Updated Mar 27, 2018 - HTML
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). It is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number); a shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
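As a rough illustration of joining two sources that share no common identifier, the sketch below pairs records by name similarity using Python's standard `difflib` module. The sample records, the `link_records` helper, the `field` parameter, and the 0.8 threshold are all hypothetical choices made for illustration; they are not taken from any of the tools listed on this page.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(left, right, field="name", threshold=0.8):
    """Pair each left record with its best-scoring right record, if it clears the threshold."""
    links = []
    for l_rec in left:
        # Pick the right-hand record whose field is most similar (None if right is empty).
        best = max(right, key=lambda r: similarity(l_rec[field], r[field]), default=None)
        if best is not None and similarity(l_rec[field], best[field]) >= threshold:
            links.append((l_rec["id"], best["id"]))
    return links


# Hypothetical sample data: the two sources use unrelated ids and inconsistent spelling.
source_a = [{"id": "a1", "name": "Jonathan Smith"}, {"id": "a2", "name": "Maria Garcia"}]
source_b = [{"id": "b1", "name": "maria garcia "}, {"id": "b2", "name": "Jonathon Smith"}]

links = link_records(source_a, source_b)
print(links)
```

A pairwise comparison like this grows quadratically with the number of records, so it is only a starting point for small inputs.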
Simple software that generates features and assesses the accuracy of record linkage.
Supplementary code for "Class ratio and its implications for reproducibility and performance in record linkage" presented at The Pacific-Asia Conference on Knowledge Discovery and Data Mining 2024.
Create representative records post-record linkage
Algorithm responsible for finding matches across different data sources
Fuzzy matching made easy
List of entity resolution software and resources.
Bayesian Entity Resolution with Exchangeable Random Partition Priors
A workflow template for deduplication and record linkage using the Dedupe library
The StringMetrics project implements seven string metric algorithms: Hamming, Dice, Jaro, Jaro-Winkler, Soundex, Levenshtein, and Damerau-Levenshtein. Each metric compares strings through the IMetric interface and returns an approximate similarity score from 0 (no match) to 1 (exact match), which is useful in data cleansing, record linkage, NLP, fraud detection, etc.
A maximum-strength name parser for record linkage.
🕸️ Little helper for handling entity clusters
K-Anonymization & Record-linkage Attack
Utilities for working with entity resolution models
The ultimate address parsing tool. Effortlessly parse and expand postal data with our cutting-edge technology. Simplify your mailing, enhance accuracy, and embrace the future of postal efficiency. Get Postalized—where precision meets convenience.
Mirror of https://bitbucket.org/resteorts/smered
🔎 Finds fuzzy matches between datasets
Data cleansing problem statement: data in a record are often duplicated. How do we find the probability that a record is a duplicate? [Work In Progress]
Contains solution notebooks of attempted data challenges
Created by Halbert L. Dunn
Released 1946
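The 0-to-1 similarity scoring mentioned in the StringMetrics entry above can be sketched for one of the listed metrics, Levenshtein distance. This is a minimal illustration only: the function names are hypothetical, and dividing the edit distance by the longer string's length is an assumed normalization, not necessarily the one that project or its IMetric interface uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 when every position differs."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as an exact match
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(levenshtein_similarity("kitten", "sitting"))
```

Normalizing by the longer length keeps the score within [0, 1], since the edit distance can never exceed that length.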