A short guide to approximate geocoding
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining datasets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number); a shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
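As a rough illustration of linking records that share no common identifier, the sketch below compares names from two sources with Python's standard-library `difflib.SequenceMatcher` and keeps the best match above a threshold. The record IDs, names, the `link_records` helper, and the 0.85 threshold are all hypothetical choices for this example; real systems usually add blocking, multiple fields, and a trained or probabilistic scoring model.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Approximate similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """For each left record, keep its best-scoring right record if the
    score reaches the (assumed) threshold."""
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, best_score))
    return links


# Hypothetical records from two sources with no shared key.
source_a = {"a1": "Jon Smith", "a2": "ACME Corp.", "a3": "Maria Garcia"}
source_b = {"b1": "John Smith", "b2": "acme corp", "b3": "Robert Jones"}

links = link_records(source_a, source_b)
for left_id, right_id, score in links:
    print(f"{left_id} <-> {right_id} (score {score:.2f})")
```

Comparing every pair is quadratic in the number of records, which is acceptable for a toy example but is the reason larger linkage jobs first group candidates by a cheap key (blocking) before scoring.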
Simple software that generates features and assesses the accuracy of record linkage.
A database management system for restaurant inspection records, restaurant-related tweets, and other relevant data.
Supplementary code for "Class ratio and its implications for reproducibility and performance in record linkage" presented at The Pacific-Asia Conference on Knowledge Discovery and Data Mining 2024.
An algorithm for finding matches across different data sources
The StringMetrics project implements 7 string metric algorithms: Hamming, Dice, Jaro, Jaro-Winkler, Soundex, Levenshtein, and Damerau-Levenshtein. Metrics compare strings through the IMetric interface, which provides an approximate similarity score from 0 (no match) to 1 (exact match). This is useful in data cleansing, record linkage, NLP, fraud detection, etc.
K-Anonymization & Record-linkage Attack
A workflow template for deduplication and record linkage using the Dedupe library
Contains solution notebooks of attempted data challenges
Homework for the 2022-2023 Ingegneria dei dati course at Roma Tre University.
Finding duplicate records using record linkage comparison and big data processing with Apache Spark
Record linkage - simple, flexible, efficient.
A Meta (Facebook) project - Purpled allows artists to distribute content and monetize their artistry. Contribute to the success of both new and experienced artists. Every like, play, remark, and repost reverberates, establishing a creator's reputation, motivating them, and expanding their reach, so that you always have great music at your fingertips.
This repository contains code used during my master's thesis: Record linkage in 18th-century VOC slave archives.
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
Low Cost Entity Resolution with Transformers
Repository for HW6 of the Ingegneria dei Dati course, 2023/24
Repository containing the homework for the Ingegneria dei Dati course, 2022/2023.
Created by Halbert L. Dunn
Released 1946