cleanzr

Popular repositories

  1. record-linkage-tutorial Public

    A tutorial on entity resolution (record linkage or de-duplication)

    TeX 61 15

  2. dblink Public

    Distributed Bayesian Entity Resolution in Apache Spark

    Scala 54 9

  3. fasthash Public

    Performs unique entity estimation, following Chen, Shrivastava, and Steorts (2018).

    Python 14 3

  4. clevr Public

    Clustering and Link Prediction Evaluation in R

    R 10 3

  5. representr Public

    Create representative records post-record linkage

    R 7

  6. blink Public

    This is the main code for Steorts (2015), which is also on CRAN. Please cite the paper/code if you find this useful.

    HTML 5 4

Repositories

Showing 10 of 21 repositories
  • blink Public

    This is the main code for Steorts (2015), which is also on CRAN. Please cite the paper/code if you find this useful.

    HTML 5 4 0 0 Updated Jan 10, 2024
  • exchanger Public

    Bayesian Entity Resolution with Exchangeable Random Partition Priors

    C++ 5 GPL-3.0 0 0 0 Updated Jan 7, 2024
  • clevr Public

    Clustering and Link Prediction Evaluation in R

    R 10 GPL-2.0 3 1 0 Updated Sep 23, 2023
  • representr Public

    Create representative records post-record linkage

    R 7 0 0 0 Updated Sep 5, 2023
  • exchanger-experiments Public

    Scripts for reproducing the experiments in our JSSAM article on Bayesian Graphical Entity Resolution

    R 0 GPL-3.0 1 0 0 Updated Jan 24, 2023
  • microclustr Public

    Package for Betancourt, Zanella, and Steorts

    C++ 2 1 0 1 Updated Aug 22, 2022
  • dblink-experiments Public

    Details for reproducing the experiments in our d-blink paper

    R 0 MIT 0 0 0 Updated Jun 10, 2021
  • dblinkR Public

    An R interface for the dblink Spark application

    R 5 1 2 0 Updated Jun 10, 2021
  • dblink Public

    Distributed Bayesian Entity Resolution in Apache Spark

    Scala 54 9 4 0 Updated Jun 10, 2021
  • italy Public

    A sample survey conducted by the Bank of Italy every two years containing duplicated data.

    R 0 0 0 0 Updated Apr 19, 2021
