GitHub - sullivannicole/simplER: Compact, easy-to-maintain unsupervised ER with existing SOTA tech

simplER

simplER is a novel, easy-to-use framework for entity resolution (ER) in both the text-data and data-data contexts. It uses Flair for first-pass named entity recognition (NER) and GPT-3.5 for final entity extraction and classification. Because every observation goes through Flair first, only the most difficult-to-classify observations are passed on to GPT-3.5, which reduces runtime and makes the overall simplER pipeline more efficient. The technologies used in simplER are free and user-friendly, so an analyst or scientist can implement and maintain the entire pipeline for the foreseeable future.
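The routing idea described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the names `resolve`, `toy_tagger`, and `toy_llm`, the 0.9 confidence threshold, and the sample strings are all assumptions made for the example. In a real run the first-pass callable would presumably wrap a Flair tagger and the fallback would wrap a GPT-3.5 request. How simplER decides that an observation is "difficult" is not stated here, so a score cutoff stands in for that decision.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Resolution:
    text: str
    entity: Optional[str]
    source: str  # "first_pass" or "fallback"


def resolve(
    observations: List[str],
    first_pass: Callable[[str], Tuple[Optional[str], float]],
    fallback: Callable[[str], Optional[str]],
    threshold: float = 0.9,  # assumed cutoff, not taken from simplER
) -> List[Resolution]:
    """Send each observation to the cheap tagger first; only low-confidence
    or empty results are forwarded to the more expensive fallback."""
    results = []
    for text in observations:
        entity, score = first_pass(text)
        if entity is not None and score >= threshold:
            results.append(Resolution(text, entity, "first_pass"))
        else:
            results.append(Resolution(text, fallback(text), "fallback"))
    return results


# Hypothetical stand-in for a first-pass NER tagger (e.g. a Flair wrapper).
def toy_tagger(text: str) -> Tuple[Optional[str], float]:
    known = {"Acme Diner": 0.97, "Blue Door Cafe": 0.55}
    for name, score in known.items():
        if name.lower() in text.lower():
            return name, score
    return None, 0.0


# Hypothetical stand-in for the LLM call (e.g. a GPT-3.5 request).
def toy_llm(text: str) -> Optional[str]:
    return "UNRESOLVED"


if __name__ == "__main__":
    samples = [
        "Dinner at Acme Diner downtown",
        "Brunch at blue door cafe",
        "the usual place on 5th",
    ]
    for r in resolve(samples, toy_tagger, toy_llm):
        print(r.source, "|", r.entity, "|", r.text)
```

Passing the two stages in as callables keeps the sketch free of external dependencies and makes it easy to count how many observations actually reach the fallback.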

Novel contributions

The contributions of our work are threefold:
(1) our solution is highly compact (about 15 lines of code at its core) with minimal dependencies, requires no specialized hardware (we ran it on Colab with the free runtime), and uses free SOTA tech, making it easy to implement and maintain;
(2) our framework includes a novel evaluation of GPT-3.5 on the hardest text-data and data-data matching tasks (where existing NER solutions fail); and
(3) our framework extends beyond public data to "sensitive"/highly local entities (not possible with TDmatch, the baseline method).

Evaluation

We evaluated simplER in both the text-data and data-data matching contexts to demonstrate that it is competitive with or outperforms the baseline (TDmatch).
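For readers who want to run a comparison of this kind themselves, one common way to score a matcher is precision, recall, and F1 over predicted record pairs. The sketch below is an assumption about how such a comparison could be set up; the metrics used for simplER are not stated here, and the function name `match_scores` and the toy identifiers are hypothetical.

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]  # (id in source A, id in source B)


def match_scores(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted match pairs against gold pairs."""
    pred, true = set(predicted), set(gold)
    tp = len(pred & true)  # pairs that are both predicted and correct
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy identifiers for illustration only, not drawn from any benchmark.
    gold = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    predicted = [("a1", "b1"), ("a2", "b9")]
    print(match_scores(predicted, gold))
```

Scoring the candidate method and the baseline with the same function on the same gold pairs keeps the comparison like for like.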

Data to data

Fodor-Zagats

"Sensitive" spatial data to "sensitive" spatial data

Airbnb-assessors

Unstructured text to data

IMDB
CoronaCheck

Folders and files (46 commits)

archive
data
img
utils
README.md
simplER_GPT.ipynb
simplER_dolly.py
simplER_eval.ipynb
simplER_flair.py
simplER_flair_colab.ipynb
