Releases: mitmedialab/sherlock-project

Feature extraction speedup, bugfixes and model code.

22 Feb 10:57
5b3ac69

This release provides:

  • a significant speedup and memory reduction of the feature extraction phase,
  • bugfixes in the feature extraction pipeline,
  • the code of the original model architecture (TensorFlow Keras),
  • alignment of the SherlockModel class with the scikit-learn API (i.e. with fit, predict, and predict_proba methods),
  • improved notebooks demonstrating 1) full reproduction of the feature extraction and model training/evaluation pipelines, 2) out-of-the-box usage of the Sherlock model for a given table, 3) how performance can be improved with additional classifiers.
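
For readers unfamiliar with the scikit-learn convention mentioned above, here is a minimal sketch of what a fit / predict / predict_proba interface looks like. ToyColumnTypeModel, its nearest-centroid logic, and the feature vectors and labels are all hypothetical and only for illustration. This is not the SherlockModel implementation, and the real constructor and method signatures may differ, so consult the project's notebooks for actual usage.

```python
import math


class ToyColumnTypeModel:
    """Hypothetical stand-in with a scikit-learn-style interface (NOT the real SherlockModel)."""

    def fit(self, X, y):
        # Store one centroid (mean feature vector) per class label.
        self.classes_ = sorted(set(y))
        self.centroids_ = []
        for label in self.classes_:
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids_.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, X):
        # Softmax over negative squared distances to each class centroid.
        probs = []
        for x in X:
            scores = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centroids_]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            probs.append([e / total for e in exps])
        return probs

    def predict(self, X):
        # Pick the most probable class for each row.
        return [
            self.classes_[max(range(len(p)), key=p.__getitem__)]
            for p in self.predict_proba(X)
        ]


# Made-up two-dimensional feature vectors for two made-up semantic types.
X_train = [[0.0, 1.0], [0.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
y_train = ["age", "age", "city", "city"]

model = ToyColumnTypeModel().fit(X_train, y_train)
predictions = model.predict([[0.1, 0.9], [5.1, 4.9]])
probabilities = model.predict_proba([[0.1, 0.9]])
```

The point of following this convention is that a model exposing these three methods can be trained and queried in the same way as other scikit-learn-style classifiers: predict returns one label per input row, and predict_proba returns one probability per class in the order given by classes_.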

Contributions by:
@lowecg
@madelonhulsebos

Original code

09 Feb 11:57
6254a62
Pre-release

This release reflects the code that was used for the experiments in the paper "Sherlock: a deep learning approach to semantic data type detection" (link to the paper on arXiv). This release provides code for:

  • Downloading the original train and test data used for the experimental results reported in the paper.
  • Extracting features to numerically represent new columns.
  • Evaluating a trained Sherlock model on unseen table columns.
  • Retraining the original Sherlock model.

This release contains inefficiencies and bugs, so the latest release of this project is recommended for production settings and new research projects. More about this project can be found on this website.