Legato: Disambiguating and Linking Heterogeneous Resources Across RDF Graphs

An automatic data linking tool developed by DOREMUS.

About Legato

Legato works in four steps:

  1. Data cleaning: A preprocessing step that enables efficient instance comparison.
  2. Instance profiling: An instance representation (subgraph) based on the Concise Bounded Description (CBD) of the class, which extracts the information considered relevant for the entity comparison task.
  3. Indexing and instance matching: Standard NLP techniques are applied to index the instance profiles using a term frequency vector model. Legato's threshold value applies to the similarity computed at this stage. Low thresholds are recommended to ensure high recall (default 0.2).
  4. Link repairing: A post-processing step that repairs erroneous links generated in the matching step by clustering highly similar instances together and applying key-identification and ranking algorithms.
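
As a rough illustration of step 3, the sketch below builds term frequency vectors from two textual instance profiles and keeps the pair as a candidate link when their similarity exceeds the threshold. It is not taken from Legato's code base: the class and method names are hypothetical, the tokenization is deliberately naive, and cosine similarity is an assumption, since the text above only states that a term frequency vector model is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Legato.
final class TermFrequencySimilarity {

    // Builds a term frequency vector: lowercased token -> number of occurrences.
    static Map<String, Integer> termFrequencies(String profile) {
        Map<String, Integer> tf = new HashMap<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : profile.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two term frequency vectors (assumed measure).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += (double) e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        // An empty profile has no direction, so treat it as dissimilar to everything.
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Keeps a pair only if its similarity is strictly higher than the threshold.
    static boolean isCandidateLink(String source, String target, double threshold) {
        return cosine(termFrequencies(source), termFrequencies(target)) > threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.2; // default value mentioned in step 3
        String source = "Symphony No. 5 Beethoven";
        String target = "Beethoven Symphony 5 in C minor";
        System.out.println(isCandidateLink(source, target, threshold));
    }
}
```

With a low threshold such as 0.2, profiles that share only a few terms still pass the filter, which favors recall; the link repairing step is then responsible for removing the erroneous links this lets through.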

How to run Legato

To run Legato through the GUI, run the "main.java" class in the "legato" package. Then select the source, the target, and a reference alignment (if available). You can then choose between two processing modes:

  • Automatic filters resources by fixing only the classes to compare.
  • Manual filters resources by class and compares them on a set of selected properties.

The "threshold value" field sets Legato's threshold for the instance matching step. Legato considers only resources with a similarity higher than the threshold value. Once you have chosen the mode and the filtering features, click "run" to generate links and (optionally) evaluate them. If no reference alignment file exists, Legato matches the instances without evaluating the produced links.
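
The optional evaluation against a reference alignment can be pictured with the sketch below, which compares a set of produced links with a set of reference links. The measures shown (precision, recall, and F-measure) are the usual ones for link evaluation, but which measures Legato actually reports is an assumption here. The class name and the "source|target" string encoding of a link are hypothetical and chosen only to keep the example short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Legato.
final class LinkEvaluation {

    // Number of produced links that also appear in the reference alignment.
    static int truePositives(Set<String> produced, Set<String> reference) {
        Set<String> common = new HashSet<>(produced);
        common.retainAll(reference);
        return common.size();
    }

    // Fraction of produced links that are correct (0 when nothing was produced).
    static double precision(Set<String> produced, Set<String> reference) {
        return produced.isEmpty()
                ? 0.0
                : (double) truePositives(produced, reference) / produced.size();
    }

    // Fraction of reference links that were found (0 when the reference is empty).
    static double recall(Set<String> produced, Set<String> reference) {
        return reference.isEmpty()
                ? 0.0
                : (double) truePositives(produced, reference) / reference.size();
    }

    // Harmonic mean of precision and recall (0 when both are 0).
    static double fMeasure(Set<String> produced, Set<String> reference) {
        double p = precision(produced, reference);
        double r = recall(produced, reference);
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Each link is encoded as "sourceId|targetId" (an illustrative convention).
        Set<String> produced = new HashSet<>(Arrays.asList("s1|t1", "s2|t2", "s3|t9"));
        Set<String> reference = new HashSet<>(Arrays.asList("s1|t1", "s2|t2", "s4|t4", "s5|t5"));
        System.out.printf("precision=%.3f recall=%.3f f-measure=%.3f%n",
                precision(produced, reference),
                recall(produced, reference),
                fMeasure(produced, reference));
    }
}
```

Raising the threshold typically trades recall for precision, so comparing these figures across a few threshold values is one way to choose a setting for a given dataset.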

Benchmark datasets: DOREMUS data (data about classical music).

The figure below illustrates the Legato interface:

[Figure: Legato GUI]

Requirements

JDK 8 or later
