CaLiGraph

A Large Semantic Knowledge Graph from Wikipedia Categories and Listings

For information about the general idea, extraction statistics, and resources of CaLiGraph, visit the CaLiGraph website.

Configuration

System Requirements

At least 300 GB of RAM, as most of DBpedia is loaded into memory to speed up the extraction
At least one GPU to run transformers
A stable internet connection during the first execution of an extraction, as the required DBpedia files are downloaded automatically
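
The requirements above can be checked before starting a long extraction run. The following is a minimal sketch and not part of CaLiGraph: it assumes a Linux system (for os.sysconf) and an NVIDIA driver that provides the nvidia-smi tool. The function names and the REQUIRED_RAM_GB constant are illustrative.

```python
import os
import shutil

# Threshold taken from the RAM requirement stated above.
REQUIRED_RAM_GB = 300


def total_ram_gb() -> float:
    """Return the physical memory in GB (Linux/Unix only, via sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1e9


def has_nvidia_smi() -> bool:
    """Heuristic GPU check (assumption): is nvidia-smi on the PATH?"""
    return shutil.which("nvidia-smi") is not None


def check_requirements(ram_gb: float, gpu_found: bool) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if ram_gb < REQUIRED_RAM_GB:
        problems.append(f"only {ram_gb:.0f} GB RAM, need {REQUIRED_RAM_GB} GB")
    if not gpu_found:
        problems.append("no GPU detected (nvidia-smi not found)")
    return problems


if __name__ == "__main__":
    for problem in check_requirements(total_ram_gb(), has_nvidia_smi()):
        print("WARNING:", problem)
```

Finding nvidia-smi only indicates that a driver is installed; it does not guarantee that a GPU is free.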

Prerequisites

Environment manager: conda
Dependency manager: poetry

Setup

In the project root, create a conda environment with conda env create -f environment.yaml
Activate the environment with conda activate caligraph
Install dependencies with poetry install
Install PyTorch for your specific CUDA version with poetry run poe autoinstall-torch-cuda
If you have not downloaded them already, fetch the required spaCy model and NLTK corpora (run in a terminal):

# download the large English model of spaCy
python -m spacy download en_core_web_lg
# download the wordnet, words, and omw-1.4 corpora of NLTK
python -c 'import nltk; nltk.download("wordnet"); nltk.download("words"); nltk.download("omw-1.4")'

Basic Configuration Options

You can configure the application-specific parameters as well as logging- and file-related parameters in config.yaml.

Usage

Make sure that the virtual environment caligraph is activated. Then you can run the extraction in the project root folder with python .

All required resources, such as the DBpedia files, are downloaded automatically during execution. CaLiGraph is serialized in N-Triples format. The resulting files are placed in the results folder.
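
To give an idea of how the N-Triples output can be consumed, here is a minimal sketch of a line-based reader that uses only the standard library. It is not part of CaLiGraph: the function name iter_triples, the regular expression, and the sample data are illustrative, and the sample uses made-up example.org IRIs rather than actual CaLiGraph output. The pattern only handles simple, well-formed statements; for real workloads a full RDF parser such as rdflib is the safer choice.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# One simple N-Triples statement: subject predicate object .
# The subject is an IRI or blank node, the predicate an IRI,
# and the object is kept as raw text (IRI, blank node, or literal).
TRIPLE_PATTERN = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')

# Made-up example data, not actual CaLiGraph output.
SAMPLE = """\
# comment lines and blank lines are skipped
<http://example.org/A> <http://example.org/type> <http://example.org/Band> .
<http://example.org/A> <http://example.org/label> "A. Band"@en .
"""


def iter_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) for each parseable line."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = TRIPLE_PATTERN.match(stripped)
        if match:
            yield match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    # Count how often each predicate occurs in the sample.
    counts = Counter(pred for _, pred, _ in iter_triples(SAMPLE.splitlines()))
    for predicate, count in counts.items():
        print(predicate, count)
```

To read a file from the results folder, pass an open file handle to iter_triples instead of SAMPLE.splitlines(); files are then processed line by line without loading them fully into memory.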

Evaluations

Subject Entity Detection

Use the script evaluate_mention_detection.py to evaluate a specific configuration for subject entity detection.

Make sure that there is a free GPU on your system and that the environment caligraph is activated. Then you can run an evaluation as follows:

python evaluate_mention_detection.py <GPU-ID> <HUGGINGFACE-MODEL> <OPTIONAL-CONFIG-PARAMS>

Have a look at the evaluation script for a description of the optional configuration parameters.
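
If several models are to be evaluated in a row, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the model identifiers are placeholders rather than the models used by the authors. Optional configuration parameters are passed through unchanged, since their format is defined by the evaluation script itself.

```python
import subprocess
import sys
from typing import List, Sequence


def build_eval_command(gpu_id: int, model: str, extra: Sequence[str] = ()) -> List[str]:
    """Assemble the argument list for evaluate_mention_detection.py."""
    # sys.executable is the interpreter of the currently active environment.
    return [sys.executable, "evaluate_mention_detection.py", str(gpu_id), model, *extra]


def run_evaluations(gpu_id: int, models: Sequence[str]) -> None:
    """Run one evaluation per model, stopping at the first failure."""
    for model in models:
        subprocess.run(build_eval_command(gpu_id, model), check=True)


if __name__ == "__main__":
    # Dry run: only print the commands. The model names are placeholders.
    for name in ["bert-base-cased", "distilbert-base-cased"]:
        print(" ".join(build_eval_command(0, name)))
```

Calling run_evaluations(0, [...]) from the project root executes the evaluations one after another on the given GPU.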

Tests

In the project root, run tests with pytest

