AutoRDF2GML is a novel framework that semi-automatically transforms RDF data into heterogeneous graph datasets suitable for graph-based machine learning such as graph neural networks (GNNs).

davidlamprecht/AutoRDF2GML

AutoRDF2GML

AutoRDF2GML is an innovative framework designed to convert RDF data into graph representations suitable for graph-based machine learning methods such as Graph Neural Networks (GNNs). It uniquely generates content-based features from RDF datatype properties and topology-based features from RDF object properties, enabling the effective integration of Semantic Web technologies with Graph Machine Learning.

Overview of AutoRDF2GML

Key Features

  • Content-based Node Features: Automatically extract node features from RDF datatype properties.
  • Topology-based Node Features: Derive node features from the graph topology defined by RDF object properties.
  • User-friendly Interface: Features a modular design with automatic feature selection for simplicity and ease of use.
  • Graph ML Integration: Seamlessly integrates with leading frameworks like PyTorch Geometric and DGL.
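
The split between content and topology described above can be illustrated with a minimal, dependency-free sketch. It is not the AutoRDF2GML implementation: the triples, the `ex:` IRIs, and the helper name `split_triples` are made up for illustration, and literals are marked with an explicit flag instead of being parsed from an RDF file.

```python
from collections import defaultdict

# Toy triples as (subject, predicate, object, object_is_literal).
# All IRIs and values are made-up illustration data.
TRIPLES = [
    ("ex:paper1", "ex:title", "Graph learning on RDF", True),
    ("ex:paper1", "ex:year", "2023", True),
    ("ex:paper1", "ex:hasAuthor", "ex:author1", False),
    ("ex:paper2", "ex:title", "Knowledge graph embeddings", True),
    ("ex:paper2", "ex:cites", "ex:paper1", False),
]


def split_triples(triples):
    """Separate literal-valued triples (content) from IRI-valued ones (topology)."""
    node_attributes = defaultdict(dict)  # subject -> {datatype property: literal}
    edges = []                           # (subject, object property, object)
    for subj, pred, obj, is_literal in triples:
        if is_literal:
            # Datatype property: a candidate input for a content-based feature.
            node_attributes[subj][pred] = obj
        else:
            # Object property: an edge that contributes to the graph topology.
            edges.append((subj, pred, obj))
    return dict(node_attributes), edges


node_attributes, edges = split_triples(TRIPLES)
print(node_attributes["ex:paper1"])
print(edges)
```

In this sketch the literal values would still need to be encoded as numeric vectors, and the edges would feed a KG embedding model, before either could serve as a node feature.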

Quick User Guide

For a step-by-step guide on using the framework, see our example and example-topologyfeatures directories.

Usage

To start using AutoRDF2GML, you need (1) an RDF file and (2) a config file that describes the transformation. In the config file, define the RDF classes and properties needed for your project. Once configured, run the AutoRDF2GML script to generate a heterogeneous graph dataset for your machine learning application.
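
Since the configuration is an .ini file, it can be inspected with Python's standard `configparser` module. The sketch below only shows that mechanism. The section and key names (`input`, `graph`, `rdf_file`, `node_classes`) and the file path are hypothetical and are not taken from config-template.ini, which remains the reference for the real options.

```python
import configparser

# HYPOTHETICAL config text: section names, keys, and the path are invented
# for illustration and do not reflect the real config-template.ini.
EXAMPLE_CONFIG = """
[input]
rdf_file = data/example.nt

[graph]
node_classes = paper, author
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONFIG)

# Read a plain string value and a comma-separated list.
rdf_file = parser["input"]["rdf_file"]
node_classes = [name.strip() for name in parser["graph"]["node_classes"].split(",")]
print(rdf_file, node_classes)
```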

The output can then be used for various machine learning tasks, including node classification, link prediction, and graph classification, and it can be loaded directly into common graph machine learning frameworks. For example, this script shows how to load the AutoRDF2GML output into a PyTorch Geometric HeteroData object. The structure of the loaded PyG HeteroData object is available as a directed graph here and as an undirected graph here.
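
As a rough illustration of the data layout a heterogeneous graph object expects, the following sketch maps node IRIs to contiguous per-type integer indices and collects one COO edge list per (source type, relation, target type) triple. It is an assumption-laden toy, not the loading script referenced above: the edges, type names, and the helper `build_edge_indices` are hypothetical, and plain Python lists stand in for tensors.

```python
from collections import defaultdict

# Made-up typed edges: (source type, source IRI, relation, target type, target IRI).
TYPED_EDGES = [
    ("paper", "ex:paper1", "has_author", "author", "ex:author1"),
    ("paper", "ex:paper2", "has_author", "author", "ex:author1"),
    ("paper", "ex:paper2", "cites", "paper", "ex:paper1"),
]


def build_edge_indices(typed_edges):
    """Assign per-type integer IDs and collect COO edge lists per edge type."""
    node_ids = defaultdict(dict)                # node type -> {IRI: index}
    edge_index = defaultdict(lambda: [[], []])  # edge type -> [sources, targets]
    for src_type, src, rel, dst_type, dst in typed_edges:
        # Each node type has its own index space starting at 0.
        src_idx = node_ids[src_type].setdefault(src, len(node_ids[src_type]))
        dst_idx = node_ids[dst_type].setdefault(dst, len(node_ids[dst_type]))
        sources, targets = edge_index[(src_type, rel, dst_type)]
        sources.append(src_idx)
        targets.append(dst_idx)
    return dict(node_ids), dict(edge_index)


node_ids, edge_index = build_edge_indices(TYPED_EDGES)
print(node_ids)
print(edge_index)
```

With PyTorch Geometric installed, each of these lists could be wrapped in a `torch.long` tensor and assigned to `data[src_type, rel, dst_type].edge_index` on a HeteroData object. An undirected variant additionally stores a reverse edge for every edge.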

Feature Configuration

Content-based Node Features

Quick example for Content-based Node Features Transformation: example directory.

AutoRDF2GML with content-based node features is implemented in the Python script autordf2gml-cb.py. The configuration file template and its documentation are provided in the config-template.ini file. The default model for computing embeddings from natural language descriptions is SciBERT, but other Hugging Face BERT variants (e.g., bert-base) can also be used.

Topology-based Node Features

Quick example for Topology-based Node Features Transformation: example-topologyfeatures directory.

AutoRDF2GML with topology-based node features is implemented in the Python script autordf2gml-tb.py. The configuration file template and its documentation are provided in the config-template.ini file. The following KG embedding models are supported for computing the topology-based features: TransE, DistMult, ComplEx, and RotatE. The default parameters (hidden channel size 128) are defined and commented in the implementation.
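
To give a feel for what two of the listed models compute, here is a minimal sketch of the textbook TransE and DistMult scoring functions on toy 2-dimensional vectors. It does not reproduce the code in autordf2gml-tb.py: the function names and vectors are illustrative, the L2 norm is one common choice for TransE, and real embeddings are learned rather than written by hand.

```python
import math


def transe_score(head, relation, tail):
    """TransE: a triple is plausible when head + relation lies close to tail."""
    distance = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))
    return -distance  # higher (closer to zero) means more plausible


def distmult_score(head, relation, tail):
    """DistMult: trilinear product of head, relation, and tail embeddings."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


# Toy vectors chosen by hand for illustration only.
head, relation = [1.0, 0.0], [0.0, 1.0]
good_tail, bad_tail = [1.0, 1.0], [0.0, 0.0]

print(transe_score(head, relation, good_tail))
print(transe_score(head, relation, bad_tail))
print(distmult_score([1.0, 2.0], [1.0, 1.0], [3.0, 4.0]))
```

After training such a model, the learned entity vectors are what serve as topology-based node features.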

Contributing

Contributions to AutoRDF2GML are welcome!

License

AutoRDF2GML is made available under the MIT License.
