
evaluating-kg-class-memberships-using-llms

Code and data for experiments on the evaluation of class membership relations in knowledge graphs using LLMs

Bradley P. Allen and Paul T. Groth
INtelligent Data Engineering Lab
University of Amsterdam, Amsterdam, The Netherlands

Overview

Class membership relations, which assign entities to a given class, form a backbone of knowledge graphs. As part of the knowledge engineering process, we propose a new method for evaluating the quality of these relations by processing descriptions of a given entity and class using a zero-shot chain-of-thought classifier that is provided with a natural language intensional definition of the class (Figure 1). This repository contains the data and code used in an evaluation of this method.

Figure 1: A zero-shot chain-of-thought classifier applied to the class clgo:Romania international rugby union player and the entity clgr:Iosif Nemes from the CaLiGraph knowledge graph.
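
To make the method concrete, the following is a minimal sketch of a zero-shot chain-of-thought membership classifier along the lines of Figure 1. It is not the implementation in classifier.py; the prompt wording, the classify_membership helper, and the use of the OpenAI Python client are illustrative assumptions.

from openai import OpenAI  # requires the OPENAI_API_KEY environment variable

client = OpenAI()

PROMPT = """You are evaluating an assertion in a knowledge graph.
Class definition: {definition}
Entity description: {description}
Think step by step, then answer on the final line with TRUE if the entity
is a member of the class, or FALSE otherwise."""

def classify_membership(definition: str, description: str,
                        model: str = "gpt-4-0125-preview") -> bool:
    # Ask the LLM to reason about the entity against the intensional class definition.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(
            definition=definition, description=description)}],
        temperature=0.0,
    )
    # Treat a final answer of TRUE as a positive classification.
    return response.choices[0].message.content.strip().upper().endswith("TRUE")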

We evaluated the method using two publicly available knowledge graphs, Wikidata and CaLiGraph, and seven large language models. Using the gpt-4-0125-preview large language model, the method’s classification performance achieved a macro-averaged F1-score of 0.830 on data from Wikidata and 0.893 on data from CaLiGraph. Moreover, a manual analysis of the classification errors showed that 40.9% of errors were due to the knowledge graphs themselves, with 16.0% due to missing relations and 24.9% due to incorrectly asserted relations.

The principal contributions of this work are 1) a formal approach to the design of a neurosymbolic knowledge engineering workflow integrating KGs and LLMs, and 2) experimental evidence that this method can assist knowledge engineers in addressing the correctness and completeness of KGs, potentially reducing the effort involved in knowledge acquisition and elicitation.

License

MIT.

Requirements

  • Python 3.11 or higher.
  • OPENAI_API_KEY and HUGGINGFACE_API_TOKEN environment variables set to your respective OpenAI and Hugging Face API keys.

Installation

$ git clone https://github.com/bradleypallen/evaluating-kg-class-memberships-using-llms.git
$ cd evaluating-kg-class-memberships-using-llms
$ python -m venv env
$ source env/bin/activate
$ pip install -r requirements.txt

Software and data artifacts in this repository

Source code

  • Classifier implementation: classifier.py
  • Utilities for running experiments and displaying results: utils.py

Experiments

Findings

Classifier performance

Error analysis

Usage

Running the experiments

  1. Delete the existing model-specific classification files in the /experiments subdirectory.
  2. Execute wikidata_experiment.ipynb and caligraph_experiment.ipynb to run each of the seven LLMs over the data sets for Wikidata and CaLiGraph, respectively.
  3. Occasionally, a run will throw an error, typically due to an API timeout or other service-related problem. In that case, simply re-execute the notebook; processing will resume from the point after the last model and class that were successfully processed (see the sketch below).
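
The resumption behavior can be approximated as follows. This is a hypothetical sketch that assumes each run records its classifications in a per-model JSON file in the experiments/ subdirectory keyed by class; the actual file names and structure may differ.

import json, os

def completed_classes(model: str, kg: str) -> set:
    # Return the classes already processed for a given model and knowledge graph.
    path = f"experiments/{model}-{kg}.json"
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f).keys())

# A re-executed notebook can then skip any class that is already recorded:
# for cls in classes:
#     if cls in completed_classes("gpt-4-0125-preview", "wikidata"):
#         continue
#     ...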

Viewing classifier performance metrics

  1. Execute wikidata-classifier-performance.ipynb and caligraph-classifier-performance.ipynb to view the performance statistics for each of the seven LLMs' classifications for Wikidata and CaLiGraph, respectively. This can be done while experiments are being run, after the first model has processed the first class.
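
The macro-averaged F1-scores reported in the overview can also be recomputed from a results file with scikit-learn, as in the sketch below. The assumed file structure and record fields ("actual", "predicted") are illustrative, not the repository's exact schema.

import json
from sklearn.metrics import f1_score

with open("experiments/gpt-4-0125-preview-wikidata.json") as f:
    results = json.load(f)  # assumed: {class: [records with "actual" and "predicted"]}

records = [r for class_records in results.values() for r in class_records]
y_true = [r["actual"] for r in records]
y_pred = [r["predicted"] for r in records]
print("macro-averaged F1:", f1_score(y_true, y_pred, average="macro"))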

Viewing classification errors

  1. Execute gpt-4-0125-preview-errors.ipynb to view the false positives and false negatives by gpt-4-0125-preview for each class in both Wikidata and CaLiGraph.
  2. To view errors for another model, replace experiments/gpt-4-0125-preview-wikidata.json and experiments/gpt-4-0125-preview-caligraph.json with the appropriate model classification results files in the calls to display_errors.
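
For inspection outside the notebooks, errors can also be pulled directly from a results file. The sketch below is not the display_errors utility in utils.py; the assumed file structure and record fields ("actual", "predicted") are hypothetical.

import json

def load_errors(path: str):
    # Partition a model's classifications into false positives and false negatives.
    with open(path) as f:
        results = json.load(f)  # assumed: {class: [records with "actual" and "predicted"]}
    records = [r for class_records in results.values() for r in class_records]
    false_positives = [r for r in records if r["predicted"] and not r["actual"]]
    false_negatives = [r for r in records if not r["predicted"] and r["actual"]]
    return false_positives, false_negatives

fps, fns = load_errors("experiments/gpt-4-0125-preview-wikidata.json")
print(len(fps), "false positives,", len(fns), "false negatives")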

Annotating classifications for error analysis

  1. Execute gpt-4-0125-preview-error-analysis-prep.ipynb to generate the CSV files containing the classification errors for the Wikidata and CaLiGraph experiments.
  2. To generate CSV files for another model, replace experiments/gpt-4-0125-preview-wikidata.json and experiments/gpt-4-0125-preview-caligraph.json with the appropriate model classification results files in the calls to json.load.
  3. Using a spreadsheet application (e.g., Excel or Numbers), import the generated CSV files, adding four columns with the headers "missing data", "missing relation", "incorrect relation", and "incorrect reasoning" to the right.
  4. For each row, annotate the cells in the new columns such that exactly one of the four cells is marked 'True' and the others are marked 'False':
    • If the error is due to missing data in the entity description, mark "missing data" 'True', else 'False'.
    • If the error is due to a missing relation in the knowledge graph, mark "missing relation" 'True', else 'False'.
    • If the error is due to an incorrect relation in the knowledge graph, mark "incorrect relation" 'True', else 'False'.
    • If the error is due to incorrect reasoning by the LLM, mark "incorrect reasoning" 'True', else 'False'.
  5. Export the annotated spreadsheets for Wikidata and CaLiGraph to error-analysis/wd_err_annotated.csv and error-analysis/cg_err_annotated.csv, respectively.
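
Once exported, the annotated CSVs can be summarized into per-category error percentages like those reported in the overview. The sketch below assumes pandas and the four annotation column headers added in step 3; it is not the analysis notebook itself.

import pandas as pd

wd = pd.read_csv("error-analysis/wd_err_annotated.csv")
cg = pd.read_csv("error-analysis/cg_err_annotated.csv")
errors = pd.concat([wd, cg], ignore_index=True)

categories = ["missing data", "missing relation", "incorrect relation", "incorrect reasoning"]
for category in categories:
    # Fraction of annotated errors attributed to this category, as a percentage.
    pct = (errors[category].astype(str).str.lower() == "true").mean() * 100
    print(f"{category}: {pct:.1f}%")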

Viewing the error analysis

  1. Execute gpt-4-0125-preview-error-analysis.ipynb to view the results of the error analysis.
