logic_based_qa

Implementation of the paper Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models. This repository contains code for translating MetaQA questions into logical predicates, building a Prolog knowledge base over the MetaQA knowledge graph, and answering the questions against it.

Try the whole pipeline in Google Colab

Open In Colab

Installing requirements

This project has been tested on Ubuntu 20.04 and macOS 10.14 with Python 3.10.

Python requirements

First install PyTorch according to your system's requirements. Then install the remaining dependencies with:

pip install -r requirements.txt

Prolog

For an installation guide, check the pyswip GitHub page. A sample installation can also be found inside the Colab notebook.
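
To verify that SWI-Prolog and pyswip are wired up correctly, a quick sanity check (independent of the MetaQA knowledge base) looks like this:

from pyswip import Prolog

# Build a tiny in-memory knowledge base and query it.
prolog = Prolog()
prolog.assertz("directed(kubrick, the_shining)")
prolog.assertz("directed(kubrick, barry_lyndon)")

# Each solution comes back as a dict of variable bindings.
for solution in prolog.query("directed(kubrick, X)"):
    print(solution["X"])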

Training the language-to-predicate model

Run the commands below from the project base directory.

Preparing the dataset and training the model

Create a random sample of 1,000 training examples, with an equal number of samples from each hop:

PYTHONPATH=. python nl2log/data_loader.py --data_path=./data --dataset=metaqa --sample_size 1000
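
The sampler draws the same number of questions from the 1-hop, 2-hop, and 3-hop splits. nl2log/data_loader.py is the repo's implementation; the sketch below only illustrates the idea, and the file paths and JSON fields are assumptions rather than the script's exact behavior:

import json
import random

# Hypothetical per-hop MetaQA splits; each line is "question<TAB>answer1|answer2|...".
hop_files = {1: "data/1hop/qa_train.txt", 2: "data/2hop/qa_train.txt", 3: "data/3hop/qa_train.txt"}
per_hop = 1000 // len(hop_files)

sample = []
for hop, path in hop_files.items():
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    for line in random.sample(lines, per_hop):
        question, answers = line.split("\t")
        sample.append({"hop": hop, "question": question, "answers": answers.split("|")})

with open("data/train_1000.json", "w") as f:
    json.dump(sample, f, indent=2)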

The data_loader.py command creates a dataset called train_1000.json. Run the trainer on this sample:

bash translation_trainer.sh 1000
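
translation_trainer.sh wraps a seq2seq fine-tuning run of t5-small on question → predicate pairs. A minimal sketch of that kind of setup with Hugging Face transformers is shown below; the column names and hyperparameters are assumptions, not the script's exact values:

from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Assumed schema for train_1000.json: one {"question": ..., "predicates": ...} object per example.
dataset = load_dataset("json", data_files={"train": "data/train_1000.json"})["train"]

def preprocess(batch):
    model_inputs = tokenizer(batch["question"], truncation=True)
    labels = tokenizer(text_target=batch["predicates"], truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="models/t5-small",
                                  per_device_train_batch_size=16,
                                  num_train_epochs=10,
                                  save_steps=500),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()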

Evaluate the trained seq2seq model checkpoint

Evaluates the translation accuracy on all test samples:

PYTHONPATH=. python nl2log/evaluation.py --model_cp=models/t5-small/checkpoint-x
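
One straightforward way to measure translation accuracy is exact match between the generated predicate sequence and the gold one. A rough sketch of that comparison (the checkpoint path is a placeholder and the test-pair format is an assumption):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "models/t5-small/checkpoint-x"  # placeholder: use your actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

def translate(question: str) -> str:
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def exact_match_accuracy(test_pairs):
    # test_pairs: iterable of (question, gold_predicate_sequence) tuples
    test_pairs = list(test_pairs)
    hits = sum(translate(q).strip() == gold.strip() for q, gold in test_pairs)
    return hits / len(test_pairs)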

Run the full question answering pipeline

Translate questions to predicates

!PYTHONPATH=. python qa/evaluation.py --model_path="./models/t5-small/checkpoint-x" --generate_predicates

Evaluate the question answering module on the MetaQA test dataset

!PYTHONPATH=. python qa/evaluation.py --model_path="./models/t5-small/checkpoint-x"

Manually test the model

After training the seq2seq model, you can manually test it on custom questions as follows:

from qa.question_answering import QuestionAnswering
from qa.data_loader import MetaQADataLoader

data_loader = MetaQADataLoader('./data')
qa = QuestionAnswering('./models/t5-small/checkpoint-5000', data_loader)

qa.answer_question(
    "the films that share actors with the film [Creepshow] were in which languages"
)

Alternatively, you can use our pretrained model on Hugging Face:

from qa.question_answering import QuestionAnswering
from qa.data_loader import MetaQADataLoader

data_loader = MetaQADataLoader('./data')
qa = QuestionAnswering('navidmadani/nl2logic_t5small_metaqa', data_loader)

qa.answer_question(
    "the films that share actors with the film [Creepshow] were in which languages"
)

Internally, this produces a Prolog query and then fetches the answers:

Query:
starred_actors(Creepshow,X), starred_actors_reverse(X,Y), in_language(Y,Z)

Answer:
['English', 'Polish']
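
A rough picture of how such a chained query can be executed through pyswip, assuming the knowledge-base facts have been consulted from a Prolog file (the file name, atom quoting, and deduplication below are assumptions about the implementation):

from pyswip import Prolog

prolog = Prolog()
prolog.consult("metaqa_kb.pl")  # hypothetical file holding the MetaQA triples as Prolog facts

query = "starred_actors('Creepshow', X), starred_actors_reverse(X, Y), in_language(Y, Z)"

# Collect the distinct bindings of the final variable as the answer set.
answers = {solution["Z"] for solution in prolog.query(query)}
print(sorted(answers))  # e.g. ['English', 'Polish']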

Gradio app for demo

You can run the Gradio application, which uses our pretrained model on Hugging Face. First install gradio and networkx, then run the app.py file:

(Screenshot: Gradio demo app)
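
app.py is the repo's demo entry point; a minimal sketch of what a Gradio wrapper around the QA pipeline can look like, assuming the same QuestionAnswering interface shown above:

import gradio as gr

from qa.data_loader import MetaQADataLoader
from qa.question_answering import QuestionAnswering

data_loader = MetaQADataLoader('./data')
qa = QuestionAnswering('navidmadani/nl2logic_t5small_metaqa', data_loader)

def answer(question: str) -> str:
    # Hypothetical glue: render whatever answer_question returns as a string.
    return str(qa.answer_question(question))

demo = gr.Interface(fn=answer, inputs="text", outputs="text",
                    title="Logic-based QA over MetaQA")
demo.launch()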

Fixing bugs in the dataset

The data in the data/ directory is a fixed version of the original MetaQA dataset.

Modifications to the 1-hop questions in the test set: the answer set of the following question in the 1-hop qa_test file was changed from

1-

[Joseph L. Mankiewicz] directed which movies
	
All About Eve|Sleuth|Cleopatra|Guys and Dolls|Suddenly|Last Summer|Julius Caesar|The Barefoot Contessa|A Letter to Three Wives|People Will Talk|No Way Out|5 Fingers|There Was a Crooked Man...|Dragonwyck|House of Strangers|Somewhere in the Night|The Honey Pot|The Quiet American|A Carol for Another Christmas

to:

All About Eve|Sleuth|Cleopatra|Guys and Dolls|Suddenly, Last Summer|Julius Caesar|The Barefoot Contessa|A Letter to Three Wives|People Will Talk|No Way Out|5 Fingers|There Was a Crooked Man...|Dragonwyck|House of Strangers|Somewhere in the Night|The Honey Pot|The Quiet American|A Carol for Another Christmas

Suddenly, Last Summer is one movie, but the test set wrongly listed it as two separate movies.

2-

which films can be described by [nastassja kinski]

Paris|Texas|Cat People|Unfaithfully Yours|Maria's Lovers
what movies can be described by [dean stockwell]	

Paris|Texas|Compulsion

changed to:

which films can be described by [nastassja kinski]

Paris, Texas|Cat People|Unfaithfully Yours|Maria's Lovers
what movies can be described by [dean stockwell]	

Paris, Texas|Compulsion

because Paris, Texas is one movie in the knowledge graph but was wrongly listed here as two separate movies.
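
Because answers are pipe-separated, a title that was itself split on "|" shows up as two bogus answers. A small check like the one below can flag answer tokens that are not entities in the knowledge graph; the entity file and paths are hypothetical (the MetaQA KB ships as triples, so you would first collect the entity names from it):

# Flag answer tokens that are not entities in the knowledge graph.
# data/kb_entities.txt is a hypothetical one-entity-per-line dump of the MetaQA KB.
with open("data/kb_entities.txt") as f:
    entities = {line.strip() for line in f}

with open("data/1hop/qa_test.txt") as f:
    for line_no, line in enumerate(f, 1):
        question, answers = line.rstrip("\n").split("\t")
        for answer in answers.split("|"):
            if answer not in entities:
                print(f"line {line_no}: suspicious answer {answer!r} for {question!r}")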
