This repository contains the source code released with our paper "Commonsense Knowledge Mining from Term Definitions", presented at the Commonsense Knowledge Graphs (CSKGs) Workshop of AAAI 2021 (https://usc-isi-i2.github.io/AAAI21workshop/).
If you find our paper and code useful for your research, please cite:
@inproceedings{cskmtermdefn-cskgaaai21,
title = {Commonsense Knowledge Mining from Term Definitions},
author = {Zhicheng Liang and Deborah L. McGuinness},
booktitle = {The Commonsense Knowledge Graphs (CSKGs) Workshop of AAAI 2021},
year = {2021}
}
Install the dependencies using
pip install -r requirements.txt
and set the Python path to the repo root directory using
export PYTHONPATH=.
sh scripts/download_ckbc_data.sh data/CKBC
sh scripts/download_conceptnet.sh data/conceptnet
python wiktionary/conceptnet.py
python wiktionary/crawl_term.py
This step is time-consuming due to the large vocabulary.
python wiktionary/triple_pos_tag_analyzer.py
python wiktionary/wiktionary_triple_extractor.py
python wiktionary/wiktionary_triple_evaluation.py
The code under kg-bert/ is adapted from the repo of KG-BERT: BERT for Knowledge Graph Completion, with CKBC data added for training. To train on CKBC data, run
sh kg-bert/train_ckbc.sh
After training, to evaluate on Wiktionary candidate triples, run
sh kg-bert/test_wiktionary.sh
The code under Extracting-CK-from-Large-LM/ is adapted from the repo of Commonsense Knowledge Mining from Pretrained Models. We modify their implementation to support batch inference for efficiency. Given the large number of candidate triples, we also use a cluster to run predictions in parallel, with each node processing a smaller split of the input.
To split candidate triple files, run
sh Extracting-CK-from-Large-LM/split_files.sh [input dir] [output dir] [number of lines per file]
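The idea behind splitting can be sketched in Python as follows. This is only an illustration of cutting a candidate file into chunks with a fixed number of lines, not the contents of split_files.sh. The function names `split_lines` and `_flush` and the `.partNNNN` naming scheme are assumptions made for this example.

```python
from pathlib import Path


def split_lines(input_file, output_dir, lines_per_file):
    """Write consecutive chunks of `lines_per_file` lines to numbered files."""
    if lines_per_file < 1:
        raise ValueError("lines_per_file must be positive")
    src = Path(input_file)
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []  # paths of the chunk files, in order
    chunk = []    # lines collected for the current chunk
    with src.open(encoding="utf-8") as handle:
        for line in handle:
            chunk.append(line)
            if len(chunk) == lines_per_file:
                written.append(_flush(chunk, out, src, len(written)))
                chunk = []
    if chunk:  # trailing chunk with fewer than lines_per_file lines
        written.append(_flush(chunk, out, src, len(written)))
    return written


def _flush(chunk, out, src, index):
    # Hypothetical naming scheme: <stem>.partNNNN<suffix>.
    # Zero padding keeps the parts in order when sorted by name.
    target = out / f"{src.stem}.part{index:04d}{src.suffix}"
    target.write_text("".join(chunk), encoding="utf-8")
    return target
```

Each resulting part can then be scored independently, for example one part per cluster node.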
To run prediction:
python Extracting-CK-from-Large-LM/wiktionary_experiment.py \
    --test_file_path [test_file_path] \
    --test_file_name [test_file_name] \
    --output_dir [output_dir]
For example, to score triples of the AtLocation relation, run
python Extracting-CK-from-Large-LM/wiktionary_experiment.py \
    --test_file_path ./data/wiktionary_relationwise_candidates_by_pos_tag_core/atlocation.txt \
    --test_file_name atlocation.txt \
    --output_dir ./data/pmi_coherency_wiktionary_core
To merge the scored split files back together, if needed, run
python Extracting-CK-from-Large-LM/merge_split_files.py \
    --input_dir [input_dir] \
    --output_dir [output_dir]
For example:
python Extracting-CK-from-Large-LM/merge_split_files.py \
    --input_dir ./data/pmi_coherency_wiktionary_core_split \
    --output_dir ./data/pmi_coherency_wiktionary_core
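Conceptually, merging amounts to concatenating the scored parts in a stable order. The sketch below illustrates that idea only and does not reproduce merge_split_files.py. The function name `merge_parts` and the default `*.part*` file pattern are assumptions made for this example.

```python
from pathlib import Path


def merge_parts(input_dir, output_file, pattern="*.part*"):
    """Concatenate files matching `pattern` into one file, sorted by name."""
    parts = sorted(Path(input_dir).glob(pattern))
    out = Path(output_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as sink:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            sink.write(text)
            # Keep one triple per line even if a part lacks a final newline.
            if text and not text.endswith("\n"):
                sink.write("\n")
    return len(parts)
```

Sorting by file name only restores the original order if the part names are zero padded or otherwise sort correctly as strings.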
python wiktionary/score_dist_plot.py
python wiktionary/kendall_tau.py
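For readers unfamiliar with the statistic, Kendall's tau measures how well two lists of scores agree on the ranking of the same items. Below is a minimal sketch of the tau-a variant, which ignores tie corrections. The function name `kendall_tau_a` is chosen for this example, and kendall_tau.py may compute a different variant or rely on a library.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a between two equally long score lists."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A positive product means both lists order items i and j the same way.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Tied pairs (product == 0) count toward neither total.
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value of 1 means the two models rank all triples identically, and -1 means the rankings are exactly reversed. The pairwise loop is quadratic in the number of items, so it is only practical for small samples.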
python wiktionary/check_novelty.py --model BILINEAR-AVG
python wiktionary/check_novelty.py --model KG-BERT
python wiktionary/check_novelty.py --model PMI
python wiktionary/sample_triple_subset.py
python wiktionary/check_novelty.py --model BILINEAR-AVG --samples_only
python wiktionary/check_novelty.py --model KG-BERT --samples_only
python wiktionary/check_novelty.py --model PMI --samples_only
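As a rough illustration of what a novelty check can look like, the sketch below assumes that a candidate triple counts as novel when its normalized form does not appear in a reference set of known triples. That definition, the (relation, head, tail) layout, and the names `normalize` and `novelty_rate` are assumptions for this example. The criterion used by check_novelty.py may differ.

```python
def normalize(triple):
    """Lowercase a (relation, head, tail) triple and collapse whitespace."""
    relation, head, tail = triple
    return (
        relation.strip().lower(),
        " ".join(head.lower().split()),
        " ".join(tail.lower().split()),
    )


def novelty_rate(candidates, reference):
    """Return (fraction of unique candidates not in reference, novel triples)."""
    known = {normalize(t) for t in reference}
    unique = {normalize(t) for t in candidates}
    if not unique:
        return 0.0, []
    novel = sorted(t for t in unique if t not in known)
    return len(novel) / len(unique), novel
```

For example, with a reference set containing only ("AtLocation", "book", "library"), the candidate ("AtLocation", "fish", "water") would be counted as novel under this definition.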
If you have any questions regarding the code, feel free to create a GitHub issue or email us (emails provided in the paper).