Benchmarking

Getting Requirements

For constructing benchmarks:
- Python 3.7.0
- requests_html: pip install requests_html
- TensorFlow 1.14.0

For query answering:
- Java 1.8

Compiling QuatE

```
cd QuatE-master
bash make.sh
cd ..
mkdir release
mv QuatE-master/release/Base.so release/
```

Learning rules
```
python -u main.py
```
- parameters
  - test_id: directory in which output files are stored
  - RULE_LEN: rule length
  - RULE_DEP: rewriting depth
  - LIMIT_RULES: maximum number of rules per target predicate that are read from a rule file
  - BEAM_SIZE: at most BEAM_SIZE//2 rules are learned per target predicate
  - others: depend on the volume of data sampled for learning rules
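The two caps above (LIMIT_RULES when reading rules, BEAM_SIZE//2 when learning them) both bound the number of rules per target predicate. A minimal sketch of such a cap is shown below. The function name, the (head, rule, score) tuple layout, and ranking by score are assumptions made for illustration and are not taken from main.py.

```python
from collections import defaultdict


def cap_rules_per_predicate(scored_rules, limit):
    """Keep at most `limit` highest-scoring rules for each head predicate.

    scored_rules: iterable of (head_predicate, rule_text, score) tuples.
    This layout is hypothetical, not the repository's own format.
    """
    by_head = defaultdict(list)
    for head, rule, score in scored_rules:
        by_head[head].append((score, rule))

    capped = {}
    for head, items in by_head.items():
        # Highest score first; the sort is stable, so ties keep input order.
        items.sort(key=lambda pair: pair[0], reverse=True)
        capped[head] = [rule for _, rule in items[:limit]]
    return capped


BEAM_SIZE = 4  # example value only
rules = [
    ("p", "p(X,Y) :- q(X,Y).", 0.9),
    ("p", "p(X,Y) :- r(X,Y).", 0.7),
    ("p", "p(X,Y) :- s(X,Y).", 0.8),
    ("q", "q(X,Y) :- t(X,Y).", 0.5),
]
kept = cap_rules_per_predicate(rules, BEAM_SIZE // 2)
```

With BEAM_SIZE = 4, each target predicate keeps at most two rules, so the lowest-scoring rule for p is dropped.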

Merging learned rules without recursion and sampling data according to the rules
```
python -u sampling_qa.py
```
Remember to change the file and directory names to match your own setup before running.

Queries
- LC-QuAD (a collection of 5K natural language questions with their corresponding SPARQL queries): queries/linked.json
- 1961 SPARQL queries after filtering: queries/sparql1961.txt
- 5 SPARQL queries used for evaluation, in DLGP format: queries/sparql1961-5.dlp

Ontologies for evaluation
- rules/rules_*.dlp: each file is named after the number of rules it contains

Datasets for evaluation
- https://drive.google.com/drive/folders/1Bppxo1ns5fKBmH6cwYR2hK3Tp6LoJhum?usp=sharing
- sampling_qa.py: retrieves facts from the data pool above according to the rules and the expected number of facts
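The idea of drawing facts from a data pool according to the rules and an expected volume can be sketched as follows. This is an illustration under stated assumptions, not the logic of sampling_qa.py: facts are assumed to be (subject, predicate, object) triples, relevance is assumed to mean "the predicate occurs in some rule", and the function name and seed parameter are hypothetical.

```python
import random


def sample_facts(pool, rule_predicates, expected_volume, seed=0):
    """Pick up to `expected_volume` facts whose predicate occurs in the rules.

    pool: list of (subject, predicate, object) triples (assumed layout).
    rule_predicates: set of predicate names used by the rules.
    """
    relevant = [fact for fact in pool if fact[1] in rule_predicates]
    if len(relevant) <= expected_volume:
        return relevant
    # A seeded generator keeps the sample reproducible across runs.
    return random.Random(seed).sample(relevant, expected_volume)


pool = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "likes", "d"),
    ("d", "worksAt", "e"),
]
sampled = sample_facts(pool, {"knows", "likes"}, expected_volume=2)
```

Fixing the seed makes a generated dataset repeatable, which helps when comparing systems on the same benchmark.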

About the rewriting depth

The rewriting depth argument determines the number of iterations of rule learning. However, the actual rewriting depth of the learned rules does not necessarily equal the specified value. On the one hand, the tested systems cannot process recursive rules, so recursion elimination is needed, which can make the actual rewriting depth smaller. On the other hand, a predicate occurring in a rule body can coincide with the head predicate of another rule, which makes the actual rewriting depth larger. In the future, we plan to control the actual rewriting depth of the learned rules.
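
To see how the actual depth can differ from the requested one, the sketch below measures the longest chain of rule applications in a rule set and reports recursion. It is an illustration only: the (head, body predicates) layout, the function name, and the exception class are assumptions and do not come from the repository.

```python
class RecursiveRules(Exception):
    """Raised when a predicate depends on itself through the rules."""


def rewriting_depth(rules):
    """Return the longest chain of rule applications, or None if recursive.

    rules: list of (head_predicate, [body_predicates]) pairs (assumed layout).
    """
    deps = {}
    for head, body in rules:
        deps.setdefault(head, set()).update(body)

    depth = {}        # memoized depth per head predicate
    visiting = set()  # predicates on the current search path

    def visit(pred):
        if pred not in deps:
            return 0  # no rule derives this predicate
        if pred in depth:
            return depth[pred]
        if pred in visiting:
            raise RecursiveRules(pred)
        visiting.add(pred)
        d = 1 + max((visit(b) for b in deps[pred]), default=0)
        visiting.discard(pred)
        depth[pred] = d
        return d

    try:
        return max((visit(p) for p in deps), default=0)
    except RecursiveRules:
        return None


two_levels = [("a", ["b"]), ("b", ["c"])]
# A rule whose head happens to match an existing body predicate adds a level.
three_levels = two_levels + [("c", ["d"])]
recursive = two_levels + [("c", ["a"])]
depths = (
    rewriting_depth(two_levels),
    rewriting_depth(three_levels),
    rewriting_depth(recursive),
)
```

Here the second rule set is one level deeper than the first only because the new rule's head matches a body predicate, and the third is rejected as recursive until the offending rule is removed.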

Fixed Benchmarks vs. Framework of Benchmark Construction

So far, we have not released any fixed benchmarks, only the framework for benchmark construction, because both the ontologies obtained from rule learning and the datasets that make the answers of the tested systems interesting depend on the target queries. In our experiments, we selected only 5 CQs (conjunctive queries) from LC-QuAD. You therefore need to generate your own benchmark based on the queries you are interested in.
If you have any questions about using the benchmark, feel free to contact us (bohemianccc@gmail.com).
