Comparatives for Natural Language Inference

This repository provides a logical inference system based on event semantics and degree semantics in formal semantics. It contains the code for our papers (see the Citation section below). The version of the code used for the ACL-SRW 2020 paper is available under the tag acl-srw2020.

Setup

The system uses scripts from ccg2lambda. It requires python3 (3.6.5 or later) and the nltk, lxml, simplejson, and pyyaml Python libraries. If python3 and pip are already installed, you can install these packages with pip:

$ pip install lxml simplejson pyyaml nltk
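
To confirm that these libraries are importable (an optional sanity check, not part of the original instructions; note that pyyaml is imported as yaml):

$ python3 -c "import nltk, lxml, simplejson, yaml; print('OK')"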

See also the installation instructions for ccg2lambda.

In addition, spacy (version 2.1.8) and word2number are used for semantic assignment with the semantic templates. Install them and download the English spacy model:

$ pip install spacy==2.1.8 word2number
$ python3 -m spacy download en
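
To check that the English model is available (an optional sanity check, not part of the original instructions):

$ python3 -c "import spacy; spacy.load('en')"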

To run the system, first clone our repository:

$ git clone https://github.com/izumi-h/ccgcomp.git

Installing Vampire and Tsurgeon

  1. To install Vampire and Tsurgeon, change your directory to where you cloned the repository and run the following:

    $ cd ccgcomp
    $ ./tools/install_tools.sh
    
  2. This command downloads Vampire (version 4.4.0) to ccgcomp/vampire-4.4 and Tsurgeon (version 3.9.2) to ccgcomp/stanford-tregex-2018-10-16. You can change the location of Vampire and Tsurgeon by editing scripts/vampire_dir.txt and scripts/tregex_location.txt (see the example after this list).

    $ cat scripts/vampire_dir.txt
    /Users/izumi/ccgcomp/vampire-4.4
    $ cat scripts/tregex_location.txt
    /Users/izumi/ccgcomp/stanford-tregex-2018-10-16
    
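If you keep Vampire or Tsurgeon elsewhere, you can overwrite these files with your own locations. The paths below are placeholders, not the actual install paths:

    $ echo "/path/to/vampire-4.4" > scripts/vampire_dir.txt
    $ echo "/path/to/stanford-tregex-2018-10-16" > scripts/tregex_location.txt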

Installing CCG parsers

The following three typical CCG parsers are supported: C&C, EasyCCG, and depccg.

In the paper, the system is evaluated using two parsers, C&C and depccg. To set up the same environment, choose mac or linux and run the following command:

  • Mac
    $ ./tools/install_parsers.sh mac
    
  • Linux
    $ ./tools/install_parsers.sh linux
    

This command downloads C&C to ccgcomp/candc-1.00, EasyCCG to ccgcomp/easyccg, and depccg to ccgcomp/depccg. You can change the parser locations by editing scripts/parser_location.txt (see the example below).

$ cat scripts/parser_location.txt
candc:/Users/izumi/ccgcomp/candc-1.00
easyccg:/Users/izumi/ccgcomp/easyccg
depccg:
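
If your parsers live in different locations, rewrite scripts/parser_location.txt in the same format. The paths below are placeholders; the empty depccg entry mirrors the listing above:

$ cat > scripts/parser_location.txt << EOF
candc:/path/to/candc-1.00
easyccg:/path/to/easyccg
depccg:
EOF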

Selecting MED dataset

  1. To get MED.tsv and split the dataset into two types, run the following in the ccgcomp directory:

    $ ./tools/extract_med.sh
    
  2. The problems are then split into those that do not require lexical knowledge (gq tag) and those that do (gqlex tag), and placed in ./med_plain and ./fracas_plain.

    $ ls ./med_plain/*
    ./med_plain/med_1000_gq.answer     ./med_plain/med_465_gqlex.txt
    ./med_plain/med_1000_gq.txt        ./med_plain/med_466_gqlex.answer
    ./med_plain/med_1001_gq.answer     ./med_plain/med_466_gqlex.txt
    ./med_plain/med_1001_gq.txt        ./med_plain/med_467_gq.answer
    ...
    
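To see how many problems of each type were created (an optional check; the file-name patterns follow the listing above):

    $ ls ./med_plain/*_gq.txt | wc -l
    $ ls ./med_plain/*_gqlex.txt | wc -l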

Setting HANS test dataset

  1. To get the HANS dataset, run the following in the ccgcomp directory:

    $ ./tools/setting_hans.sh

  2. The HANS test data is then placed in ./hans_plain.

    $ ls ./hans_plain/*
    ./hans_plain/hans_10000_lexical_overlap.answer  ./hans_plain/hans_10000_lexical_overlap.txt
    ./hans_plain/hans_10001_subsequence.answer      ./hans_plain/hans_10001_subsequence.txt
    ./hans_plain/hans_28001_constituent.answer      ./hans_plain/hans_28001_constituent.txt
    ...

Setting SICK dataset and downloading VerbOcean

  1. To get the SICK dataset and download VerbOcean, run the following in the ccgcomp directory:

    $ ./ccg2lambda/download_dependencies.sh

  2. Then, when you run ./scripts/eval_sick.sh, the problems are placed in ./sick_plain.

    $ ls ./sick_plain/*
    ./sick_plain/sick_test_10.answer      ./sick_plain/sick_train_1064.txt
    ./sick_plain/sick_test_10.txt         ./sick_plain/sick_train_1065.answer
    ./sick_plain/sick_test_1001.answer    ./sick_plain/sick_train_1065.txt
    ./sick_plain/sick_test_1001.txt       ./sick_plain/sick_train_1068.answer
    ...

Running the system on several datasets

Evaluation on FraCaS, MED, CAD, and HANS

You can use the FraCaS, MED, CAD, and HANS test sets.

Usage:

 ./scripts/eval_fracas.sh <nbest> <ncore> <templates> <list of section numbers>

Example:

./scripts/eval_fracas.sh 1 2 scripts/semantic_template_event.yaml 5 6

  • <nbest>: the number of derivation trees output per sentence (n-best parses)
  • <ncore>: the number of CPU cores to use
  • <templates>: the semantic templates file
  • <list of section numbers>: one or more section numbers from the table below

FraCaS section number:

------------------------------------------------------
FraCaS
------------------------------------------------------
sec   topic            start   count     %     premise
---   -----------      -----   -----   -----   -------
 1    Quantifiers         1      80     23 %      50
 2    Plurals            81      33     10 %      24
 3    Anaphora          114      28      8 %       6
 4    Ellipsis          142      55     16 %      25
 5    Adjectives        197      23      7 %      15
 6    Comparatives      220      31      9 %      16
 7    Temporal          251      75     22 %      39
 8    Verbs             326       8      2 %       8
 9    Attitudes         334      13      4 %       9
 
------------------------------------------------------
CAD
------------------------------------------------------
10    Adjective
11    Adverb
12    Comparative

------------------------------------------------------
MED
------------------------------------------------------
13    gq
14    gqlex

------------------------------------------------------
HANS
------------------------------------------------------
15   hans-lexical_overlap
16   hans-subsequence
17   hans-constituent
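
For example, a run over the FraCaS Comparatives section (6) and the CAD Comparative section (12) might look like this (a hypothetical invocation that follows the usage pattern and the section numbers above):

./scripts/eval_fracas.sh 1 2 scripts/semantic_template_event.yaml 6 12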

The output looks like the following (the accuracy columns report results on all problems, on single-premise problems, and on multi-premise problems, respectively):

candc parsing cache/fracas_220_comparatives.txt
execute tsurgeon cache/fracas_220_comparatives.txt.candc.ptb
semantic parsing cache/fracas_220_comparatives.txt.candc.sem.xml
judging entailment cache/fracas_220_comparatives.txt.candc.sem.xml unknown
...
----------------------------------------------------------------------------
Multi-parsers:
                              all premi.         single           multi
generalized_quantifiers   |     0.9452     |     0.9318     |     0.9655     
plurals                   |      ----      |      ----      |      ----     
adjectives                |     0.9545     |     0.9333     |     1.0000     
comparatives              |     0.8387     |     0.7500     |     0.9333     
verbs                     |      ----      |      ----      |      ----     
attitudes                 |      ----      |      ----      |      ----     
adjective                 |      ----      |      ----      |      ----     
comparative               |      ----      |      ----      |      ----     
gq                        |      ----      |      ----      |      ----     
gqlex                     |      ----      |      ----      |      ----     
lexical_overlap           |      ----      |      ----      |      ----     
subsequence               |      ----      |      ----      |      ----     
constituent               |      ----      |      ----      |      ----
total                     |     0.9206     |     0.8933     |     0.9608
----------------------------------------------------------------------------
C&C:
                              all premi.          single           multi
...

Evaluation on SICK

In addition, you can use the following script for the SICK dataset:

Usage:

./scripts/eval_sick.sh <ncores> <split> <templates>

Example:

./scripts/eval_sick.sh 10 test scripts/semantic_template_event.yaml

  • <ncores>: the number of CPU cores to use
  • <split>: trial (500 problems), train (5000 problems), or test (4500 problems)
  • <templates>: the semantic templates file
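
For a quicker check, a run on the small trial split might look like this (a hypothetical invocation following the usage above; the core count is arbitrary):

./scripts/eval_sick.sh 4 trial scripts/semantic_template_event.yaml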

By default, the created files are stored in three directories (cache, en_results, tptp):

  • cache/*.sem.xml -- CCG derivation trees in XML format
  • en_results/*.answer -- system predictions (yes, no, unknown)
  • en_results/*.html -- visualized CCG derivation trees with semantic representations; the CCG tree for each FraCaS problem is accessible from en_results/main_section*.html
  • en_results/*.time -- the time taken to prove each problem
  • tptp/*.tptp -- semantic representations in TPTP format
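
For example, after a FraCaS run you can compare a system prediction with the gold label. The output file name below is an assumption (it is taken to mirror the input problem name), so adjust the glob if your run names files differently:

$ cat en_results/fracas_220*.answer
$ cat fracas_plain/fracas_220_comparatives.answer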

You can also run the system on a single inference problem in FraCaS. For example, the following tries to prove the inference with ID FraCaS-243:

./scripts/rte_neg.sh fracas_plain/fracas_243_comparatives.txt scripts/semantic_templates.yaml

Alternatively, you can run the experiment for each component, as described in the Ablation experiment section below:

./scripts/rte_neg_ablation.sh fracas_plain/fracas_243_comparatives.txt scripts/semantic_templates.yaml 4

Ablation experiment

To gain insight into the impact of each component on overall performance, we performed an ablation experiment.

Usage:

  • FraCaS, MED, CAD, and HANS
./scripts/eval_fracas_ablation.sh <nbest> <ncore> <templates> <option> <list of section numbers>
  • SICK
./scripts/eval_sick_ablation.sh <ncores> <option> <split> <templates>
  • <option>: the ablation setting to use
    • normal: 1, +tsurgeon: 2, +abduction: 3, +rule: 4, +implicature: 5

Example:

  • FraCaS, MED, CAD, and HANS
./scripts/eval_fracas_ablation.sh 1 3 scripts/semantic_template_event.yaml 2 6
  • SICK
./scripts/eval_sick_ablation.sh 10 tsurgeon test scripts/semantic_template_event.yaml

Code Structure

The code is divided into the following:

  1. ./ccg2lambda -- scripts from ccg2lambda

  2. ./fracas_plain -- inference problems from FraCaS and CAD by default. Each fracas_plain/*.txt file lists a set of premises and a hypothesis, one sentence per line. For example, for ID fracas_220, the first two lines are premises and the final line "The PC_6082 is fast" is the hypothesis.

    $ cat fracas_plain/fracas_220_comparatives.txt
    The PC_6082 is faster than the ITEL_XZ.
    The ITEL_XZ is fast.
    The PC_6082 is fast.
    

    The gold answer label is in fracas_plain/*.answer.

    $ cat fracas_plain/fracas_220_comparatives.answer
    yes
    
  3. ./scripts -- main scripts, including the semantic templates (scripts/semantic_templates.yaml), the Tsurgeon script (scripts/transform.tsgn), and the COMP axioms

  4. ./tools -- tools for setup

  5. ./CAD -- the CAD dataset, which focuses on comparatives and numerical constructions

Citation

  • Izumi Haruta, Koji Mineshima, and Daisuke Bekki. Implementing Natural Language Inference for comparatives. Journal of Language Modelling, 10(1), 139–191, 2023.
@article{Haruta-Mineshima-Bekki-2023,
    title={Implementing Natural Language Inference for comparatives},
    author={Haruta, Izumi and Mineshima, Koji and Bekki, Daisuke},
    journal={Journal of Language Modelling},    
    volume={10},
    number={1},
    DOI={10.15398/jlm.v10i1.294},
    year={2023},
    month={Jan.},
    pages={139–191}
}
  • Izumi Haruta, Koji Mineshima, and Daisuke Bekki. Combining Event Semantics and Degree Semantics for Natural Language Inference. Proceedings of the 28th International Conference on Computational Linguistics (COLING), pages 1758--1764, Online, December 2020.
@inproceedings{haruta-etal-2020-combining,
    title = "Combining Event Semantics and Degree Semantics for Natural Language Inference",
    author = "Haruta, Izumi  and
      Mineshima, Koji  and
      Bekki, Daisuke",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.156",
    pages = "1758--1764",
}
  • Izumi Haruta, Koji Mineshima, and Daisuke Bekki. Logical Inferences with Comparatives and Generalized Quantifiers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL-SRW), pages 263--270, Online, July 2020.
@inproceedings{haruta-etal-2020-logical,
    title = "Logical Inferences with Comparatives and Generalized Quantifiers",
    author = "Haruta, Izumi  and
      Mineshima, Koji  and
      Bekki, Daisuke",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL-SRW)",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-srw.35",
    pages = "263--270",
}

Results

You can view the results in the HTML files, which show how CCG derivation trees are mapped to semantic representations.