Learning semantic textual similarity with ccg2lambda

A system for determining semantic textual similarity by combining shallow features with features extracted from natural deduction proofs of bidirectional entailment relations between sentence pairs.

Requirements

  1. To run this system, first check out the emnlp2017_sts branch:
git checkout emnlp2017_sts
  1. Ensure that you have downloaded the C&C parser and the EasyCCG parser and written their installation locations in the file en/parser_location.txt:
cat en/parser_location.txt
candc:/home/usr/software/candc/candc-1.00
easyccg:/home/usr/software/easyccg
  1. Download the required Python modules and the SICK dataset by running the following commands:
./en/download_dependencies.sh
pip install -r requirements.txt
  1. Download the pretrained vector space models from Here. Then unzip the models.zip file and put the resulting models directory into the en directory.
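
Before launching an experiment, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of the repository: the helper names `parse_parser_locations` and `missing_parsers` are hypothetical, and the check assumes that en/parser_location.txt holds one `name:path` entry per line (as in the listing above) and that the unzipped models end up in en/models.

```python
from pathlib import Path


def parse_parser_locations(text):
    """Parse 'name:path' lines into a dict, skipping blank lines."""
    locations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so the rest of the line is the path.
        name, sep, path = line.partition(":")
        if not sep or not path.strip():
            raise ValueError(f"expected 'name:path', got {line!r}")
        locations[name.strip()] = Path(path.strip())
    return locations


def missing_parsers(locations, required=("candc", "easyccg")):
    """Return required parser names that are absent or not existing directories."""
    return [
        name
        for name in required
        if name not in locations or not locations[name].is_dir()
    ]


if __name__ == "__main__":
    config = Path("en/parser_location.txt")
    if config.is_file():
        problems = missing_parsers(parse_parser_locations(config.read_text()))
        print("missing parsers:", problems or "none")
    else:
        print(f"{config} not found")
    # Assumption: the unzipped models directory sits at en/models.
    print("models directory present:", Path("en/models").is_dir())
```

Run it from the repository root, so that the relative paths resolve.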

Evaluation with SemEval-2014 SICK dataset

You can evaluate the end-to-end performance of the system with a given set of semantic templates on the test split of SICK by running:

./en/emnlp2017exp.sh 3 en/semantic_templates_en_event_sts.yaml

Evaluation with SemEval-2012 MSR-video dataset

You can also evaluate the system on the MSR-video dataset by running:

./en/emnlp2017exp_msr.sh 3 en/semantic_templates_en_event_sts.yaml

Output

The system produces the following output files:

features_np.pickle (features extracted by ccg2lambda)
randomforestregressor.pkl (trained model)

results/evaluation.txt (correlation evaluation)
results/error_result.txt (erroneous predictions, diff > 0.75)
results/all_result.txt (all predictions)
results/result.png (regression line)
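
To make the two evaluation outputs concrete, here is a small illustrative sketch of a correlation score and of the "diff > 0.75" error filter. It does not read the files above, whose exact format is not described here. It assumes that "diff" means the absolute difference between gold and predicted similarity scores, and it uses Pearson correlation as one common choice. The function names and the toy scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def large_errors(gold, predicted, threshold=0.75):
    """Return (index, gold, predicted) for pairs whose absolute gap exceeds threshold."""
    return [
        (i, g, p)
        for i, (g, p) in enumerate(zip(gold, predicted))
        if abs(g - p) > threshold
    ]


# Toy similarity scores, made up for illustration only.
gold = [1.0, 2.5, 4.0, 5.0]
predicted = [1.2, 3.5, 3.9, 4.1]

print("Pearson r:", round(pearson(gold, predicted), 3))
for index, g, p in large_errors(gold, predicted):
    print(f"pair {index}: gold={g}, predicted={p}")
```

With these toy scores, the pairs at indices 1 and 3 are flagged, since their gaps (1.0 and 0.9) exceed the 0.75 threshold.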