TUPA is a transition-based parser for Universal Conceptual Cognitive Annotation (UCCA).
This repository contains the version of TUPA used as the submission by the HUJI team to the CoNLL 2018 UD Shared Task:
@InProceedings{hershcovich2018universal,
author = {Hershcovich, Daniel and Abend, Omri and Rappoport, Ari},
title = {Universal Dependency Parsing with a General Transition-Based DAG Parser},
booktitle = {Proc. of CoNLL UD 2018 Shared Task},
year = {2018},
url = {http://www.cs.huji.ac.il/~danielh/udst2018.pdf}
}
System outputs on development and test treebanks, as well as trained models (including ablation experiments), are available in this release.
For more information, please see the official TUPA code repository.
- Python 3.6+
Download the UD treebanks and extract them (e.g. to ../data/ud-treebanks-v2.2
).
Run train_conll2018.sh
to train a model on each treebank. For example, to train on ../data/ud-treebanks-v2.2/UD_English-EWT
, run:
./train_conll2018.sh ../data/ud-treebanks-v2.2/UD_English-EWT
Or, if you have a slurm cluster, just run
sbatch --array=1-121 train_conll2018.sh
to train models for all treebanks.
Either download the pre-trained models, or train your own (see above). If you trained your own models, update their suffixes in activate_models_conll2018.sh
.
To parse the test treebanks, run run_conll2018.sh
. For example:
./run_conll2018.sh ../data/ud-treebanks-v2.2/UD_English-EWT
Or parse all test treebanks using slurm:
sbatch --array=1-121 run_conll2018.sh
To parse the development treebanks, run:
./run_conll2018.sh ../data/ud-treebanks-v2.2/UD_English-EWT dev
Or parse all development treebanks using slurm:
sbatch --array=1-121 run_conll2018.sh dev
Either run the models yourself (see above), or download the system outputs.
To get LAS-F1 scores for test treebanks, run:
./eval_conll2018.sh
To get LAS-F1 scores for dev treebanks, run:
./eval_conll2018.sh dev
For evaluation on enhanced dependencies, run:
./eval_enhanced_conll2018.sh
- Daniel Hershcovich: danielh@cs.huji.ac.il
This package is licensed under the GPLv3 or later license (see LICENSE.txt
).