amr-eager

AMR-EAGER [1] is a transition-based parser for Abstract Meaning Representation (http://amr.isi.edu/). This repository provides an extension of AMR-EAGER to English, Italian, Spanish, German and Chinese. See [2] for a detailed explanation and experiments.

Installation

Make sure you have Java 8
Install Torch and torch packages dp, nngraph and optim (using luarocks, as explained here: http://torch.ch/docs/getting-started.html)
Install the following python dependencies: numpy and pytorch (https://github.com/hughperkins/pytorch)
Run ./download.sh
For Spanish parsing, install FreeLing (tested with versions 3.0 and 4.0) and set its path in preprocessing_es.sh (https://github.com/TALP-UPC/FreeLing/releases)
Install JAMR aligner (https://github.com/jflanigan/jamr) and set path in preprocessing.sh
For Chinese, you will also need to install the mafan package (https://pypi.org/project/mafan/).

Run the parser with pretrained model

Note: the input file must contain one sentence per line; see contrib/sample-sentences.txt for an example. All of the following commands should be run from the parser root directory.
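
A minimal Python 3 sketch of preparing such an input file. The helper write_sentences and the file name my-sentences.txt are hypothetical and not part of this repository; only the one-sentence-per-line layout comes from the note above.

```python
import os
import tempfile


def write_sentences(sentences, path):
    """Write one sentence per line (hypothetical helper, not part of the parser)."""
    with open(path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            # A sentence must not span several lines, so collapse all whitespace runs.
            handle.write(" ".join(sentence.split()) + "\n")


# Example: the embedded line break in the second sentence is removed.
demo_path = os.path.join(tempfile.mkdtemp(), "my-sentences.txt")
write_sentences(["The boy wants to go.", "She saw\nthe duck."], demo_path)
with open(demo_path, encoding="utf-8") as handle:
    demo_lines = handle.read().splitlines()
```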

Preprocessing:

./preprocessing.sh -s -l [en|it|de|es|zh] -f <sentences_file>

If not specified, the default language is English. You should get the output files in the same directory as the input files, with the prefix <sentences_file> and extensions .out and .sentences.

python preprocessing.py -l [en|it|de|es|zh] -f <sentences_file>

You should get the output files in the same directory as the input files, with the prefix <sentences_file> and extensions .tokens.p, .dependencies.p.
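
To check that both preprocessing steps completed, one option is to look for the four output files listed above. This is a small sketch; the helper names expected_outputs and missing_outputs are assumptions made for illustration, and only the extensions come from this README.

```python
import os

# Extensions named in this README for the two preprocessing steps.
PREPROCESSING_EXTENSIONS = [".out", ".sentences", ".tokens.p", ".dependencies.p"]


def expected_outputs(sentences_file):
    """Return the paths that preprocessing should create next to the input file."""
    return [sentences_file + ext for ext in PREPROCESSING_EXTENSIONS]


def missing_outputs(sentences_file):
    """Return the expected output files that do not exist yet."""
    return [path for path in expected_outputs(sentences_file) if not os.path.exists(path)]
```

If missing_outputs returns a non-empty list, the corresponding preprocessing step probably failed or was skipped.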

Parsing:

python parser.py -l [en|it|de|es|zh] -f <file> -m <model_dir>

If you wish to have the list of all nodes and edges in a JAMR-like format, add option -n. Without -m the parser uses the model provided in the directory ENGLISH. For Spanish, you need to specify the model SPANISH, for Italian ITALIAN, for German GERMAN and for Chinese CHINESE.
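
The language-to-model mapping described above can be captured in a small wrapper. This is a sketch only: parser_command is a hypothetical helper, and it assumes the model directories sit in the parser root under the names given in this README.

```python
# Model directories named in this README, keyed by language code.
MODEL_DIRS = {
    "en": "ENGLISH",
    "it": "ITALIAN",
    "es": "SPANISH",
    "de": "GERMAN",
    "zh": "CHINESE",
}


def parser_command(input_file, lang="en", jamr_like=False):
    """Build the argument list for parser.py (hypothetical helper)."""
    if lang not in MODEL_DIRS:
        raise ValueError("unsupported language: %s" % lang)
    command = ["python", "parser.py", "-l", lang, "-f", input_file, "-m", MODEL_DIRS[lang]]
    if jamr_like:
        # -n additionally lists all nodes and edges in a JAMR-like format.
        command.append("-n")
    return command
```

The returned list can be passed to subprocess.run(command, check=True) from the parser root directory.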

Mac users: the pretrained models seem to have compatibility errors when running on Mac OS X.

Evaluation

We provide evaluation metrics to compare AMR graphs based on Smatch (http://amr.isi.edu/evaluation.html). The script computes a set of metrics between AMR graphs in addition to the traditional Smatch score:

Unlabeled. Smatch score computed on the predicted graphs after removing all edge labels
No WSD. Smatch score while ignoring Propbank senses (e.g., duck-01 vs duck-02)
Named Ent. F-score on the named entity recognition (:name roles)
Wikification. F-score on the wikification (:wiki roles)
Negations. F-score on the negation detection (:polarity roles)
Concepts. F-score on the concept identification task
Reentrancy. Smatch computed on reentrant edges only
SRL. Smatch computed on :ARG-i roles only
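
To make two of these metrics concrete, the sketch below computes a concept F-score on a pair of toy AMRs, with and without PropBank senses. It is a simplified illustration, not the code in amrevaluation: the regular expressions, the function names and the example graphs are all assumptions, and the real No WSD metric is a Smatch score over the whole graph rather than over concepts only.

```python
import re
from collections import Counter

# Matches the concept after "/" in PENMAN notation, e.g. "duck-01" in "(d / duck-01".
CONCEPT_RE = re.compile(r"/\s*([^\s()]+)")
# Matches a trailing sense suffix such as "-01" (a simplification).
SENSE_RE = re.compile(r"-\d+$")


def concepts(amr, strip_senses=False):
    """Collect the concepts of an AMR string as a multiset."""
    found = CONCEPT_RE.findall(amr)
    if strip_senses:
        found = [SENSE_RE.sub("", concept) for concept in found]
    return Counter(found)


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over two multisets."""
    overlap = sum((predicted & gold).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


gold = "(w / want-01 :ARG0 (b / boy) :ARG1 (d / duck-01 :ARG0 b))"
pred = "(w / want-01 :ARG0 (b / boy) :ARG1 (d / duck-02 :ARG0 b))"

with_senses = f_score(concepts(pred), concepts(gold))
without_senses = f_score(concepts(pred, True), concepts(gold, True))
```

Here the only disagreement is duck-01 vs duck-02, so ignoring senses raises the score from two matching concepts out of three to a perfect match.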

The metrics are explained in detail in [1], which also uses them to evaluate several AMR parsers. (Some of the metrics were recently fixed and updated.)

cd amrevaluation
./evaluation.sh <file>.parsed <gold_amr_file>

To use the evaluation script with a different parser, provide the other parser's output as the first argument.

Annotation projection

In [2] we describe an annotation projection method for AMR, through which AMR data for English can be projected to other languages. The projected data we used for the experiments can be obtained by running download-data.sh.

Errors can occur at several intermediate stages of this process, so we do not provide an end-to-end script to generate projected data from a new parallel corpus. To do this yourself, carry out the following steps:

1. Install fast_align (https://github.com/clab/fast_align).
2. Change the FASTALIGN path variable in fastalign_train.sh accordingly.
3. Run fastalign_train.sh (see its comments for instructions) to train word alignment models for the language pair.
4. Preprocess and parse the English side of the parallel corpus with AMR-EAGER (see the parsing instructions above).
5. Create a target file containing the sentences in the target language together with the parsed AMRs (in the same format as traditional AMR data).
6. Train the new model as explained in the next section (the preprocessing step will use the output generated by fastalign_train.sh in step 3 to project the alignments).
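
The core idea behind projecting alignments can be sketched as follows. fast_align reads one sentence pair per line in the form "source ||| target" and writes alignments as i-j index pairs (zero-indexed, left side first). Everything else here is an assumption for illustration: the helper names, the choice of English as the left side, and the dictionary from English token indices to AMR node names. The scripts in this repository may work differently.

```python
def fast_align_line(source_tokens, target_tokens):
    """Format one tokenized sentence pair the way fast_align expects its input."""
    return " ".join(source_tokens) + " ||| " + " ".join(target_tokens)


def parse_alignment(line):
    """Parse a fast_align output line such as '0-0 1-2 2-1' into index pairs."""
    pairs = []
    for item in line.split():
        left, right = item.split("-")
        pairs.append((int(left), int(right)))
    return pairs


def project_nodes(english_to_node, pairs):
    """Move AMR node alignments from English tokens to target tokens.

    english_to_node maps an English token index to an AMR node name
    (a hypothetical representation chosen for this sketch).
    """
    projected = {}
    for english_index, target_index in pairs:
        if english_index in english_to_node:
            # Keep the first node projected onto each target token.
            projected.setdefault(target_index, english_to_node[english_index])
    return projected


pair_line = fast_align_line(["the", "red", "house"], ["la", "casa", "rossa"])
pairs = parse_alignment("0-0 1-2 2-1")
projected = project_nodes({1: "r", 2: "h"}, pairs)
```

In this toy example the node aligned to "red" ends up on "rossa" (index 2) and the node aligned to "house" on "casa" (index 1).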

Train a model

Preprocess training and validation sets:

./preprocessing.sh -f <amr_file> -l [en|it|de|es|zh]
python preprocessing.py --amrs -f <amr_file> -l [en|it|de|es|zh]

Run the oracle to generate the training data:

python collect.py -t <training_file> -m <model_dir> -l [en|it|de|es|zh]
python create_dataset.py -t <training_file> -v <validation_file> -m <model_dir> -l [en|it|de|es|zh]

Train the three neural networks:

th nnets/actions.lua --model_dir <model_dir>
th nnets/labels.lua --model_dir <model_dir>
th nnets/reentrancies.lua --model_dir <model_dir>

(Add --cuda to train on GPUs.)
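
The oracle and training commands above can be collected in order with a short script. This is a sketch under assumptions: training_commands is a hypothetical helper, and it only builds the argument lists shown in this README without running them.

```python
def training_commands(training_file, validation_file, model_dir, lang="en", cuda=False):
    """Return the oracle and training commands from this README, in order."""
    commands = [
        ["python", "collect.py", "-t", training_file, "-m", model_dir, "-l", lang],
        ["python", "create_dataset.py", "-t", training_file,
         "-v", validation_file, "-m", model_dir, "-l", lang],
    ]
    for network in ("actions", "labels", "reentrancies"):
        command = ["th", "nnets/%s.lua" % network, "--model_dir", model_dir]
        if cuda:
            # --cuda switches Torch training to GPUs.
            command.append("--cuda")
        commands.append(command)
    return commands
```

Each list can be passed to subprocess.run(command, check=True) from the parser root directory, so that a failing step stops the sequence.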

Finally, move the .dat models generated by Torch to <model_dir>/actions.dat, <model_dir>/labels.dat and <model_dir>/reentrancies.dat.
To evaluate the performance of the neural networks, run:
```
th nnets/report.lua <model_dir>
```
Note: if you used GPUs to train the models, you will need to uncomment the require cunn line in nnets/classify.lua.

Open-source code used:

Smatch: http://amr.isi.edu/evaluation.html
Tokenizer: https://github.com/redpony/cdec
CoreNLP: http://stanfordnlp.github.io/CoreNLP/
Tint: http://tint.fbk.eu
FreeLing: https://github.com/TALP-UPC/FreeLing/releases

References

[1] "An Incremental Parser for Abstract Meaning Representation", Marco Damonte, Shay B. Cohen and Giorgio Satta. Proceedings of EACL (2017). URL: https://arxiv.org/abs/1608.06111

[2] "Cross Lingual Abstract Meaning Representation", Marco Damonte and Shay B. Cohen. To appear in Proceedings of NAACL (2018). URL: TODO
