Code for our EMNLP 2021 (findings) paper Multilingual AMR Parsing with Noisy Knowledge Distillation.
We develop a single multilingual AMR parser that handles five languages: German (DE), Spanish (ES), Italian (IT), Chinese (ZH), and English (EN).
The parsing performance of our best parser (Google Drive) is shown below.
Language | DE | ES | IT | ZH | EN |
---|---|---|---|---|---|
Smatch (%) | 73.1 | 76.2 | 75.8 | 62.2 | 84.2 |
(Some numbers are higher than those reported in our paper because they come from different runs.)
The code has been tested on Python 3.6. All dependencies are listed in requirements.txt.
The pretrained model can be downloaded from Google Drive.
See scripts/work.sh for evaluation.
To parse your own data, use the following command:
checkpoint=ckpt/best.pt # this points to the pretrained model you have downloaded
dataset=tmp.txt # this points to the data you want to parse (see detailed explanation below)
PYTHONPATH=. python3 bin/predict_amrs.py \
--model facebook/mbart-large-50-many-to-many-mmt \
--checkpoint ${checkpoint} \
--dataset ${dataset} \
--nproc-per-node 4 \
--gold-path tmp-gold.txt \
--pred-path tmp-pred.txt \
--beam-size 4 \
--batch-size 5000 \
--penman-linearization \
--use-pointer-tokens
The tmp.txt file looks like this:
# ::id 0
# ::snt Resolutely support the thread starter! I compose a poem in reply:
# ::snt_lang en
(z0 / and)
# ::id 1
# ::snt Ich unterstütze denjenigen, der diesen Thread gestartet hat, ganz deutlich! Ich habe ein Gedicht als Antwort verfasst:
# ::snt_lang de
(z0 / and)
# ::id 2
# ::snt ¡Respalde firmemente el inicio del hilo! Escribo un poema en respuesta:
# ::snt_lang es
(z0 / and)
# ::id 3
# ::snt Sostenete assolutamente chi ha avviato questo thread! Scrivo una poesia come risposta:
# ::snt_lang it
(z0 / and)
# ::id 4
# ::snt 坚决支持楼主!我赋诗一首,以表寸心:
# ::snt_lang zh
(z0 / and)
As seen, each block has four fields:
- id: the sentence id
- snt: the input sentence
- snt_lang: the language of the input sentence (one of en, de, es, it, zh, ...)
- (z0 / and): this is just a placeholder
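If your sentences are not yet in this format, a small script can generate the file. The following is a minimal sketch, not part of this repository: the helper names `format_block` and `write_input_file` are hypothetical, and separating blocks with a blank line is an assumption based on the usual AMR corpus convention (the example above shows the blocks back to back).

```python
import io

# Dummy graph copied from the example above; it only serves as a placeholder.
PLACEHOLDER_GRAPH = "(z0 / and)"


def format_block(sent_id, sentence, lang):
    """Return one input block: three metadata lines plus the placeholder graph.

    Hypothetical helper, not provided by this repository.
    """
    # Each metadata field occupies a single line, so collapse newlines
    # and repeated whitespace inside the sentence.
    sentence = " ".join(sentence.split())
    return "\n".join([
        "# ::id {}".format(sent_id),
        "# ::snt {}".format(sentence),
        "# ::snt_lang {}".format(lang),
        PLACEHOLDER_GRAPH,
    ])


def write_input_file(path, examples):
    """Write (sentence, language code) pairs to `path`, numbering ids from 0.

    Assumption: blocks are separated by one blank line.
    """
    blocks = [
        format_block(i, sentence, lang)
        for i, (sentence, lang) in enumerate(examples)
    ]
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"\n\n".join(blocks) + u"\n")


if __name__ == "__main__":
    write_input_file("tmp.txt", [
        (u"Resolutely support the thread starter!", "en"),
        (u"坚决支持楼主!", "zh"),
    ])
```

The resulting tmp.txt can then be passed to the command above through the `dataset` variable.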
Follow the instructions in the README in the xlamr folder.
Use train.f3.sh and train.f4.sh in the scripts folder.
(Loading the KD data can be very time-consuming; use prepare.sh to cache the data.)
This project is based on SPRING and xl-amr.
For any questions, please drop an email to Deng Cai.