
Predicting Discourse Trees from Summarizers

The official code for the NAACL paper: Predicting Discourse Trees from pre-trained Transformer-based Neural Summarizers

Get Started

python==3.6.7

transformers==4.3.2

torch==1.6.0
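To set up a fresh environment, something along these lines should work (the pip invocation is our suggestion based on the versions above, not an official instruction; torch 1.6.0 wheels may need a CUDA-specific index URL):

pip install transformers==4.3.2 torch==1.6.0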

Usage

1. Download the Pre-trained Summarizer

The models we used in this paper can be downloaded here: link

You can also train your own summarizer using the code here, with self-attention as the attention type.

2. Download the datasets

As two of the three datasets are not open source, you will need to obtain them yourself; you can find the datasets here:

a. RST-DT

b. Instructional

c. GUM

3. Pre-process the datasets

Before you can use the summarization model to parse the documents, you first need to pre-process the datasets into the format shown in the example.

We use a Jupyter notebook to pre-process the data; everything you need is in Data Processing & Result Analysis.ipynb.

The processed GUM data can be found in the data folder.
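As a rough illustration of what such pre-processing might produce, here is a hypothetical sketch that writes one EDU per line per document. The exact expected format is defined in the notebook above; the one-EDU-per-line layout, file extension, and output directory here are assumptions for illustration only.

from pathlib import Path

def write_edus(doc_id, edus, out_dir="data/processed"):
    # Hypothetical format: one EDU per line, one file per document.
    # Check Data Processing & Result Analysis.ipynb for the real layout.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{doc_id}.edus").write_text("\n".join(edus), encoding="utf-8")

write_edus("wsj_0601", ["The company said", "it expects higher sales."])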

4. Generate and save Ground-truth trees

Use build_gt_tree.py to generate the ground-truth constituency and dependency trees.
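For example (a bare invocation shown here as an assumption; check the script itself for the arguments it actually expects):

python3 build_gt_tree.py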

5. Parsing

Constituency Parsing

Simply specify the number of layers and heads, as well as the model name and GPU to use.

python3 main_const.py -n_layers 6 -n_head 8 -device 0 -model_name cnndm-6-8
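For intuition about what such a parser does with the summarizer's self-attention, here is a minimal, hypothetical sketch that builds a binary constituency tree over EDUs with CKY, scoring each span by its internal attention mass. It illustrates the general idea only; the actual scoring function and implementation in main_const.py may differ.

import numpy as np

def cky_from_attention(attn):
    # attn: (n, n) self-attention matrix over the document's EDUs,
    # e.g. averaged over a chosen layer's heads.
    n = attn.shape[0]
    score = lambda i, j: float(attn[i:j + 1, i:j + 1].mean())  # span cohesion
    best, split = {}, {}
    for i in range(n):
        best[(i, i)] = 0.0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # choose the split point that maximizes the children's scores
            k = max(range(i, j), key=lambda m: best[(i, m)] + best[(m + 1, j)])
            split[(i, j)] = k
            best[(i, j)] = score(i, j) + best[(i, k)] + best[(k + 1, j)]

    def build(i, j):  # recover the tree; leaves are EDU indices
        if i == j:
            return i
        k = split[(i, j)]
        return (build(i, k), build(k + 1, j))

    return build(0, n - 1)

attn = np.random.rand(6, 6)
print(cky_from_attention(attn))  # e.g. (((0, 1), 2), ((3, 4), 5))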

Dependency Parsing

Simply specify the number of layers and heads, as well as the model name and GPU to use.

python3 main_dep.py -n_layers 6 -n_head 8 -device 0 -model_name cnndm-6-8
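Likewise, a hypothetical sketch of the dependency side: interpret attention weights as head-to-dependent scores and extract a maximum spanning arborescence. This shows the general idea only; the actual scoring and tree extraction live in main_dep.py.

import numpy as np
import networkx as nx

def dep_tree_from_attention(attn):
    # attn[h, d] is read as the score that EDU h governs EDU d
    # (whether rows or columns play the head role is an assumption here).
    n = attn.shape[0]
    g = nx.DiGraph()
    for h in range(n):
        for d in range(n):
            if h != d:
                g.add_edge(h, d, weight=float(attn[h, d]))
    # Chu-Liu/Edmonds: the highest-weight tree in which every EDU
    # has exactly one head, except the root.
    tree = nx.maximum_spanning_arborescence(g)
    return sorted(tree.edges())

attn = np.random.rand(6, 6)
print(dep_tree_from_attention(attn))  # list of (head, dependent) pairs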

Baseline

python3 main_bsl.py

Cite

If you use this code, please cite the NAACL paper: Predicting Discourse Trees from pre-trained Transformer-based Neural Summarizers.
