Introduction

DisCoDisCo (District of Columbia Discourse Cognoscente) is GU Corpling's submission to the DISRPT 2021 shared task. DisCoDisCo placed first among all systems submitted to the 2021 shared task across all five subtasks. For more information on the shared task, consult its official repository.

See our paper here: https://aclanthology.org/2021.disrpt-1.6/

Citation:

@inproceedings{gessler-etal-2021-discodisco,
    title = "{D}is{C}o{D}is{C}o at the {DISRPT}2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection",
    author = "Gessler, Luke  and
      Behzad, Shabnam  and
      Liu, Yang Janet  and
      Peng, Siyao  and
      Zhu, Yilun  and
      Zeldes, Amir",
    booktitle = "Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.disrpt-1.6",
    pages = "51--62"
}

Usage

Setup

1. Create a new environment:
   conda create --name disrpt python=3.8
   conda activate disrpt
2. Install dependencies:
   pip install -r requirements.txt
3. Ensure the 2021 shared task data is at data/2021/.
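Before running any experiments, it can help to confirm that the data is where the scripts expect it. The snippet below is a minimal sketch, not part of the repository: `check_disrpt_data` is a hypothetical helper, and it assumes each corpus sits in its own subdirectory named after the corpus (e.g. data/2021/zho.rst.sctb/).

```shell
# Sketch: confirm the DISRPT 2021 data directory exists and list its corpora.
# Assumption: each corpus is a subdirectory named like zho.rst.sctb.
# check_disrpt_data is a hypothetical helper, not part of the repository.
check_disrpt_data() {
  local root="${1:-data/2021}"
  if [ ! -d "$root" ]; then
    echo "missing data directory: $root" >&2
    return 1
  fi
  local found=0
  local dir
  for dir in "$root"/*/; do
    # If the glob matched nothing, it stays literal and is not a directory.
    [ -d "$dir" ] || continue
    basename "$dir"   # prints the corpus name, e.g. zho.rst.sctb
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no corpus directories under $root" >&2
    return 1
  fi
}
```

After sourcing the snippet, run `check_disrpt_data` from the repository root. It prints one corpus name per line, or returns status 1 with a message on stderr if the directory is missing or contains no corpus subdirectories.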

Experiments

Gold segmentation:

bash seg_scripts/single_corpus_train_and_test_ft.sh zho.rst.sctb

Silver segmentation:

bash seg_scripts/silver_single_corpus_train_and_test_ft.sh zho.rst.sctb

Relation classification:

bash rel_scripts/run_single_flair_clone.sh zho.rst.sctb
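Each command above runs an experiment on a single corpus (here zho.rst.sctb). To repeat an experiment over every corpus, a loop like the following sketch may help. `run_for_all_corpora` and the `DRY_RUN` variable are hypothetical, not part of the repository; the sketch assumes corpus names match the subdirectory names under data/2021/ and that each script takes the corpus name as its only argument, as in the commands above.

```shell
# Sketch: run one experiment script once per corpus found under data/2021/.
# Assumptions: corpus names equal subdirectory names (e.g. zho.rst.sctb),
# and each script takes the corpus name as its only argument.
# run_for_all_corpora and DRY_RUN are hypothetical, not part of the repository.
run_for_all_corpora() {
  local script="$1"
  local root="${2:-data/2021}"
  local dir corpus status=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skip an unmatched glob
    corpus=$(basename "$dir")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it.
      echo "bash $script $corpus"
    else
      # Record a failure but keep going with the remaining corpora.
      bash "$script" "$corpus" || status=1
    fi
  done
  return "$status"
}
```

For example, `DRY_RUN=1 run_for_all_corpora rel_scripts/run_single_flair_clone.sh` prints the commands it would run, and dropping `DRY_RUN=1` runs them one after another. Runs are sequential, and the function returns 1 at the end if any corpus failed. Not every script necessarily applies to every corpus, so check the dry-run list before launching long training jobs.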

Troubleshooting

Batch size may be modified, if necessary, using the batch_size parameter in: