This repository contains the code and data for the EMNLP 2020 paper "DagoBERT: Generating Derivational Morphology with a Pretrained Language Model". The paper introduces DagoBERT (Derivationally and generatively optimized BERT), a BERT-based model for generating derivationally complex words.
The code requires Python>=3.6, numpy>=1.18, torch>=1.2, and transformers>=2.5.
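A minimal setup sketch, assuming pip and a fresh virtual environment (the environment name is arbitrary; any versions at or above the stated minimums should work):

```bash
# Create and activate a virtual environment (optional but recommended)
python3 -m venv dagobert-env
source dagobert-env/bin/activate

# Install the required libraries at or above the stated minimum versions
pip install "numpy>=1.18" "torch>=1.2" "transformers>=2.5"
```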
The data used for the experiments can be found here. As described in the paper, we split all derivatives into seven frequency bins; please refer to the paper for details.
To replicate the experiment on the best segmentation method, run the script test_segmentation.sh in src/model/.
No training is required for this experiment since it uses pretrained BERT directly.
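For example, assuming the scripts can be invoked directly with bash from within src/model/:

```bash
cd src/model/
bash test_segmentation.sh
```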
To replicate the main experiment, run the script train_main.sh in src/model/.
After training has finished, run the script test_main.sh in src/model/.
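A sketch of the full sequence for the main experiment, under the same assumption that the scripts are run from src/model/:

```bash
cd src/model/
bash train_main.sh   # train DagoBERT on the main dataset
bash test_main.sh    # evaluate once training has finished
```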
To replicate the experiment on the Vylomova et al. (2017) dataset, run the script train_vyl.sh in src/model/.
After training has finished, run the script test_vyl.sh in src/model/.
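The same train-then-test pattern applies, e.g. chaining the two steps so that testing only starts once training has completed successfully:

```bash
cd src/model/ && bash train_vyl.sh && bash test_vyl.sh
```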
To replicate the experiment on the impact of the input segmentation, run the script train_mwf.sh in src/model/.
After training has finished, run the script test_mwf.sh in src/model/.
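E.g., analogously to the experiments above:

```bash
cd src/model/ && bash train_mwf.sh && bash test_mwf.sh
```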
The scripts expect the full dataset in data/final/.
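If you downloaded the data separately, a sketch for placing it in the expected location (the source path is a placeholder; data/final/ is assumed to be relative to the repository root):

```bash
# Replace the source path with wherever you downloaded the data
mkdir -p data/final
mv /path/to/downloaded/data/* data/final/
```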
If you use the code or data in this repository, please cite the following paper:
```bibtex
@inproceedings{hofmann2020dagobert,
    title = {Dago{BERT}: Generating Derivational Morphology with a Pretrained Language Model},
    author = {Hofmann, Valentin and Pierrehumbert, Janet and Sch{\"u}tze, Hinrich},
    booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
    year = {2020}
}
```