RNNs and syntactic variability

This repository contains the code for creating the synthetic languages used in the paper "Studying the inductive biases of RNNs with synthetic variations of natural languages" (accepted at NAACL 2019).

Dataset Creation

The datasets_creation directory contains the code for creating verb-argument agreement datasets for synthetic versions of English.

The arguments of main.py control various parameters of the synthetic language, such as which arguments the verb agrees with, whether NPs are marked for nuclear case, which case system to use, and the relative order of verb, subject and object. For example, converting the input sentence "they say the broker took them out for lunch frequently" to OVS word order yields "them took out frequently the broker for lunch say they".

For an explanation of all command-line arguments, run python main.py -h from the datasets_creation directory.
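The OVS example above can be reproduced with a small recursive sketch. It is not the repository's implementation: the Clause class and the linearize function are hypothetical names, and the sketch assumes an already-analyzed clause in which the tokens that travel with the verb (here the particle "out" and the adverb "frequently") are grouped with it, remaining dependents such as "for lunch" are emitted after the reordered core, and an object may itself be an embedded clause.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Clause:
    """Hypothetical, pre-analyzed clause (illustrative only).

    verb holds the verb plus the dependents that move with it (particles, adverbs);
    obj is either a token list or an embedded Clause; rest holds the remaining
    dependents, which are emitted after the reordered core.
    """
    subject: List[str]
    verb: List[str]
    obj: Union[List[str], "Clause"]
    rest: List[str] = field(default_factory=list)

def linearize(clause: Clause, order: str) -> List[str]:
    """Arrange S, V and O of `clause` according to `order`, e.g. "SVO" or "OVS"."""
    slots = {
        "S": clause.subject,
        "V": clause.verb,
        # An embedded clause is reordered recursively with the same word order.
        "O": linearize(clause.obj, order) if isinstance(clause.obj, Clause) else clause.obj,
    }
    tokens: List[str] = []
    for role in order:
        tokens.extend(slots[role])
    return tokens + clause.rest

# "they say [the broker took them out for lunch frequently]"
embedded = Clause(subject=["the", "broker"], verb=["took", "out", "frequently"],
                  obj=["them"], rest=["for", "lunch"])
sentence = Clause(subject=["they"], verb=["say"], obj=embedded)

print(" ".join(linearize(sentence, "OVS")))
# → them took out frequently the broker for lunch say they
```

Because the embedded clause is the object of "say", it is reordered first and then placed in the object slot of the main clause, which is why the whole embedded clause ends up before "say they".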

Running main.py generates a dataset file containing the modified sentences, which exhibit the desired grammatical phenomena, together with their agreement patterns. The dataset is saved in the datasets directory.
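The README does not document the format of the generated dataset file. As a hedged illustration of what "a sentence together with its agreement pattern" can mean, the sketch below builds hypothetical records whose label combines the grammatical number of each argument the verb agrees with (one of the parameters listed above). The AgreementInstance record, the "role=number" label format and the tab-separated layout are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

@dataclass
class AgreementInstance:
    """Hypothetical record: a transformed sentence, the index of its verb,
    and the agreement pattern the verb should express."""
    tokens: List[str]
    verb_index: int
    pattern: str

def agreement_pattern(arg_numbers: Dict[str, str], agree_with: Sequence[str]) -> str:
    """Join the number of every argument the verb agrees with into one label.
    The 'role=number' format separated by '|' is an assumption."""
    return "|".join(f"{role}={arg_numbers[role]}" for role in agree_with)

def to_tsv_line(inst: AgreementInstance) -> str:
    """Serialize as: sentence <TAB> verb index <TAB> pattern (assumed layout)."""
    return "\t".join([" ".join(inst.tokens), str(inst.verb_index), inst.pattern])

# Embedded clause from the OVS example: "the broker" is a singular subject,
# "them" a plural object; the verb "took" is at index 1.
tokens = "them took out frequently the broker for lunch".split()
numbers = {"subject": "sg", "object": "pl"}

both = AgreementInstance(tokens, 1, agreement_pattern(numbers, ["subject", "object"]))
subject_only = AgreementInstance(tokens, 1, agreement_pattern(numbers, ["subject"]))

print(both.pattern)          # → subject=sg|object=pl
print(subject_only.pattern)  # → subject=sg
```

Changing which arguments the verb agrees with changes only the label, not the sentence, which mirrors how a single configuration parameter can yield different agreement datasets from the same text.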

Model

The model directory contains the code for the model used in the paper. To run it:

1. Create train, dev and test agreement datasets (see Dataset Creation).
2. Run the script collect_vocab.py from within the model directory.
3. Run main.py to train the model.
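The last two steps can be scripted. The helper below is a minimal sketch with a hypothetical name (run_model_pipeline); it assumes the agreement datasets already exist and that neither script needs command-line arguments (any arguments they actually require are not documented here and are not shown). Each script is run with the current Python interpreter, using the model directory as the working directory, and the run stops at the first failure.

```python
import subprocess
import sys
from pathlib import Path
from typing import Union

def run_model_pipeline(model_dir: Union[str, Path]) -> None:
    """Hypothetical helper: run collect_vocab.py, then main.py, from inside model_dir.

    Assumes the train/dev/test agreement datasets already exist and that neither
    script needs command-line arguments.
    """
    for script in ("collect_vocab.py", "main.py"):
        # cwd makes the script run "from within the model directory";
        # check=True raises CalledProcessError if a step fails, so main.py
        # is not started when collect_vocab.py fails.
        subprocess.run([sys.executable, script], cwd=model_dir, check=True)
```

For example, calling run_model_pipeline("model") from the repository root would collect the vocabulary and then start training.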

Repository: shauli-ravfogel/rnn_typology