GitHub - Asteur/Frame-semantic-SegRNN

A frame-semantic parser for automatically detecting FrameNet frames and their frame-elements in sentences. The model is based on softmax-margin segmental recurrent neural nets, described in our paper, Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold. An example of a frame-semantic parse is shown below.

Installation

This project was developed using Python 2.7. It also requires the DyNet library and some NLTK packages.

pip install dynet
pip install nltk
python -m nltk.downloader averaged_perceptron_tagger wordnet

Data Preprocessing

This codebase only handles data in the XML format specified under FrameNet. However, we first reformat the data for ease of readability.

First, create a data/ directory here, download FrameNet version 1.x, and place it under data/fndata-1.x/. Also create a directory data/neural/fn1.x/ to hold the data converted to CoNLL 2009 format.
Then convert the data into a format similar to CoNLL 2009, but with BIO tags, by executing:

cd src/
python preprocess.py 2> err

The above script writes the train, dev and test files in the required format into the data/neural/fn1.x/ directory. The annotations contain a good deal of noise. Annotations that could not be used are written to standard error along with the corresponding error messages (the command above redirects standard error to the file err).
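To illustrate what BIO tagging means here, the following is a minimal sketch of turning labeled token spans into one tag per token. It is not taken from preprocess.py: the function name spans_to_bio, the inclusive (start, end, label) span layout, the B-/I- prefix style, and the example annotation are all assumptions made for illustration.

```python
def spans_to_bio(num_tokens, spans):
    """Convert labeled spans to one BIO tag per token.

    `spans` is a list of (start, end, label) tuples with inclusive token
    indices. This layout is an assumption for the sketch, not the format
    used internally by preprocess.py.
    """
    tags = ["O"] * num_tokens              # tokens outside every span
    for start, end, label in spans:
        tags[start] = "B-" + label         # first token of the span
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + label         # remaining tokens of the span
    return tags


tokens = ["She", "bought", "a", "car", "yesterday"]
# Hypothetical frame-element annotation for the target "bought".
spans = [(0, 0, "Buyer"), (2, 3, "Goods"), (4, 4, "Time")]
tags = spans_to_bio(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print("%s\t%s" % (token, tag))
```

Each token gets exactly one tag, which is what allows span annotations to sit in a single column of a CoNLL-style file.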

[Optional, but highly recommended] If you want to use pretrained GloVe word embeddings, download and extract them under data/. Run the preprocessing with an extra argument for the intended GloVe file.

python preprocess.py glove.6B.100d.txt 2> err

This trims the GloVe file to the FrameNet vocabulary, which reduces memory requirements. For example, the above creates data/glove.6B.100d.framevocab.txt to be used by our models.
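The trimming step can be pictured as a simple line filter. The sketch below is an illustration under stated assumptions rather than the code in preprocess.py: the function name trim_embeddings, the toy file names, and the toy vocabulary are hypothetical. It relies only on the GloVe text layout of one word per line followed by its space-separated vector.

```python
import io
import os
import tempfile


def trim_embeddings(src_path, dst_path, vocab):
    """Copy only the embedding lines whose word is in `vocab`.

    Returns the number of lines kept. Hypothetical helper, not the
    routine used by preprocess.py.
    """
    kept = 0
    with io.open(src_path, encoding="utf-8") as src:
        with io.open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                word = line.split(u" ", 1)[0]   # the word precedes the first space
                if word in vocab:
                    dst.write(line)
                    kept += 1
    return kept


# Toy stand-in for a GloVe file, written to a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "toy.glove.txt")
dst_path = os.path.join(workdir, "toy.glove.framevocab.txt")
with io.open(src_path, "w", encoding="utf-8") as f:
    f.write(u"buy 0.1 0.2\ncar 0.3 0.4\nzebra 0.5 0.6\n")

kept = trim_embeddings(src_path, dst_path, vocab={u"buy", u"car"})
print(kept)
```

Streaming the file line by line means the full embedding table never has to be held in memory during trimming.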

Target Identification

A bidirectional LSTM model takes into account the lexical unit index in FrameNet to identify targets. This model is not described in the paper.

Training

To train the target identification module, execute:

cd src/
python targetid.py

This saves the model that performs best on the validation data in the directory src/tmp/, which the symbolic link src/model.targetid.1.x points to. Pre-trained models are coming soon.
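The "best model behind a symbolic link" pattern can be sketched as follows. This is a generic illustration for POSIX systems, not the logic in targetid.py: the helper point_link_at, the checkpoint names, and the validation scores are all hypothetical.

```python
import os
import tempfile


def point_link_at(target_path, link_path):
    """Make `link_path` a symbolic link to `target_path`, replacing any old link."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target_path, tmp_link)
    os.rename(tmp_link, link_path)   # replaces an existing link atomically on POSIX


workdir = tempfile.mkdtemp()
link = os.path.join(workdir, "model.best")
best_dev_score = float("-inf")

# Hypothetical (checkpoint name, validation score) pairs from a training run.
for name, dev_score in [("ckpt-1", 0.61), ("ckpt-2", 0.74), ("ckpt-3", 0.70)]:
    ckpt = os.path.join(workdir, name)
    with open(ckpt, "w") as f:
        f.write(name)                # stand-in for serialized parameters
    if dev_score > best_dev_score:   # repoint the link only on improvement
        best_dev_score = dev_score
        point_link_at(ckpt, link)

print(os.path.basename(os.path.realpath(link)))
```

Test mode can then always load from the fixed link path without knowing which training run or checkpoint produced the best model.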

Test

To test with the best model in src/model.targetid.1.x, execute:

python targetid.py --mode test

Frame Identification

Frame identification is based on a bidirectional LSTM model.

Training

To train the frame identification module, execute:

cd src/
python frameid.py

This saves the model that performs best on the validation data in the directory src/tmp/, which the symbolic link src/model.frameid.1.x points to. Pre-trained models are coming soon.

Test

To test with the best model in src/model.frameid.1.x, execute:

python frameid.py --mode test > frameid.log

frameid.log will contain an example-wise analysis. The output is written in CoNLL 2009 format to predicted.1.x.frameid.test.out and in the frame-elements file format to my.predict.test.frame.elements.

Frame-Element (Argument) Identification

Argument identification is based on a segmental recurrent neural net model, used as a baseline in our paper.
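A segmental model scores whole labeled segments rather than individual tokens, and the best segmentation can be found by dynamic programming over segment end points. The sketch below shows only that decoding step, with a hand-written score table in place of the neural segment scorer. The function best_segmentation, the max_len cap, and the toy scores are assumptions for illustration and are not taken from segrnn-argid.py.

```python
def best_segmentation(n, labels, max_len, score):
    """Find the highest-scoring labeled segmentation of tokens 0..n-1.

    `score(start, end, label)` scores the segment covering tokens
    start..end-1. Segments are at most `max_len` tokens long.
    Returns (total_score, [(start, end, label), ...]).
    """
    best = [float("-inf")] * (n + 1)   # best[j]: best score for the first j tokens
    back = [None] * (n + 1)            # back[j]: (start, label) of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            for label in labels:
                cand = best[start] + score(start, end, label)
                if cand > best[end]:
                    best[end] = cand
                    back[end] = (start, label)
    # Follow the back-pointers from the end of the sentence.
    segments = []
    end = n
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return best[n], segments


# Toy scores for "She bought a car": a stand-in for a learned scorer.
preferred = {(0, 1, "Buyer"): 2.0, (2, 4, "Goods"): 3.0}


def toy_score(start, end, label):
    if (start, end, label) in preferred:
        return preferred[(start, end, label)]
    # Single-token "O" segments are neutral; everything else is penalized.
    return 0.0 if (label == "O" and end - start == 1) else -1.0


total, segments = best_segmentation(4, ["O", "Buyer", "Goods"], 3, toy_score)
print(total, segments)
```

With a cap on segment length, the search runs in time proportional to the sentence length times max_len times the number of labels.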

Training

To train an argument identifier, execute:

cd src/
python segrnn-argid.py 2> err

This saves the model that performs best on the validation data in the directory src/tmp/, which the symbolic link src/model.segrnn-argid.1.x points to. Pre-trained models are coming soon.

Test

To test with the best model in src/model.segrnn-argid.1.x, execute:

python segrnn-argid.py --mode test > argid.log

Contact and Reference

For questions and usage issues, please contact swabha@cs.cmu.edu. If you use open-sesame for research, please cite our paper as follows:

@article{swayamdipta:17,
  title={{Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold}},
  author={Swabha Swayamdipta and Sam Thomson and Chris Dyer and Noah A. Smith},
  journal={arXiv preprint arXiv:1706.09528},
  year={2017}
}
