Neural Networks for Data Selection

This repository contains the code for the paper "Neural Networks Classifier for Data Selection in Statistical Machine Translation".

It is built upon our fork of Keras (version 1.2) and has been tested with the Theano backend.

Features

Neural network-based sentence classifiers, at either the monolingual or the bilingual level.
BLSTM and CNN classifiers. Easy to extend.
Support for pretrained GloVe or Word2Vec word vectors (binary or text formats).
Iterative semi-supervised selection of the top- and bottom-scoring sentences from an out-of-domain corpus.
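
As an illustration of the last feature, here is a minimal sketch of one selection round: every sentence in the out-of-domain pool is scored, and the best- and worst-scoring sentences are split off. This is not the repository's implementation. The names `select_round` and `toy_score` are hypothetical, and the toy scorer (fraction of tokens found in a small in-domain vocabulary) only stands in for the neural classifier. How the selected sentences are fed back into training is described in the paper.

```python
from typing import Callable, List, Tuple


def select_round(
    pool: List[str],
    score: Callable[[str], float],
    k: int,
) -> Tuple[List[str], List[str], List[str]]:
    """Split a pool into the k top-scoring, k bottom-scoring and remaining sentences."""
    ranked = sorted(pool, key=score, reverse=True)
    # Never take more than half of the pool from each end.
    k = min(k, len(ranked) // 2)
    top = ranked[:k]
    bottom = ranked[len(ranked) - k:] if k else []
    remaining = ranked[k:len(ranked) - k]
    return top, bottom, remaining


# Hypothetical stand-in for the classifier: share of in-domain tokens.
IN_DOMAIN_VOCAB = {"patient", "dose", "treatment"}


def toy_score(sentence: str) -> float:
    tokens = sentence.lower().split()
    return sum(t in IN_DOMAIN_VOCAB for t in tokens) / max(len(tokens), 1)


pool = [
    "the patient received a dose",
    "stock markets fell today",
    "treatment dose for the patient",
    "the weather is nice",
]
top, bottom, remaining = select_round(pool, toy_score, k=1)
```

In an iterative setting, `remaining` would become the pool for the next round after the classifier has been retrained.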

Installation

Provided that you have pip installed, run the following commands to obtain the required packages:

git clone https://github.com/lvapeab/sentence-selectioNN
cd sentence-selectioNN
pip install -r requirements.txt

sentence-selectioNN requires the following libraries:

Instructions:

Assuming you have a corpus:

Check out the inputs/outputs of your model in data_engine/prepare_data.py
If you want to use pretrained GloVe or Word2Vec word vectors, run the preprocessing script that matches their format (binary or text).
Set a model configuration in config.py
Train!:

python main.py
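
For the pretrained word vector step above, the text format is simple enough to sketch: each line holds a word followed by its vector components, and Word2Vec text files additionally start with a "vocabulary size, dimension" header line. The snippet below is a minimal reader under those assumptions. It is not the repository's preprocessing script, and `load_text_vectors` is a hypothetical name.

```python
import io
from typing import Dict, List, TextIO


def _is_header(parts: List[str]) -> bool:
    """True for a Word2Vec-style '<vocab_size> <dim>' first line."""
    return len(parts) == 2 and all(p.isdigit() for p in parts)


def load_text_vectors(handle: TextIO) -> Dict[str, List[float]]:
    """Read 'word v1 v2 ...' lines into a word -> vector dictionary."""
    vectors: Dict[str, List[float]] = {}
    for line_no, line in enumerate(handle):
        parts = line.rstrip().split(" ")
        if line_no == 0 and _is_header(parts):
            continue  # skip the Word2Vec header; GloVe files have none
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory example instead of a real vector file.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vectors = load_text_vectors(sample)
```

With a real file, the same function can be called on a handle opened with `open(path, encoding="utf-8")`.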

Architecture

We support two network architectures, BLSTM and CNN, each at either the monolingual or the bilingual level.

Please see the paper for a more detailed description of the model.

Citation

If you use this code for any purpose, please cite the following paper:

Peris Á., Chinea-Rios M., Casacuberta F. 
Neural Networks Classifier for Data Selection in Statistical Machine Translation. 
In The Prague Bulletin of Mathematical Linguistics, No. 108, pp. 283–294. 2017.

Contact

Álvaro Peris (web page): lvapeab@prhlt.upv.es
