
Each model is implemented independently as a single .py file and can be found under nmtpytorch/models.

Adding a new model

A model is a class that implements the set of methods found in the basic NMT model. To implement a new model, you have two options:

  • Derive your class from the NMT class (see MNMTDecinit for an example; a minimal sketch follows this list)
  • Copy nmt.py under a different filename and rewrite all the methods. This is suitable if your model is substantially different from the basic NMT model and there is no benefit in deriving from it.
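
As a rough illustration of the first option, the sketch below derives a model from the NMT class. It is a sketch only: the constructor signature and the setup() hook are assumptions modeled on nmt.py, so consult the base class for the methods your model actually needs to override.

```python
# A minimal sketch of deriving a new model from the base NMT class.
# NOTE: the overridden method names below are illustrative assumptions;
# see nmtpytorch/models/nmt.py for the actual hooks to override.
from .nmt import NMT


class MyModel(NMT):
    """A hypothetical NMT variant that changes part of the base model."""

    def __init__(self, opts):
        # Reuse the base constructor, then adjust options or sub-modules.
        super().__init__(opts)

    def setup(self, is_train=True):
        # Hypothetical layer-construction hook: call the base version and
        # replace or extend only the parts your model changes.
        super().setup(is_train)
```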

After creating your model, add the necessary import to nmtpytorch/models/__init__.py. The class name of your model is what allows nmtpytorch to import and use it during training and inference.

To sum up,

  1. Implement your model as a class called MyModel under nmtpytorch/models/mymodel.py
  2. Import it inside nmtpytorch/models/__init__.py as from .mymodel import MyModel
  3. Create an experiment configuration file and set model_type: MyModel inside it.
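
For step 3, the excerpt below shows the relevant configuration line. This is a hedged sketch: the [train] section follows the example configurations shipped with nmtpytorch, but check your own experiment file for the exact layout.

```ini
# Hypothetical excerpt from an experiment configuration file.
# Only model_type is required here for the model to be picked up;
# all other options depend on your experiment.
[train]
model_type: MyModel
```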

Available Models

  • NMT: a conditional GRU-based NMT similar to the dl4mt-tutorial architecture.

  • An attentive image captioning model in the style of:

    Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International Conference on Machine Learning. 2015.

  • A multimodal attentive NMT that uses raw image files as inputs and implements an end-to-end pipeline with a CNN from torchvision:

    Caglayan, Ozan, Loïc Barrault, and Fethi Bougares. "Multimodal attention for neural machine translation." arXiv preprint arXiv:1609.03976 (2016).

  • A modification of the above model that is less memory hungry, since it consumes pre-extracted convolutional CNN features instead of embedding the CNN inside the model.

  • MNMTDecinit: a visually initialized conditional-GRU variant from:

    Caglayan, Ozan, et al. "LIUM-CVC Submissions for WMT17 Multimodal Translation Task." Proceedings of the Second Conference on Machine Translation. 2017.