Neural Machine Translation in PyTorch

Ken Sible | NLP Group | University of Notre Dame

Note that any option in config.toml can also be passed as a command-line argument,

$ python --model --beam-size 10 "Guten Tag!"

and any output written to stdout can be redirected to a file using the output redirection operator.

$ python --model --file > output.en
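
The override of config.toml options from the command line, mentioned above, could be implemented along the following lines. This is a minimal sketch rather than the repository's actual code: the `defaults` dictionary stands in for the parsed contents of config.toml, and `build_parser` and `resolve_config` are hypothetical helper names. It assumes each option such as `beam_size` is exposed as a flag such as `--beam-size`, matching the example command above.

```python
import argparse

# Assumption: these values stand in for the parsed contents of config.toml.
defaults = {"beam_size": 4, "dropout": 0.1, "lr": 3e-4}

def build_parser(defaults):
    """Create one optional flag per config option (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        # beam_size becomes --beam-size; the type of the default value
        # is reused to parse whatever is given on the command line.
        flag = "--" + key.replace("_", "-")
        parser.add_argument(flag, type=type(value), default=value)
    return parser

def resolve_config(defaults, argv):
    """Return the config with command-line overrides applied."""
    args = build_parser(defaults).parse_args(argv)
    return vars(args)

# Only beam_size is overridden; the other options keep their defaults.
config = resolve_config(defaults, ["--beam-size", "10"])
print(config)
```

Deriving the flags from the config keys keeps the two in sync, so a new option added to the config would not need a separate argparse entry.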

Train Model

usage: [-h] --lang LANG LANG --data FILE --test FILE --vocab FILE --codes FILE --model FILE --config FILE --log FILE [--seed SEED] [--tqdm]

  -h, --help        show this help message and exit
  --lang LANG LANG  source/target language
  --data FILE       training data
  --test FILE       validation data
  --vocab FILE      vocab file (shared)
  --codes FILE      codes file (shared)
  --model FILE      model file (.pt)
  --config FILE     config file (.toml)
  --log FILE        log file (.log)
  --seed SEED       random seed
  --tqdm            import tqdm

Score Model

usage: [-h] --data FILE --model FILE [--tqdm]

  -h, --help    show this help message and exit
  --data FILE   testing data
  --model FILE  model file (.pt)
  --tqdm        import tqdm

Translate Input

usage: [-h] --model FILE (--string STRING | --file FILE)

  -h, --help       show this help message and exit
  --model FILE     model file (.pt)
  --string STRING  input string
  --file FILE      input file
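
The `(--string STRING | --file FILE)` notation in the usage line above is how argparse prints a required, mutually exclusive group: exactly one of the two options must be given. The sketch below shows one way such a parser might be declared. It is an illustration, not the repository's source; `build_translate_parser` and the file name `model.pt` are hypothetical.

```python
import argparse

def build_translate_parser():
    """Parser accepting --model plus exactly one of --string or --file."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", metavar="FILE", required=True,
                        help="model file (.pt)")
    # required=True on the group means one of its options must be present;
    # mutual exclusivity means both cannot be present at once.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--string", metavar="STRING", help="input string")
    group.add_argument("--file", metavar="FILE", help="input file")
    return parser

# "model.pt" is a placeholder path used only for this example.
args = build_translate_parser().parse_args(
    ["--model", "model.pt", "--string", "Guten Tag!"]
)
print(args.string, args.file)
```

Passing both `--string` and `--file`, or neither, makes argparse print an error and exit.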

Model Configuration (Default)

embed_dim           = 512   # dimensions of embedding sublayers
ff_dim              = 2048  # dimensions of feed-forward sublayers
num_heads           = 8     # number of parallel attention heads
dropout             = 0.1   # dropout for emb/ff/attn sublayers
num_layers          = 6     # number of encoder/decoder layers
max_epochs          = 250   # maximum number of epochs before training halts
lr                  = 3e-4  # learning rate (step size of the optimizer)
patience            = 3     # number of epochs tolerated w/o improvement
decay_factor        = 0.8   # if patience reached, lr *= decay_factor
min_lr              = 5e-5  # minimum learning rate; training halts below it
label_smoothing     = 0.1   # label smoothing (regularization technique)
clip_grad           = 1.0   # maximum allowed value of gradients
batch_size          = 4096  # number of tokens per batch (source/target)
max_length          = 512   # maximum sentence length (during training)
beam_size           = 4     # beam search decoding (length normalization)
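
The interaction of `lr`, `patience`, `decay_factor`, and `min_lr` described in the comments above can be sketched as a small simulation. This is one plausible reading of those comments, not the repository's training loop. It assumes that improvement is measured on a validation loss, that the patience counter resets after each decay, and that training halts once the learning rate drops below `min_lr`; `run_schedule` is a hypothetical name.

```python
def run_schedule(val_losses, lr=3e-4, patience=3, decay_factor=0.8, min_lr=5e-5):
    """Return the learning rate in effect after each epoch.

    Assumptions (not confirmed by the source): improvement means a lower
    validation loss, and the patience counter resets after every decay.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            # New best validation loss: reset the patience counter.
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                # Patience reached: lr *= decay_factor.
                lr *= decay_factor
                bad_epochs = 0
        history.append(lr)
        if lr < min_lr:
            # Learning rate fell below the minimum: halt training.
            break
    return history

# Three epochs without improvement trigger a single decay.
history = run_schedule([1.0, 0.9, 0.95, 0.95, 0.95, 0.8])
print(history)

# A loss that never improves decays the rate until training halts.
stalled = run_schedule([1.0] + [2.0] * 100)
print(len(stalled), stalled[-1])
```

PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` offers similar behavior through its `factor` and `patience` arguments, although whether this project uses it is not stated here.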