
Universal-Transformer-Pytorch

Simple and self-contained implementation of the Universal Transformer (Dehghani et al., 2018) in Pytorch. Please open an issue if you find bugs, and send a pull request if you want to contribute.

GIF taken from: https://twitter.com/OriolVinyalsML/status/1017523208059260929

Universal Transformer

The basic Transformer model is taken from https://github.com/kolloldas/torchnlp. So far the following have been implemented:

  • Universal Transformer encoder-decoder with position and timestep ("coordinate") embeddings (a sketch follows this list).
  • Adaptive Computation Time (Graves, 2016) as described in the Universal Transformer paper (a halting sketch appears under "How to run").
  • Universal Transformer on the bAbI dataset.
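
At every recurrent step the model re-applies one shared Transformer block and adds both a sinusoidal position signal and a timestep signal to the states. The minimal sketch below illustrates this depth recurrence without ACT; the names (`coordinate_signal`, `UniversalEncoder`, `shared_block`) are illustrative assumptions, not classes from this repository:

```python
import math
import torch
import torch.nn as nn


def coordinate_signal(seq_len, step, hidden_size, device=None):
    # Sinusoidal position + timestep ("coordinate") embedding, added to the
    # states before every recurrent application of the shared block.
    # Assumes an even hidden_size. Names are illustrative.
    half = torch.arange(0, hidden_size, 2, dtype=torch.float32, device=device)
    div_term = torch.exp(half * -(math.log(10000.0) / hidden_size))
    pos = torch.arange(seq_len, dtype=torch.float32, device=device).unsqueeze(1)
    signal = torch.zeros(seq_len, hidden_size, device=device)
    signal[:, 0::2] = torch.sin(pos * div_term)
    signal[:, 1::2] = torch.cos(pos * div_term)
    # timestep signal: the same sinusoid evaluated at the current depth step
    t = torch.full((1, 1), float(step), device=device)
    time_signal = torch.zeros(1, hidden_size, device=device)
    time_signal[:, 0::2] = torch.sin(t * div_term)
    time_signal[:, 1::2] = torch.cos(t * div_term)
    return signal + time_signal          # (seq_len, hidden_size)


class UniversalEncoder(nn.Module):
    # Depth recurrence: one shared Transformer block applied max_steps times,
    # re-injecting the coordinate signal at every step (no ACT in this sketch).
    def __init__(self, shared_block, hidden_size, max_steps=6):
        super().__init__()
        self.block = shared_block        # e.g. one self-attention + FFN layer
        self.hidden_size = hidden_size
        self.max_steps = max_steps

    def forward(self, x):                # x: (batch, seq_len, hidden_size)
        for t in range(self.max_steps):
            x = x + coordinate_signal(x.size(1), t, self.hidden_size, x.device)
            x = self.block(x)            # same weights reused at every depth step
        return x
```

The decoder works the same way, with the shared block additionally attending over the encoder output.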

Dependencies

  • python3
  • pytorch 0.4
  • torchtext
  • argparse

How to run

To run the standard Universal Transformer on bAbI:

python main.py --task 1

To run Adaptive Computation Time:

python main.py --task 1 --act
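
With `--act`, Adaptive Computation Time lets every position decide dynamically how many recurrent steps to spend before halting. The sketch below shows per-position halting in the spirit of Graves (2016) as used by the Universal Transformer paper; class and argument names (`ACTHalting`, `step_fn`, `max_hops`) are illustrative assumptions rather than this repository's actual API:

```python
import torch
import torch.nn as nn


class ACTHalting(nn.Module):
    # Per-position dynamic halting. Illustrative sketch, not the repo's code.
    def __init__(self, hidden_size, max_hops=6, threshold=0.99):
        super().__init__()
        self.halt = nn.Linear(hidden_size, 1)   # halting-probability head
        self.max_hops = max_hops
        self.threshold = threshold

    def forward(self, state, step_fn):
        # state: (batch, seq_len, hidden); step_fn applies one shared UT block
        batch, seq_len, _ = state.shape
        halting_prob = state.new_zeros(batch, seq_len)
        remainders = state.new_zeros(batch, seq_len)
        n_updates = state.new_zeros(batch, seq_len)
        accumulated = torch.zeros_like(state)

        for step in range(self.max_hops):
            if (halting_prob >= self.threshold).all():
                break
            p = torch.sigmoid(self.halt(state)).squeeze(-1)   # (batch, seq_len)
            still_running = (halting_prob < self.threshold).float()
            # positions that would cross the threshold this step halt now
            new_halted = ((halting_prob + p * still_running) > self.threshold).float() * still_running
            still_running = ((halting_prob + p * still_running) <= self.threshold).float() * still_running

            halting_prob = halting_prob + p * still_running
            remainders = remainders + new_halted * (1.0 - halting_prob)
            halting_prob = halting_prob + new_halted * remainders
            n_updates = n_updates + still_running + new_halted

            # weight this step's output by p (running) or the remainder (halting)
            update_weights = (p * still_running + new_halted * remainders).unsqueeze(-1)
            state = step_fn(state, step)                      # one more shared block
            accumulated = state * update_weights + accumulated * (1.0 - update_weights)

        return accumulated, (remainders, n_updates)
```

The returned `remainders` and `n_updates` can be combined into a ponder-cost term added to the training loss to discourage unnecessary computation, as in Graves (2016).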

Results

bAbI 10k setting; each task was run 10 times and the best result is reported.

Tasks 16, 17, 18 and 19 are hard to converge even on the training set; the problem seems to be the learning-rate schedule. Moreover, results in the 1k setting are still poor, so some hyper-parameters probably need tuning.

| Task | Uni-Trs | + ACT | Original |
|------|---------|-------|----------|
| 1    | 0.0     | 0.0   | 0.0      |
| 2    | 0.0     | 0.2   | 0.0      |
| 3    | 0.8     | 2.4   | 0.4      |
| 4    | 0.0     | 0.0   | 0.0      |
| 5    | 0.4     | 0.1   | 0.0      |
| 6    | 0.0     | 0.0   | 0.0      |
| 7    | 0.4     | 0.0   | 0.0      |
| 8    | 0.2     | 0.1   | 0.0      |
| 9    | 0.0     | 0.0   | 0.0      |
| 10   | 0.0     | 0.0   | 0.0      |
| 11   | 0.0     | 0.0   | 0.0      |
| 12   | 0.0     | 0.0   | 0.0      |
| 13   | 0.0     | 0.0   | 0.0      |
| 14   | 0.0     | 0.0   | 0.0      |
| 15   | 0.0     | 0.0   | 0.0      |
| 16   | 50.5    | 50.6  | 0.4      |
| 17   | 13.7    | 14.1  | 0.6      |
| 18   | 4       | 6.9   | 0.0      |
| 19   | 79.2    | 65.2  | 2.8      |
| 20   | 0.0     | 0.0   | 0.0      |
| avg  | 7.46    | 6.98  | 0.21     |
| fail | 3       | 3     | 0        |

TODO

  • Visualize ACT on different tasks
