
Incorporating Attention Mechanisms in RNN-based Encoder-Decoder Models

Overview

This repo contains the codebase for my final project in the Text Mining course at LiU.

Machine translation has been a hot topic in the field of natural language processing (NLP) for many years. In recent years, the use of neural networks has revolutionized machine translation and led to the development of more accurate and efficient translation models. One of the key innovations in this field has been the introduction of the attention mechanism, which allows the model to focus on certain parts of the input sequence when generating the output sequence.

In this research project, we investigate one of the earlier approaches to attention, proposed by Bahdanau et al. (2016), which also contributed to the development of the transformer architecture. Based on this seminal work, we implement an Encoder-Decoder neural network with attention for machine translation from German to English. We investigate and visualize the attention weights generated by the model for different input sentences to gain a better understanding of how the attention mechanism works in practice.
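For orientation, below is a minimal PyTorch sketch of the additive (Bahdanau-style) attention score such a model relies on. The module and dimension names are illustrative and not taken from this repo's implementation.

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (illustrative sketch, not the repo's code)."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)  # projects encoder states
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)  # projects previous decoder state
        self.v = nn.Linear(attn_dim, 1, bias=False)            # scores the combined projection

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden:  (batch, dec_dim)          previous decoder hidden state
        # enc_outputs: (batch, src_len, enc_dim) all encoder hidden states
        scores = self.v(torch.tanh(
            self.W_enc(enc_outputs) + self.W_dec(dec_hidden).unsqueeze(1)
        )).squeeze(-1)                           # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)  # attention distribution over source tokens
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)  # weighted sum
        return context, weights
```

The returned weights are exactly what is visualized in the Results section below.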

Data

For this project, a dataset from the same source as in the original paper, ACL WMT ’14, is used. It targets translation between German and English and consists of proceedings of the European Parliament from 1996 to 2011. The dataset can be downloaded here.

All preprocessing steps can be found in the respective notebook.

Setup

The package included in the repo can be installed by running pip install . from the repository root.

Experiments

Validation loss

For the experiments, the following shared hyperparameters are used: the vocabulary size is fixed to 8,000, the embedding dimension is set to 256 for both languages, both encoder and decoder have a single GRU layer, no dropout is used, and the hidden dimension is fixed to 512 (only the last experiment, experiment 5, uses 1024).
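For reference, these shared settings can be collected in a plain Python dictionary. This is only a summary of the values listed above, not a configuration object taken from the repo:

```python
# Shared hyperparameters across experiments (experiment 5 uses hidden_dim = 1024 instead).
shared_config = {
    "vocab_size": 8_000,    # vocabulary size per language
    "embedding_dim": 256,   # embedding dimension for both languages
    "hidden_dim": 512,      # GRU hidden size of encoder and decoder
    "num_gru_layers": 1,    # a single GRU layer in both encoder and decoder
    "dropout": 0.0,         # no dropout
}
```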

All experiments were run on a single NVIDIA GeForce RTX 3060 Ti.

The best-performing model was trained with a learning rate of 1e-4, a batch size of 80, and a teacher forcing ratio of 0.5. The script to replicate this model can be found here. The model used to produce the results below is a similarly well-performing one from experiment 4 (see the report); it was loaded from a checkpoint saved in epoch 7, exactly step 30 according to the 25%-of-an-epoch evaluation schedule.
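A teacher forcing ratio of 0.5 means that, at each decoding step during training, the ground-truth previous token is fed to the decoder with probability 0.5; otherwise the model's own prediction is used. The following is a minimal sketch of such a decoding loop; the decoder signature and names are hypothetical and do not reflect the repo's API.

```python
import random

import torch


def decode_with_teacher_forcing(decoder, dec_input, dec_hidden, enc_outputs,
                                targets, teacher_forcing_ratio=0.5):
    """Training-time decoding loop with teacher forcing (illustrative sketch)."""
    logits_per_step = []
    for t in range(targets.size(1)):
        # One decoder step: consumes the previous token and attends over encoder outputs.
        logits, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_outputs)
        logits_per_step.append(logits)
        if random.random() < teacher_forcing_ratio:
            dec_input = targets[:, t]          # feed the ground-truth token
        else:
            dec_input = logits.argmax(dim=-1)  # feed the model's own prediction
    return torch.stack(logits_per_step, dim=1)  # (batch, tgt_len, vocab)
```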

An overview of the validation losses, together with the training loss, can be found here on WandB.

Results

Attention weights of an example translation

The plot shows the attention weights and word alignments for the German example sentence "Natürlich sind wir mit der gegenwärtigen Situation in China nicht zufrieden" and the model translation "Of course, we are not satisfied with the current situation in China.".
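Such an alignment plot can be produced from the per-step attention weights returned by the decoder, for example with a matplotlib heatmap. The function below is a sketch under that assumption, not the plotting code used in this repo.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_attention(weights, src_tokens, tgt_tokens):
    """Heatmap of attention weights: rows = translated tokens, columns = source tokens."""
    weights = np.asarray(weights)  # (tgt_len, src_len), one row of weights per output token
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(src_tokens)))
    ax.set_xticklabels(src_tokens, rotation=90)
    ax.set_yticks(range(len(tgt_tokens)))
    ax.set_yticklabels(tgt_tokens)
    ax.set_xlabel("Source (German)")
    ax.set_ylabel("Translation (English)")
    fig.tight_layout()
    plt.show()
```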

About

Implementation of GRU-based Encoder-Decoder Architecture with Bahdanau Attention Mechanism for Machine Translation from German to English.
