
Incorporating Attention Mechanisms in RNN-based Encoder-Decoder Models

Overview

This repo contains the codebase for my final project in the Text Mining course at LiU.

Machine translation has been a hot topic in the field of natural language processing (NLP) for many years. In recent years, the use of neural networks has revolutionized machine translation and led to the development of more accurate and efficient translation models. One of the key innovations in this field has been the introduction of the attention mechanism, which allows the model to focus on certain parts of the input sequence when generating the output sequence.

In this research project, we investigate one of the earlier approaches to attention, proposed by Bahdanau et al. (2016), which also contributed to the development of the transformer architecture. Based on this seminal work, we implement an Encoder-Decoder neural network with attention for machine translation from German to English. We investigate and visualize the attention weights generated by the model for different input sentences to gain a better understanding of how the attention mechanism works in practice.
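For orientation, below is a minimal PyTorch sketch of the additive (Bahdanau-style) attention score such a model relies on. The module and dimension names are illustrative and not taken from this repo's implementation.

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (illustrative sketch, not the repo's code)."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)  # projects encoder states
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)  # projects previous decoder state
        self.v = nn.Linear(attn_dim, 1, bias=False)            # scores the combined projection

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden:  (batch, dec_dim)          previous decoder hidden state
        # enc_outputs: (batch, src_len, enc_dim) all encoder hidden states
        scores = self.v(torch.tanh(
            self.W_enc(enc_outputs) + self.W_dec(dec_hidden).unsqueeze(1)
        )).squeeze(-1)                           # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)  # attention distribution over source tokens
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)  # weighted sum
        return context, weights
```

The returned weights are exactly what is visualized in the Results section below.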

Data

For this project, a dataset from the same source as in the original paper, ACL WMT ’14, is used. It targets translation between German and English and consists of proceedings of the European Parliament from 1996 to 2011. The dataset can be downloaded here.

All preprocessing steps can be found in the respective notebook.

Setup

The package included in the repo can be installed by running pip install . from the repository root.

Experiments

Validation loss

For the experiments, the following shared hyperparameters are used: the vocabulary size is fixed to 8,000, the embedding dimension is set to 256 for both languages, both encoder and decoder have a single GRU layer, no dropout is used, and the hidden dimension is fixed to 512 (only the last experiment, experiment 5, uses 1024).
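For reference, these shared settings can be collected in a plain Python dictionary. This is only a summary of the values listed above, not a configuration object taken from the repo:

```python
# Shared hyperparameters across experiments (experiment 5 uses hidden_dim = 1024 instead).
shared_config = {
    "vocab_size": 8_000,    # vocabulary size per language
    "embedding_dim": 256,   # embedding dimension for both languages
    "hidden_dim": 512,      # GRU hidden size of encoder and decoder
    "num_gru_layers": 1,    # a single GRU layer in both encoder and decoder
    "dropout": 0.0,         # no dropout
}
```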

All experiments were run on a single NVIDIA GeForce RTX 3060 Ti.

The best-performing model was trained with a learning rate of 1e-4, a batch size of 80, and a teacher forcing ratio of 0.5. The script to replicate this model can be found here. The model used to produce the results below is a similarly well-performing one from experiment 4 (see the report); it was loaded from a checkpoint saved in epoch 7, exactly step 30 according to the 25%-of-an-epoch evaluation schedule.
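A teacher forcing ratio of 0.5 means that, at each decoding step during training, the ground-truth previous token is fed to the decoder with probability 0.5; otherwise the model's own prediction is used. The following is a minimal sketch of such a decoding loop; the decoder signature and names are hypothetical and do not reflect the repo's API.

```python
import random

import torch


def decode_with_teacher_forcing(decoder, dec_input, dec_hidden, enc_outputs,
                                targets, teacher_forcing_ratio=0.5):
    """Training-time decoding loop with teacher forcing (illustrative sketch)."""
    logits_per_step = []
    for t in range(targets.size(1)):
        # One decoder step: consumes the previous token and attends over encoder outputs.
        logits, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_outputs)
        logits_per_step.append(logits)
        if random.random() < teacher_forcing_ratio:
            dec_input = targets[:, t]          # feed the ground-truth token
        else:
            dec_input = logits.argmax(dim=-1)  # feed the model's own prediction
    return torch.stack(logits_per_step, dim=1)  # (batch, tgt_len, vocab)
```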

An overview of the validation losses, together with the training loss, can be found here on WandB.

Results

Attention weights of an example translation

The plot shows the attention weights and word alignments for the German example sentence "Natürlich sind wir mit der gegenwärtigen Situation in China nicht zufrieden" and the model translation "Of course, we are not satisfied with the current situation in China.".
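Such an alignment plot can be produced from the per-step attention weights returned by the decoder, for example with a matplotlib heatmap. The function below is a sketch under that assumption, not the plotting code used in this repo.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_attention(weights, src_tokens, tgt_tokens):
    """Heatmap of attention weights: rows = translated tokens, columns = source tokens."""
    weights = np.asarray(weights)  # (tgt_len, src_len), one row of weights per output token
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(src_tokens)))
    ax.set_xticklabels(src_tokens, rotation=90)
    ax.set_yticks(range(len(tgt_tokens)))
    ax.set_yticklabels(tgt_tokens)
    ax.set_xlabel("Source (German)")
    ax.set_ylabel("Translation (English)")
    fig.tight_layout()
    plt.show()
```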

About

Implementation of GRU-based Encoder-Decoder Architecture with Bahdanau Attention Mechanism for Machine Translation from German to English.
