GitHub - moon23k/Transformer_Balance: Transformer Balance Research

Transformer Balance

Natural Language Generation tasks built on Transformer models rely on both the Encoder and the Decoder. A structure that balances the two typically performs well. Depending on the task, however, either the encoding side or the decoding side can matter more.

To investigate how emphasizing one side affects performance, this repository compares six Width and Depth variants against a baseline on three natural language generation tasks: Translation, Dialogue Generation, and Text Summarization. The variants cover models that weight the Encoder and Decoder equally, models that favor the Encoder, and models that favor the Decoder.

Model Architecture

This project uses a Transformer that builds on the standard architecture proposed in "Attention Is All You Need" by adding linear layers before and after the Encoder and the Decoder. These layers let the model handle input and output values of different dimensions, so the Encoder and Decoder do not have to share a single hidden size. All other aspects of the structure are the same as in the standard Transformer.
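As a rough illustration of how such projection layers can decouple the two widths, the sketch below lists the weight shapes of four linear layers for a given set of dimensions. The names `Dims`, `projection_shapes`, `shared`, `enc_hidden`, and `dec_hidden`, the example sizes, and the assumption that both sides meet at one shared interface width are all hypothetical and are not taken from the repository code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Dims:
    """Illustrative widths; none of these values come from the repository."""
    shared: int      # assumed interface width (embeddings, Encoder output)
    enc_hidden: int  # width used inside the Encoder stack
    dec_hidden: int  # width used inside the Decoder stack


def projection_shapes(d: Dims) -> Dict[str, Tuple[int, int]]:
    """Return (in_features, out_features) for the four added linear layers."""
    return {
        "enc_in": (d.shared, d.enc_hidden),   # before the Encoder
        "enc_out": (d.enc_hidden, d.shared),  # after the Encoder
        "dec_in": (d.shared, d.dec_hidden),   # before the Decoder
        "dec_out": (d.dec_hidden, d.shared),  # after the Decoder
    }


if __name__ == "__main__":
    # An Encoder-wide setup: only the Encoder runs at double width.
    for name, shape in projection_shapes(Dims(256, 512, 256)).items():
        print(name, shape)
```

Because each side is wrapped in its own pair of projections, changing `enc_hidden` leaves every Decoder shape untouched, and vice versa.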

To measure the performance of each model, we run experiments with a baseline model and six variant models. The models are categorized by which side they favor (Encoder, Decoder, or neither) and by whether they are widened or deepened. The table below describes each model.

| Model Name | Balance | Type | Note |
| --- | --- | --- | --- |
| Equal Default Model | Equal | Default | Baseline model; balanced, not weighted to either side |
| Equal Wide Model | Equal | Wide | Not weighted to either side; hidden dimension doubled on both Encoder and Decoder |
| Equal Deep Model | Equal | Deep | Not weighted to either side; number of layers doubled on both Encoder and Decoder |
| Encoder Wide Model | Encoder | Wide | Encoder-weighted; hidden dimension doubled on the Encoder only |
| Encoder Deep Model | Encoder | Deep | Encoder-weighted; number of layers doubled on the Encoder only |
| Decoder Wide Model | Decoder | Wide | Decoder-weighted; hidden dimension doubled on the Decoder only |
| Decoder Deep Model | Decoder | Deep | Decoder-weighted; number of layers doubled on the Decoder only |
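The variants in the table can be read as simple transformations of a baseline configuration. The sketch below derives each one and gives a rough weight count for the attention and feed-forward layers. The `ModelConfig` fields, the baseline sizes in `BASE`, and the helpers `make_variant` and `approx_layer_weights` are hypothetical and are not read from the repository's `config.yaml`. The estimate also assumes a feed-forward inner size of four times the hidden size and ignores biases, LayerNorm, embeddings, and the added linear layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    enc_hidden: int
    dec_hidden: int
    enc_layers: int
    dec_layers: int


# Hypothetical baseline sizes, chosen only for illustration.
BASE = ModelConfig(enc_hidden=256, dec_hidden=256, enc_layers=3, dec_layers=3)


def make_variant(base: ModelConfig, balance: str, kind: str) -> ModelConfig:
    """Double the width or depth on the side(s) named by `balance`."""
    if balance not in ("equal", "encoder", "decoder"):
        raise ValueError(f"unknown balance: {balance}")
    if kind not in ("default", "wide", "deep"):
        raise ValueError(f"unknown type: {kind}")
    enc = 2 if balance in ("equal", "encoder") else 1
    dec = 2 if balance in ("equal", "decoder") else 1
    if kind == "wide":
        return ModelConfig(base.enc_hidden * enc, base.dec_hidden * dec,
                           base.enc_layers, base.dec_layers)
    if kind == "deep":
        return ModelConfig(base.enc_hidden, base.dec_hidden,
                           base.enc_layers * enc, base.dec_layers * dec)
    return base


def approx_layer_weights(cfg: ModelConfig, ffn_mult: int = 4) -> int:
    """Rough weight count, treating every projection as hidden x hidden."""
    # Encoder layer: 4 attention matrices + 2 feed-forward matrices.
    enc = cfg.enc_layers * (4 + 2 * ffn_mult) * cfg.enc_hidden ** 2
    # Decoder layer: self- and cross-attention (8 matrices) + feed-forward.
    dec = cfg.dec_layers * (8 + 2 * ffn_mult) * cfg.dec_hidden ** 2
    return enc + dec


if __name__ == "__main__":
    rows = [("equal", "default"), ("equal", "wide"), ("equal", "deep"),
            ("encoder", "wide"), ("encoder", "deep"),
            ("decoder", "wide"), ("decoder", "deep")]
    for balance, kind in rows:
        cfg = make_variant(BASE, balance, kind)
        print(balance, kind, cfg, approx_layer_weights(cfg))
```

Under this approximation, doubling the width roughly quadruples a side's layer weights while doubling the depth only doubles them, so the Wide and Deep variants are not matched in size.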

Result

| Model | Machine Translation | Dialogue Generation | Text Summarization |
| --- | --- | --- | --- |
| Equal Default Model | 11.23 | 29.65 | - |
| Equal Wide Model | 0.00 | 0.00 | - |
| Equal Deep Model | 0.00 | 0.00 | - |
| Encoder Wide Model | 0.00 | 14.67 | - |
| Encoder Deep Model | 0.00 | 21.02 | - |
| Decoder Wide Model | 0.00 | 0.00 | - |
| Decoder Deep Model | 0.00 | 0.00 | - |

Reference

Attention Is All You Need (Vaswani et al., 2017)

