Transformer Balance

  Natural language generation with Transformer models requires both an Encoder and a Decoder. A well-balanced structure between the two typically yields good performance, but depending on the nature of the task, either the encoding or the decoding side can become more critical.

To investigate how emphasizing one side directly affects performance, this repository explores six width and depth combinations across three natural language generation tasks: Machine Translation, Dialogue Generation, and Text Summarization. The combinations cover models with a balanced emphasis on Encoder and Decoder, models with a stronger Encoder, and models with a stronger Decoder.



Model Architecture

  In this project, we use a Transformer that builds on the standard architecture proposed in "Attention Is All You Need" by adding linear layers before and after the Encoder and Decoder. These projections let the model handle inputs and outputs of different dimensions, which is necessary when one side is wider than the other. All other aspects of the structure remain the same as the standard Transformer.

To measure performance, we conduct experiments with a baseline model and six variants, categorized by where the capacity is placed (Encoder, Decoder, or both) and by whether it is added through width or depth. Detailed information for each model can be found in the table below, and a schematic configuration sketch follows it.
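
The sketch below is a minimal PyTorch illustration of one way such projections could be placed; it is not the repository's actual code, and all class names, hyperparameter names, and default values are assumptions made for the example.

```python
import torch
import torch.nn as nn


class BridgedTransformer(nn.Module):
    """Standard Transformer with linear 'bridge' layers so the Encoder and
    Decoder can use different hidden sizes. Illustrative sketch only; the
    hyperparameters are placeholders, not the repository's configuration."""

    def __init__(self, vocab_size=8000, enc_dim=256, dec_dim=256,
                 enc_layers=3, dec_layers=3, n_heads=8, pf_dim=1024):
        super().__init__()
        self.enc_emb = nn.Embedding(vocab_size, enc_dim)
        self.dec_emb = nn.Embedding(vocab_size, dec_dim)

        enc_layer = nn.TransformerEncoderLayer(enc_dim, n_heads, pf_dim, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(dec_dim, n_heads, pf_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, enc_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, dec_layers)

        # Linear bridges: project encoder output into the decoder's hidden
        # dimension, and project decoder output onto the vocabulary.
        self.enc_to_dec = nn.Linear(enc_dim, dec_dim)
        self.generator = nn.Linear(dec_dim, vocab_size)

    def forward(self, src, tgt):
        # Positional encodings and attention masks are omitted for brevity.
        memory = self.encoder(self.enc_emb(src))        # [B, S, enc_dim]
        memory = self.enc_to_dec(memory)                # [B, S, dec_dim]
        out = self.decoder(self.dec_emb(tgt), memory)   # [B, T, dec_dim]
        return self.generator(out)                      # [B, T, vocab_size]


# Example: an Encoder-Wide style variant simply doubles enc_dim.
model = BridgedTransformer(enc_dim=512, dec_dim=256)
logits = model(torch.randint(0, 8000, (2, 10)), torch.randint(0, 8000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 8000])
```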


| Model Name | Balance | Type | Note |
|---|---|---|---|
| Equal Default Model | Equal | Default | Baseline model; not weighted to either side, well balanced |
| Equal Wide Model | Equal | Wide | Not weighted to either side, but with doubled hidden dimension size on both Encoder and Decoder |
| Equal Deep Model | Equal | Deep | Not weighted to either side, but with doubled number of layers on both Encoder and Decoder |
| Encoder Wide Model | Encoder | Wide | Encoder-weighted model, with doubled hidden dimension size on the Encoder only |
| Encoder Deep Model | Encoder | Deep | Encoder-weighted model, with doubled number of layers on the Encoder only |
| Decoder Wide Model | Decoder | Wide | Decoder-weighted model, with doubled hidden dimension size on the Decoder only |
| Decoder Deep Model | Decoder | Deep | Decoder-weighted model, with doubled number of layers on the Decoder only |
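
For concreteness, the doubling scheme in the table above could be expressed as a set of hyperparameter dictionaries like the sketch below. The base values are assumed placeholders; only the doubling pattern is taken from the table.

```python
# Illustrative hyperparameter sets for the seven configurations above.
# BASE values are assumptions, not the repository's actual settings.
BASE = dict(enc_hidden=256, dec_hidden=256, enc_layers=3, dec_layers=3)

MODEL_CONFIGS = {
    "equal_default": dict(BASE),
    "equal_wide":    dict(BASE, enc_hidden=512, dec_hidden=512),
    "equal_deep":    dict(BASE, enc_layers=6, dec_layers=6),
    "encoder_wide":  dict(BASE, enc_hidden=512),
    "encoder_deep":  dict(BASE, enc_layers=6),
    "decoder_wide":  dict(BASE, dec_hidden=512),
    "decoder_deep":  dict(BASE, dec_layers=6),
}
```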



Results

| Model | Machine Translation | Dialogue Generation | Text Summarization |
|---|---|---|---|
| Equal Default Model | 11.23 | 29.65 | - |
| Equal Wide Model | 0.00 | 0.00 | - |
| Equal Deep Model | 0.00 | 0.00 | - |
| Encoder Wide Model | 0.00 | 14.67 | - |
| Encoder Deep Model | 0.00 | 21.02 | - |
| Decoder Wide Model | 0.00 | 0.00 | - |
| Decoder Deep Model | 0.00 | 0.00 | - |



Reference
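
Vaswani et al., "Attention Is All You Need", 2017. https://arxiv.org/abs/1706.03762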