
TransformerEncoderLayer #36

Open
sanwei111 opened this issue Jul 6, 2021 · 4 comments

@sanwei111

Hello, in the file transformer-multibranch-v2, the TransformerEncoderLayer class contains the following code:

if args.encoder_branch_type is None:  # default=None?
    self.self_attn = MultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout, self_attention=True,
    )
else:
    layers = []
    embed_dims = []
    heads = []
    num_types = len(args.encoder_branch_type)

I just wonder: does args.encoder_branch_type evaluate to true?

@realzza commented Jul 14, 2021

> I just wonder: does args.encoder_branch_type evaluate to true?

Hi, args.encoder_branch_type is a list containing the encoder branch types defined in your training YAML file.
In my case, I set encoder_branch_type in the training YAML as encoder-branch-type: [attn:1:32:4, dynamic:default:32:4], where 32 is the embedding dimension and 4 is the number of attention heads.
Hope this helps!
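As a rough illustration (not the repo's exact code), each entry follows a type:kernel_size:embed_dim:num_heads pattern, which the encoder layer recovers by splitting on ':'; the field order is inferred from the split(':') indices used in TransformerEncoderLayer:

# Minimal sketch of how one branch-type entry is parsed; the helper
# name parse_branch_type is hypothetical, the field layout is taken
# from the split(':')[1..3] indices in the actual encoder code.
def parse_branch_type(entry):
    layer_type, kernel_size, embed_dim, num_heads = entry.split(':')
    return {
        'layer_type': layer_type,      # 'attn', 'dynamic', or 'lightweight'
        'kernel_size': kernel_size,    # an int, or 'default' to fall back to args.encoder_kernel_size_list
        'embed_dim': int(embed_dim),   # per-branch embedding dimension
        'num_heads': int(num_heads),   # attention / conv head count
    }

print(parse_branch_type('attn:1:32:4'))
print(parse_branch_type('dynamic:default:32:4'))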

@sanwei111 (Author)

> Hi, args.encoder_branch_type is a list containing the encoder branch types defined in your training YAML file.

Thanks! What's the meaning of [attn:1:32:4, dynamic:default:32:4]? Could you show some details about the list?

@realzza commented Jul 16, 2021

> Thanks! What's the meaning of [attn:1:32:4, dynamic:default:32:4]? Could you show some details about the list?

As I mentioned in my last reply, args.encoder_branch_type should not be a boolean value; instead, it should be a list recording the branch types of your encoder. As for 32 and 4, they are the embed_dim and num_heads parameters used when initializing the MultiheadAttention and DynamicConv modules.

encoder-branch-type: [attn:1:248:4, dynamic:default:248:4]

You can find more details on these two params in the get_layer method of the TransformerEncoderLayer module:
def get_layer(self, args, index, out_dim, num_heads, layer_type):
    kernel_size = layer_type.split(':')[1]
    if kernel_size == 'default':
        kernel_size = args.encoder_kernel_size_list[index]
    else:
        kernel_size = int(kernel_size)
    padding_l = kernel_size // 2 if kernel_size % 2 == 1 else ((kernel_size - 1) // 2, kernel_size // 2)
    if 'lightweight' in layer_type:
        layer = LightweightConv(
            out_dim, kernel_size, padding_l=padding_l, weight_softmax=args.weight_softmax,
            num_heads=num_heads, weight_dropout=args.weight_dropout,
            with_linear=args.conv_linear,
        )
    elif 'dynamic' in layer_type:
        layer = DynamicConv(
            out_dim, kernel_size, padding_l=padding_l,
            weight_softmax=args.weight_softmax, num_heads=num_heads,
            weight_dropout=args.weight_dropout, with_linear=args.conv_linear,
            glu=args.encoder_glu,
        )
    elif 'attn' in layer_type:
        layer = MultiheadAttention(
            out_dim, num_heads,
            dropout=args.attention_dropout, self_attention=True,
        )
    else:
        raise NotImplementedError
    return layer
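For example, here is a small, self-contained trace of just the kernel-size and padding arithmetic above, run on the two entries from that config (a sketch; 31 is a made-up stand-in for args.encoder_kernel_size_list[index]):

# Reproduces only the kernel_size / padding_l logic from get_layer.
for entry in ['attn:1:248:4', 'dynamic:default:248:4']:
    kernel_size = entry.split(':')[1]
    kernel_size = 31 if kernel_size == 'default' else int(kernel_size)
    padding_l = kernel_size // 2 if kernel_size % 2 == 1 else ((kernel_size - 1) // 2, kernel_size // 2)
    print(entry, '-> kernel_size =', kernel_size, ', padding_l =', padding_l)

Note that the attn branch ignores kernel_size and padding_l entirely; only the conv branches use them.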

You can find more details in the MultiheadAttention module:
class MultiheadAttention(nn.Module):
    """Multi-headed attention.

    See "Attention Is All You Need" for more details.
    """

    def __init__(self, embed_dim, num_heads, kdim=None, vdim=None, dropout=0., bias=True,
                 add_bias_kv=False, add_zero_attn=False, self_attention=False,
                 encoder_decoder_attention=False):
        super().__init__()
        self.embed_dim = embed_dim
        self.kdim = kdim if kdim is not None else embed_dim
        self.vdim = vdim if vdim is not None else embed_dim
        self.qkv_same_dim = self.kdim == embed_dim and self.vdim == embed_dim
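As a quick sanity check, the attn:1:248:4 entry corresponds to an instantiation like the sketch below (assuming only the constructor shown above; the 0.1 dropout is a stand-in for args.attention_dropout):

# Sketch: instantiating the module with the dims from the config above.
attn = MultiheadAttention(
    embed_dim=248,       # third field of 'attn:1:248:4'
    num_heads=4,         # fourth field
    dropout=0.1,         # args.attention_dropout in the real call
    self_attention=True,
)
assert attn.qkv_same_dim  # kdim / vdim default to embed_dim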

@sanwei111 (Author)

> As for 32 and 4, they are the embed_dim and num_heads parameters used when initializing the MultiheadAttention and DynamicConv modules.

Thanks a lot!
One more thing, as shown below:

for layer_type in args.encoder_branch_type:
    embed_dims.append(int(layer_type.split(':')[2]))
    heads.append(int(layer_type.split(':')[3]))
    layers.append(self.get_layer(args, index, embed_dims[-1], heads[-1], layer_type))
self.self_attn = MultiBranch(layers, embed_dims)

The above code appears in the encoder layer class. As you said, I set args.encoder_branch_type to [attn:1:160:4, lightweight:default:160:4], but it leads to some errors. How should I understand this?
