
# Positional Encoding in Self-Attention

Take a look at different positional encoding schemes in self-attention:

## Copy-Task

The toy model solves a copy task: the goal is to copy the sequence that appears before the `<copy>` token into the positions after it.

e.g.:

```
1 7 2 <copy> _ _ _ _ _ _ → 1 7 2 <copy> 1 7 2 _ _ _
9 <copy> _ _ _ _ _ _ _ _ → 9 <copy> 9 _ _ _ _ _ _ _
2 2 4 3 <copy> _ _ _ _ _ → 2 2 4 3 <copy> 2 2 4 3 _
1 2 3 4 5 6 7 <copy> _ _ → 1 2 3 4 5 6 7 <copy> 1 2
```
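A minimal sketch of how such copy-task examples could be generated; the special-token ids, vocabulary, and function name below are illustrative assumptions rather than the repository's actual code:

```python
import random

PAD, COPY = 0, 1             # assumed ids for the padding and <copy> tokens
VOCAB = list(range(2, 12))   # assumed ids for the digit tokens

def make_example(max_len=10):
    """Build one copy-task pair: the input holds a sequence, <copy>, then padding;
    the target repeats the sequence right after <copy> (truncated to max_len)."""
    seq_len = random.randint(1, max_len - 1)          # leave room for <copy>
    seq = [random.choice(VOCAB) for _ in range(seq_len)]
    prefix = seq + [COPY]
    x = (prefix + [PAD] * max_len)[:max_len]          # e.g. "1 7 2 <copy> _ _ ..."
    y = (prefix + seq + [PAD] * max_len)[:max_len]    # e.g. "1 7 2 <copy> 1 7 2 ..."
    return x, y

x, y = make_example()
print(x, "→", y)
```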

## Results

The models are trained for 2000 epochs with single-headed attention, 2 layers, and an embedding size of 20. Each positional scheme is evaluated 5 times, and we plot the accuracy on the test set.

*(Figure: test-set accuracy for each positional encoding scheme.)*
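For reference, a minimal sketch of a model with that shape, assuming a standard PyTorch `nn.TransformerEncoder` and a learned positional encoding as one of the schemes; the class and constant names are assumptions, not the repository's code:

```python
import torch
import torch.nn as nn

EMBED_SIZE, N_LAYERS, N_HEADS = 20, 2, 1   # settings described above
SEQ_LEN, VOCAB_SIZE = 10, 12               # assumed task dimensions

class CopyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_SIZE)
        # learned positional encoding: one trainable vector per position
        self.pos = nn.Parameter(torch.randn(1, SEQ_LEN, EMBED_SIZE))
        layer = nn.TransformerEncoderLayer(d_model=EMBED_SIZE, nhead=N_HEADS,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.out = nn.Linear(EMBED_SIZE, VOCAB_SIZE)

    def forward(self, x):                    # x: (batch, seq_len) of token ids
        h = self.embed(x) + self.pos[:, :x.size(1)]
        return self.out(self.encoder(h))     # (batch, seq_len, vocab) logits
```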

## Compare attention activations

Running on: `7 1 8 2 <copy> _ _ _ _ _ → 7 1 8 2 <copy> 7 1 8 2 _`

*(Figures: attention_activation_0, attention_activation_1.)*
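One way to obtain attention maps like these is to call an attention module directly with `need_weights=True`; a sketch assuming PyTorch's `nn.MultiheadAttention`, with random tensors standing in for the real activations:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=20, num_heads=1, batch_first=True)
h = torch.randn(1, 10, 20)   # stand-in for embedded + positionally encoded tokens
_, weights = attn(h, h, h, need_weights=True)
print(weights.shape)         # (batch, tgt_len, src_len); each row sums to 1
```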

## Learned positional encoding over time

Positional encodings (PCA to 2D) over time:

*(Figure: learned positional encodings projected to 2D over the course of training.)*
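A sketch of how such a projection can be produced with scikit-learn, assuming the learned positional encodings are saved as an array of shape `(seq_len, embed_size)` (the random data below is only a stand-in):

```python
import numpy as np
from sklearn.decomposition import PCA

pos_enc = np.random.randn(10, 20)   # stand-in for one snapshot of learned encodings

coords = PCA(n_components=2).fit_transform(pos_enc)   # (seq_len, 2)
for i, (px, py) in enumerate(coords):
    print(f"position {i}: ({px:+.2f}, {py:+.2f})")
```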

## Dot product / cosine similarity

*(Figures: attention_random, attention_learned, attention_sinusoidal.)*
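These similarity matrices can be reproduced by comparing every pair of positional encoding vectors; a sketch using a standard sinusoidal encoding as the example scheme (the random and learned encodings plug into the same comparison):

```python
import math
import torch
import torch.nn.functional as F

def sinusoidal(seq_len=10, d=20):
    """Standard sinusoidal positional encoding, shape (seq_len, d)."""
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d, 2).float() * (-math.log(10000.0) / d))
    pe = torch.zeros(seq_len, d)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

pe = sinusoidal()
dot = pe @ pe.T                                                      # raw dot products
cos = F.cosine_similarity(pe.unsqueeze(1), pe.unsqueeze(0), dim=-1)  # (seq_len, seq_len)
print(cos[0])   # similarity of position 0 to every other position
```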