
fanchenyou/transformer-study

Tutorials for several Transformer network variants

1. transformer_encoder, paper, src, tutorial

* Use the PyTorch nn.Transformer package to build an encoder for language modeling (a minimal sketch follows below)
* PyTorch 1.2 + TorchText
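
A minimal sketch (not the repository's exact code) of how `nn.TransformerEncoder` can be wired up as a language model; the class name, dimensions, and the omission of positional encoding are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class TransformerLM(nn.Module):
    """Minimal language-model encoder built on nn.TransformerEncoder."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2, dim_ff=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_ff)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.decoder = nn.Linear(d_model, vocab_size)  # next-token logits
        self.d_model = d_model

    def forward(self, src):
        # src: (seq_len, batch) token ids; positional encoding omitted for brevity.
        # A causal (upper-triangular -inf) mask keeps each position from peeking ahead.
        seq_len = src.size(0)
        mask = torch.triu(torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
        h = self.embed(src) * math.sqrt(self.d_model)
        h = self.encoder(h, mask=mask)
        return self.decoder(h)  # (seq_len, batch, vocab_size)

logits = TransformerLM(vocab_size=1000)(torch.randint(0, 1000, (35, 8)))
```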

2 & 2.1. transformer_xl_from_scratch, src

* 2: a simple toy example showing the core idea of Transformer-XL, which uses additional memory to encode history
* 2.1: build Transformer-XL with multi-head attention
* Show how to use previous hidden states to implement the "recurrence mechanism" (sketched after this list), where each layer attends to:
  - the output of the previous hidden layer of the current segment
  - the output of the previous hidden layer from the previous segment
* Show how to use relative positional encoding to incorporate position information
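
A toy sketch of the recurrence mechanism only: the previous segment's hidden states are cached, gradients are stopped through them, and they are concatenated with the current segment to form the attention keys/values. The class and variable names are assumptions, and relative positional encoding is omitted here.

```python
import torch
import torch.nn as nn

class RecurrentSegmentLayer(nn.Module):
    """Toy self-attention layer with Transformer-XL style segment recurrence."""
    def __init__(self, d_model=128, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                nn.Linear(d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h, mem=None):
        # h:   (cur_len, batch, d_model) previous hidden layer's output, current segment
        # mem: (mem_len, batch, d_model) previous hidden layer's output, previous segment
        ctx = h if mem is None else torch.cat([mem.detach(), h], dim=0)  # no grad into memory
        out = self.norm1(h + self.attn(h, ctx, ctx)[0])  # queries come from the current segment only
        out = self.norm2(out + self.ff(out))
        new_mem = h.detach()  # cache this layer's input as memory for the next segment
        return out, new_mem

layer = RecurrentSegmentLayer()
out1, mem = layer(torch.randn(16, 2, 128))              # first segment, no memory
out2, _ = layer(torch.randn(16, 2, 128), mem)           # second segment attends to cached memory
```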

3. transformer_xl full release, src, tutorial

(Figure: Transformer-XL network architecture)

* Complete implementation of Transformer-XL

4. xlnet, paper, src, tutorial

* An excellent tutorial implementation of XLNet from the link above
* Additional comments added for easier understanding
* Requirements: Python 3 + PyTorch v1.2
* TODO: Add GPU support

5. Bert from scratch, paper, src, tutorial

* Build BERT (Bidirectional Encoder Representations from Transformers)
* The pre-training task is two-fold (see paper section 3.1):
    1) predict whether the second sentence actually follows the first (Next Sentence Prediction)
    2) predict the masked words of a sentence (Masked LM)
* Step 1: generate the vocabulary file "vocab.small" in ./data
* Step 2: train the network
* See transformer_bert_from_scratch_5.py for more details; a sketch of the two objectives follows below.
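
A hedged sketch of how the two objectives shape the training examples. The special-token ids, helper names, and the -100 ignore-label convention are illustrative assumptions, not the script's actual interface.

```python
import random

MASK_ID, CLS_ID, SEP_ID = 4, 1, 2   # illustrative special-token ids

def make_mlm_example(tokens, vocab_size, mask_prob=0.15):
    """Masked LM: corrupt ~15% of tokens; labels hold the original ids."""
    inputs, labels = list(tokens), [-100] * len(tokens)   # -100 = ignored by the loss
    for i, t in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = t
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                        # 80% -> [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)   # 10% -> random token
            # remaining 10% -> keep the original token
    return inputs, labels

def make_nsp_example(sent_a, sent_b, random_sent):
    """Next Sentence Prediction: 50% true next sentence, 50% random sentence."""
    is_next = random.random() < 0.5
    second = sent_b if is_next else random_sent
    ids = [CLS_ID] + sent_a + [SEP_ID] + second + [SEP_ID]
    segment_ids = [0] * (len(sent_a) + 2) + [1] * (len(second) + 1)
    return ids, segment_ids, int(is_next)
```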

6. Bert from Pytorch Official Implementation, paper, src

* Build BERT (Bidirectional Encoder Representations from Transformers)
* Use the official PyTorch API to reuse the existing implementation and pre-trained models (usage sketched below)
* `pip install transformers tb-nightly`
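
A brief sketch of loading a pre-trained BERT through the `transformers` package; the model name `bert-base-uncased` is an example, and the tuple-style output indexing is chosen to work across older and newer library versions.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

# Encode a sentence and run it through the pre-trained encoder.
input_ids = torch.tensor([tokenizer.encode("Hello, transformers!", add_special_tokens=True)])
with torch.no_grad():
    outputs = model(input_ids)
last_hidden_state = outputs[0]   # (batch, seq_len, hidden_size)
print(last_hidden_state.shape)
```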

7. ALBERT, A Lite BERT, paper, src, tutorial

* A Lite BERT that reduces BERT's parameter count to roughly 20%
* Decouple the word embedding size from the hidden size by factorizing the embedding into two projection matrices (sketched after this list)
   - parameters drop from O(V*H) to O(V*E + E*H), with E << H
* Cross-layer parameter sharing
   - ALBERT's default is to share all parameters across layers (see paper section 3.1)
* Sentence Order Prediction (SOP)
   - NSP (Next Sentence Prediction) in BERT is not very effective: its negative pairs come from different documents, so topic cues alone largely solve the task
   - SOP targets inter-sentence coherence instead:
     the positive case is two consecutive sentences in their proper order;
     the negative case is the same two sentences with their order swapped.
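
A minimal sketch of the two parameter-reduction ideas: a factorized embedding (V×E plus E×H matrices instead of one V×H table) and a single encoder layer whose weights are reused at every depth. The class name and dimensions are illustrative, not ALBERT's actual configuration code.

```python
import torch
import torch.nn as nn

class TinyALBERTEncoder(nn.Module):
    """Illustrates factorized embeddings and cross-layer parameter sharing."""
    def __init__(self, vocab_size=30000, embed_size=128, hidden_size=768,
                 nhead=12, num_layers=12):
        super().__init__()
        # Factorized embedding: O(V*E + E*H) parameters instead of O(V*H)
        self.word_embed = nn.Embedding(vocab_size, embed_size)   # V x E
        self.embed_proj = nn.Linear(embed_size, hidden_size)     # E x H
        # Cross-layer sharing: one layer's weights reused at every depth
        self.shared_layer = nn.TransformerEncoderLayer(hidden_size, nhead)
        self.num_layers = num_layers

    def forward(self, token_ids):
        # token_ids: (seq_len, batch)
        h = self.embed_proj(self.word_embed(token_ids))
        for _ in range(self.num_layers):
            h = self.shared_layer(h)   # same parameters applied at every layer
        return h

enc = TinyALBERTEncoder()
out = enc(torch.randint(0, 30000, (16, 2)))   # (16, 2, 768)
```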

Requirements

Python 2.7 and 3.6

PyTorch 1.2+ [here] for both Python versions

GPU training requires 4 GB+ memory; testing requires 1 GB+ memory.
