'BERT' (Devlin et al., 2019) implementation from scratch in PyTorch

KimRass/BERT


BERT from scratch

  • Please note that I was not able to complete pre-training or fine-tuning of the model because of my limited compute environment, but I verified that both pre-training and fine-tuning run correctly.

Paper Reading

Research

  • Using Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=0.01) as described in the BERT paper, I found that pre-training made no progress: no matter how long the model trained, the NSP loss never dropped below 0.69. After removing the weight_decay=0.01 term, training proceeded normally.
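  • The 0.69 plateau is the chance-level floor for NSP: next-sentence prediction is binary classification, and a model that learns nothing assigns probability 0.5 to each class, giving a cross-entropy loss of ln 2 ≈ 0.6931. A minimal sketch of this calculation:

    ```python
    import math

    # NSP is binary classification (IsNext vs NotNext). An uninformative
    # model predicts p = 0.5 for either class, so its cross-entropy loss
    # is -ln(0.5) = ln(2), regardless of the true label.
    chance_level_nsp_loss = -math.log(0.5)
    print(f"{chance_level_nsp_loss:.4f}")  # ≈ 0.6931
    ```

    One possible explanation (an assumption, not verified here): PyTorch's `torch.optim.Adam` applies `weight_decay` as a coupled L2 penalty added to the gradients, whereas the original BERT implementation used decoupled weight decay, which in PyTorch corresponds to `torch.optim.AdamW`. Trying `AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=0.01)` may be worth comparing against dropping the decay entirely.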
