# ChineseBert

This is a Chinese BERT model built specifically for question answering. We provide two models: a large model, a 16-layer Transformer with hidden size 1024, and a small model with 8 layers and hidden size 512. Our implementation differs from the original paper (https://arxiv.org/abs/1810.04805) in that we replace the position embeddings with an LSTM, which shows advantages when the text length varies a lot, as sketched below.
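
The README does not spell out how the LSTM stands in for position embeddings; here is a minimal sketch of the idea, assuming the LSTM runs over the token embeddings before the Transformer stack (all names below are hypothetical and not from this repo):

```python
import torch
import torch.nn as nn

class LSTMPositionEncoder(nn.Module):
    """Inject word-order information with an LSTM instead of a learned
    position-embedding table (hypothetical sketch, not this repo's code)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Bidirectional LSTM; each direction outputs hidden_size // 2,
        # so the concatenated output matches hidden_size and can feed
        # the Transformer layers directly.
        self.lstm = nn.LSTM(hidden_size, hidden_size // 2,
                            batch_first=True, bidirectional=True)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_size)
        out, _ = self.lstm(token_embeddings)
        return out  # order-aware representations for any seq_len

# Example with the small model's hidden size of 512.
enc = LSTMPositionEncoder(hidden_size=512)
x = torch.randn(2, 1536, 512)
print(enc(x).shape)  # torch.Size([2, 1536, 512])
```

Because the LSTM encodes order recurrently rather than through a fixed-size position table, the same module handles short and long inputs without retraining.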

Currently it runs on Python 3 and PyTorch.


## Stats

Data: 200M Chinese internet question-answer pairs.

Tokenizer: we use the SentencePiece tokenizer with a vocabulary size of 35,000 (see the sketch after these stats).

We train both the large and small models for 2M steps; neither suffered from overfitting.

The large model takes 12 days per epoch on 8 NVLink V100 GPUs; the small model takes 2 days per epoch on the same hardware.
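
For reference, a hedged sketch of building a comparable tokenizer with the sentencepiece Python package (the corpus file and model prefix are assumptions; only the 35,000 vocabulary size comes from this README):

```python
import sentencepiece as spm

# Train a SentencePiece model on raw QA text (hypothetical corpus file);
# only vocab_size=35000 is taken from the stats above.
spm.SentencePieceTrainer.train(
    input="qa_corpus.txt",        # one sentence per line
    model_prefix="chinesebert",   # writes chinesebert.model / chinesebert.vocab
    vocab_size=35000,
)

# Load the trained model and round-trip a sample sentence.
sp = spm.SentencePieceProcessor(model_file="chinesebert.model")
ids = sp.encode("今天天气怎么样？", out_type=int)
print(ids, sp.decode(ids))
```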


## Usage

Feed the model a Chinese question-answer pair to obtain their combined representation.

Refer to main.py for more detail.

The model has been tested with sequence lengths below 1024.
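
As a rough illustration of that flow, here is a sketch under the assumption that the repo exposes a loadable model class; every name below is hypothetical, and main.py remains the authoritative reference:

```python
import torch
import sentencepiece as spm
from model import ChineseBert  # hypothetical module/class; check main.py

sp = spm.SentencePieceProcessor(model_file="chinesebert.model")
model = ChineseBert.load("chinesebert_small.pt")  # hypothetical loader
model.eval()

question = "中国的首都是哪里？"
answer = "中国的首都是北京。"

# Concatenate the pair into one sequence, staying under the tested 1024 limit.
ids = sp.encode(question, out_type=int) + sp.encode(answer, out_type=int)
assert len(ids) < 1024

with torch.no_grad():
    combined = model(torch.tensor([ids]))  # combined QA representation
```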


As the PyTorch model file is very large, download it from Google Drive via get_model.sh.
