
add my own data to the pretrained model #61

Answered by WissamAntoun
nada2017 asked this question in Q&A

For continued pre-training I suggest you use this script: https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_mlm.py. It runs masked language modeling without the next-sentence-prediction task. You just have to provide it with a text file containing one sentence per line, I think. It works directly with all AraBERT models (note that for v1 and v2 you have to pre-segment the text data first).

The task is a bit hard to set up, but good luck.
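As a rough illustration, a minimal invocation might look like the sketch below. The flag names come from the run_mlm.py example script and may differ slightly across transformers versions; the model name, file paths, and hyperparameters are placeholders you would replace with your own.

```bash
# Continue MLM pre-training of an AraBERT checkpoint on your own corpus.
# train.txt: plain text, one sentence per line (pre-segmented first for v1/v2 models).
python run_mlm.py \
    --model_name_or_path aubmindlab/bert-base-arabertv2 \
    --train_file train.txt \
    --line_by_line \
    --do_train \
    --max_seq_length 128 \
    --per_device_train_batch_size 16 \
    --num_train_epochs 3 \
    --output_dir ./arabert-continued-pretraining
```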

Replies: 1 comment 2 replies

Answer selected by nada2017
Category: Q&A
3 participants