Uses the “aubmindlab/bert-base-arabertv2” model from AUB MIND Lab's AraBERT to build a simple Arabic text tokenizer.

OoFa99/ArabBert_Tokenizer

ArabBert_Tokenizer

  • ArabBERT_Tokenizer: Open In Colab

Goal:-

  • Write and test a sample tokenizer, starting from the sample code provided on GitHub and Google Colab.

Steps:-

  1. Install the arabert and transformers packages.
  2. Import the tokenizer and model loaders with from transformers import AutoTokenizer, AutoModel.
  3. Import the text preprocessing tool with from arabert.preprocess import ArabertPreprocessor.
  4. Select the model by setting model_name = "aubmindlab/bert-base-arabertv2".
  5. Test the tokenizer and the preprocessor on different forms of Arabic text:
    • Standard Arabic (العربية الفصحى), without diacritics.
    • Standard Arabic with diacritics (الْعَرَبِيَّةِ الْفُصْحَى), diacritized using Shakkala.
    • Egyptian Arabic text.
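
The steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the names MODEL_NAME, SAMPLES, strip_tashkeel, and tokenize_samples are hypothetical, the Egyptian sentence is a placeholder rather than the author's test data, and strip_tashkeel is a standard-library illustration of diacritic removal, not ArabertPreprocessor's own implementation. The arabert and transformers imports are deferred into the function so the file still loads when the packages are not installed.

```python
# Step 1 (run once in a shell or Colab cell): pip install arabert transformers
import re

MODEL_NAME = "aubmindlab/bert-base-arabertv2"  # model name from step 4

# Sample inputs. The first two are the Arabic phrases from the list above;
# the Egyptian sentence is a hypothetical placeholder.
SAMPLES = {
    "msa": "العربية الفصحى",
    "msa_diacritized": "الْعَرَبِيَّةِ الْفُصْحَى",
    "egyptian": "إزيك عامل إيه النهارده",
}

# Arabic diacritic marks (tashkeel), U+064B through U+0652.
_TASHKEEL = re.compile("[\u064B-\u0652]")


def strip_tashkeel(text: str) -> str:
    """Remove Arabic diacritics using only the standard library.

    Illustrative helper for comparing the two Standard Arabic samples;
    it is not the preprocessing that ArabertPreprocessor performs.
    """
    return _TASHKEEL.sub("", text)


def tokenize_samples(samples: dict) -> dict:
    """Preprocess and tokenize each sample (steps 2 to 5)."""
    # Imported lazily: both packages must be installed for this call,
    # and the tokenizer files are downloaded on first use.
    from transformers import AutoTokenizer
    from arabert.preprocess import ArabertPreprocessor

    preprocessor = ArabertPreprocessor(model_name=MODEL_NAME)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

    results = {}
    for name, text in samples.items():
        cleaned = preprocessor.preprocess(text)  # normalize the raw text
        results[name] = tokenizer.tokenize(cleaned)  # list of token strings
    return results
```

In a Colab cell, calling tokenize_samples(SAMPLES) and printing each entry lets you compare how the three text forms are split into tokens. Running strip_tashkeel on the diacritized sample first shows what that phrase looks like once its diacritics are removed.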
