infinitylogesh/FastTokenizersWrapper
Fast Tokenizers Wrapper

A wrapper for Huggingface's Tokenizers library, allowing it to be used with an existing version of Huggingface's Transformers. Tokenizers from the Tokenizers library are much faster than Transformers' native tokenizers. This wrapper, FastTokenizers.py, can be used alongside an existing installation of the Transformers library.

BertTokenizerFast and DistilBertTokenizerFast are wrappers for the BERT and DistilBERT tokenizers, built on the Tokenizers library.
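For intuition about what these tokenizers compute, the sketch below shows WordPiece-style greedy longest-match-first tokenization, the sub-word algorithm BERT-family tokenizers use. The tiny vocabulary is illustrative only, not the real bert-base-uncased vocabulary, and this pure-Python version makes no claim about the wrapper's internals.

```python
# Minimal sketch of WordPiece-style greedy longest-match-first
# tokenization (the sub-word scheme used by BERT tokenizers).
# The vocab below is a toy example, not a real model vocabulary.

def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split a single lowercased word into WordPiece sub-tokens."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Greedily take the longest vocab entry starting at `start`;
        # continuation pieces are prefixed with "##".
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no sub-token matched: whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

vocab = {"token", "##izer", "##s", "fast"}
print(wordpiece_tokenize("tokenizers", vocab))  # ['token', '##izer', '##s']
print(wordpiece_tokenize("xyz", vocab))         # ['[UNK]']
```

The Tokenizers library implements this matching in Rust, which is where most of its speedup over the pure-Python tokenizers in Transformers comes from.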

Usage

Usage is very similar to the BertTokenizer and DistilBertTokenizer classes in the Transformers library.

from FastTokenizers import DistilBertTokenizerFast, BertTokenizerFast

# Tokenizer can be initialized without a vocab file as in Transformers library.
fastDistilTokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased',
                                                               do_lower_case=True,
                                                               cache_dir=None)

fastBertTokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased',
                                                      do_lower_case=True,
                                                      cache_dir=None)

Language Model Finetuning with Fast Tokenizers Wrapper

LM finetuning is much faster with the Tokenizers library; the run_lm_finetuning.py script has been updated to use FastTokenizers. Invocation and usage are the same as for the original script in Huggingface's Transformers.
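For context on what that script trains, the sketch below shows the standard BERT masked-LM corruption step such finetuning scripts apply: roughly 15% of positions are selected, and of those 80% become [MASK], 10% become a random token, and 10% keep the original token (these ratios follow the BERT paper). The function and vocabulary here are illustrative, not code from the script itself.

```python
# Minimal sketch of BERT-style masked-LM input corruption.
# Ratios (15% selected; 80/10/10 split) follow the BERT paper.
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mlm_prob=0.15, rng=None):
    rng = rng or random.Random()
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mlm_prob:
            labels[i] = tok                 # model must predict the original
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_token      # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: random vocab token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

toks = ["the", "cat", "sat", "on", "the", "mat"]
inputs, labels = mask_tokens(toks, vocab=["dog", "runs"], rng=random.Random(0))
# `labels` is the original token at corrupted positions, None elsewhere.
```

Faster tokenization helps here because the script must tokenize the entire training corpus before any of this masking and training begins.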

Credits

The scripts were adapted from Huggingface's Transformers library. Inspired by Huggingface's then-unreleased BertTokenizerFast.
