Why are vocab.txt and tokenizer.json not in the pretrained model on Hugging Face? #117

Open
XuJianzhi opened this issue Nov 28, 2022 · 1 comment

Comments

XuJianzhi commented Nov 28, 2022

https://huggingface.co/microsoft/deberta-v2-xlarge/tree/main

If I run:
tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v2-xlarge')

I get this error:
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a tokenizers library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

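The last line of the error points at the likely cause: the repo ships only the SentencePiece model (spm.model), and transformers needs the sentencepiece package to convert that slow tokenizer into a fast one. A minimal sketch of the usual fix, assuming a transformers 4.x release that includes DebertaV2TokenizerFast:

```python
# pip install transformers sentencepiece
from transformers import AutoTokenizer

# With sentencepiece installed, transformers can convert the repo's
# slow SentencePiece tokenizer (spm.model) into a fast tokenizer
# in memory, so no tokenizer.json is needed in the repo itself.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
print(tokenizer.tokenize("Hello world"))
```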
XuJianzhi (Author) commented:

How do I convert spm.model to tokenizer.json?
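One way to produce a tokenizer.json from spm.model, sketched here under the assumption that sentencepiece is installed so the fast conversion works: load the fast tokenizer and save it, which serializes the converted tokenizer to disk.

```python
from transformers import AutoTokenizer

# Load the fast tokenizer; spm.model is converted in memory.
tok = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge", use_fast=True)

# save_pretrained() on a fast tokenizer writes tokenizer.json
# (along with the tokenizer config files) into the directory.
tok.save_pretrained("./deberta-v2-xlarge-tokenizer")
```

Alternatively, transformers exposes a lower-level helper, transformers.convert_slow_tokenizer.convert_slow_tokenizer, which takes a slow tokenizer instance and returns a tokenizers.Tokenizer that can be saved directly with .save("tokenizer.json").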
