Load deberta-v3-large but got deberta-v2 model #132

Open
ChengsongLu opened this issue Apr 19, 2023 · 2 comments

ChengsongLu commented Apr 19, 2023

Hi,

from transformers import AutoModel, AutoTokenizer

model_name = 'microsoft/deberta-v3-large'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

When I load the v3 model, it returns a v2 model instead. How can I use the v3 model and tokenizer correctly?

@rashmibanthia

What you are doing is correct: you are actually getting the v3 weights. There is no DebertaV3Model on Hugging Face yet.
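
One quick way to see this (a minimal sketch, assuming a recent transformers install; the commented values are what the checkpoint's config reports):

from transformers import AutoModel

# The v3 checkpoint is served through the DeBERTa-v2 model classes,
# so the returned object is a DebertaV2Model holding the v3 weights.
model = AutoModel.from_pretrained('microsoft/deberta-v3-large')

print(type(model).__name__)        # DebertaV2Model
print(model.config._name_or_path)  # microsoft/deberta-v3-large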


Nov05 commented Sep 8, 2023

It seems V3 uses the same architecture as V2?

DebertaV2Config {
  "_name_or_path": "microsoft/deberta-v3-base",
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
...
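
A quick way to check that claim (a sketch, assuming the public Hugging Face checkpoints; the commented output is what I'd expect):

from transformers import AutoConfig

cfg_v2 = AutoConfig.from_pretrained('microsoft/deberta-v2-xlarge')
cfg_v3 = AutoConfig.from_pretrained('microsoft/deberta-v3-base')

# Both resolve to the same model_type, i.e. the same architecture code path.
print(cfg_v2.model_type, cfg_v3.model_type)  # deberta-v2 deberta-v2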

I successfully ran the following code.

import sentencepiece  # the DebertaV2Tokenizer requires the sentencepiece package
from transformers import DebertaV2Model, DebertaV2Config, DebertaV2Tokenizer

MODEL_NAME = 'microsoft/deberta-v3-base'
model = DebertaV2Model.from_pretrained(MODEL_NAME)
config = DebertaV2Config.from_pretrained(MODEL_NAME)
tokenizer = DebertaV2Tokenizer.from_pretrained(MODEL_NAME)

Output:

Downloading spm.model: 100%
2.46M/2.46M [00:00<00:00, 22.7MB/s]
Downloading (…)okenizer_config.json: 100%
52.0/52.0 [00:00<00:00, 2.33kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
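
For completeness, a quick smoke test that the loaded model and tokenizer work together (a minimal sketch; the sentence is arbitrary, and 768 is the hidden size of deberta-v3-base):

import torch

# Tokenize a sample sentence and run one forward pass.
inputs = tokenizer("DeBERTa-v3 loads through the v2 classes.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])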
