[Llama-2-7b-chat] RuntimeError: expected scalar type Float but found Half #27

Open
tczbzb opened this issue Oct 20, 2023 · 7 comments

tczbzb commented Oct 20, 2023

Loading the 32-bit Llama-2-7b-chat-hf model directly:
model = AutoModelForCausalLM.from_pretrained(
    model_path
)
produces the following error:

Executing ROME algorithm for the update: [A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?] -> [IV fluids and furosemide]
Computing left vector (u)...
Selected u projection object lung
Left vector shape: torch.Size([11008])
Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss 3.252 = 3.252 + 0.0 avg prob of [IV fluids and furosemide] 0.0395
loss 2.999 = 2.996 + 0.003 avg prob of [IV fluids and furosemide] 0.0508
loss 2.518 = 2.51 + 0.009 avg prob of [IV fluids and furosemide] 0.0823
loss 2.148 = 2.056 + 0.092 avg prob of [IV fluids and furosemide] 0.1295
loss 1.609 = 1.539 + 0.07 avg prob of [IV fluids and furosemide] 0.2176
loss 1.005 = 0.935 + 0.07 avg prob of [IV fluids and furosemide] 0.395
loss 0.443 = 0.349 + 0.094 avg prob of [IV fluids and furosemide] 0.7071
loss 0.168 = 0.09 + 0.079 avg prob of [IV fluids and furosemide] 0.9143
loss 0.059 = 0.025 + 0.034 avg prob of [IV fluids and furosemide] 0.9755
loss 0.055 = 0.019 + 0.036 avg prob of [IV fluids and furosemide] 0.9812
loss 0.042 = 0.008 + 0.035 avg prob of [IV fluids and furosemide] 0.9923
loss 0.037 = 0.005 + 0.032 avg prob of [IV fluids and furosemide] 0.9954
loss 0.035 = 0.004 + 0.031 avg prob of [IV fluids and furosemide] 0.9957
loss 0.032 = 0.004 + 0.028 avg prob of [IV fluids and furosemide] 0.9963
loss 0.029 = 0.003 + 0.026 avg prob of [IV fluids and furosemide] 0.9969
loss 0.026 = 0.003 + 0.023 avg prob of [IV fluids and furosemide] 0.9973
loss 0.023 = 0.002 + 0.02 avg prob of [IV fluids and furosemide] 0.9976
loss 0.02 = 0.002 + 0.018 avg prob of [IV fluids and furosemide] 0.9979
loss 0.019 = 0.002 + 0.017 avg prob of [IV fluids and furosemide] 0.998
loss 0.017 = 0.002 + 0.015 avg prob of [IV fluids and furosemide] 0.9982
Delta norm: 17.499
Change in target norm: 4.375 to 18.048 => 13.673
Division Factor: 3.688
Right vector norm: 4.746
Right vector shape: torch.Size([4096])

Traceback (most recent call last):
  File "/data/a/zhangbo/CAP_medical_LLM/evaluate_model_with_multiple_datasets.py", line 300, in <module>
    edit_model(global_model, global_tokenizer, list_of_dicts, 'llama-7b')
  File "/data/a/zhangbo/CAP_medical_LLM/edit_util.py", line 50, in edit_model
    model_new, _ = apply_rome_to_model(
  File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 56, in apply_rome_to_model
    deltas = execute_rome(model, tokenizer, request, hparams, batch_first)
  File "/data/a/zhangbo/CAP_medical_LLM/FastEdit/fastedit/rome/rome_main.py", line 134, in execute_rome
    upd_matrix = left_vector.unsqueeze(1) @ right_vector.unsqueeze(0)
RuntimeError: expected scalar type Float but found Half

======

If I load the model in 16-bit:
model = AutoModelForCausalLM.from_pretrained(
model_path,
torch_dtype=torch.float16,
).bfloat16()

a similar error occurs:
RuntimeError: expected scalar type BFloat16 but found Half
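
A note on the 16-bit snippet above: chaining `.bfloat16()` after `from_pretrained(..., torch_dtype=torch.float16)` leaves the model in bfloat16, not float16, which would be consistent with the error naming BFloat16 on one side and Half on the other. A minimal sketch, with a toy `nn.Linear` standing in for the model (the layer and the variable names are hypothetical, not taken from FastEdit):

```python
import torch
from torch import nn

# Toy stand-in for a model loaded with torch_dtype=torch.float16 (assumption).
layer = nn.Linear(8, 8).to(torch.float16)
dtype_after_load = layer.weight.dtype

# Chaining .bfloat16() converts every floating-point parameter once more.
layer = layer.bfloat16()
dtype_after_cast = layer.weight.dtype

print(dtype_after_load, dtype_after_cast)
```

Any tensor created elsewhere as float16 would then no longer match the model's parameters.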

tczbzb changed the title from "[llama-7b] RuntimeError: expected scalar type Float but found Half" to "[Llama-2-7b-chat] RuntimeError: expected scalar type Float but found Half" on Oct 20, 2023
Owner

hiyouga commented Oct 20, 2023

model = AutoModelForCausalLM.from_pretrained(
model_path,
torch_dtype=torch.float16,
)
Don't use bf16.

Author

tczbzb commented Oct 20, 2023

Without bf16, Llama 2 raises this error: meta-llama/llama#380

But even if I load the 32-bit model directly, the error described above still occurs:

model = AutoModelForCausalLM.from_pretrained(model_path)

RuntimeError: expected scalar type Float but found Half
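
One way to narrow down where the Half tensor comes from is to count the dtypes actually present in the loaded model. The helper below is a hypothetical diagnostic (`dtype_summary` is not part of FastEdit), shown on a toy module. Whether the Half operand in this report comes from the model weights or from elsewhere in the editing code is not established in this thread.

```python
from collections import Counter

import torch
from torch import nn


def dtype_summary(module: nn.Module) -> Counter:
    """Count the module's parameter tensors by dtype."""
    return Counter(p.dtype for p in module.parameters())


# Toy stand-in for a loaded model: one float32 layer and one float16 layer.
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4).half())
summary = dtype_summary(toy)
print(summary)
```

Calling `dtype_summary(model)` on the real model right after `from_pretrained` would show whether the weights are uniformly float32.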

Owner

hiyouga commented Oct 20, 2023

The LLaMA 2 overflow problem is indeed not solved yet. A later release will fix it; for now it cannot be used directly.

Author

tczbzb commented Oct 20, 2023

Understood. For now, could I edit the failing line in rome_main.py myself and force-cast Half to Float to get past this error? Or would that change cause other problems?
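
For reference, a minimal sketch of the kind of cast being asked about, assuming both vectors are simply brought to one common dtype before the outer product. The helper name `outer_update` and the toy shapes are hypothetical. It only removes the dtype mismatch at that one line and does nothing about the overflow problem.

```python
import torch


def outer_update(left_vector: torch.Tensor,
                 right_vector: torch.Tensor,
                 dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Outer product of two 1-D vectors after casting both to one dtype."""
    left = left_vector.to(dtype)
    right = right_vector.to(dtype)
    # Same expression as the failing line, but with matching dtypes.
    return left.unsqueeze(1) @ right.unsqueeze(0)


# Toy vectors with mismatched dtypes (float32 and float16).
left_vector = torch.randn(6)
right_vector = torch.randn(4).half()
upd_matrix = outer_update(left_vector, right_vector)
print(upd_matrix.dtype, tuple(upd_matrix.shape))
```

The result would still need to match the dtype of the weight it is added to, which this sketch does not handle.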

Owner

hiyouga commented Oct 20, 2023

It's best to wait for our fix.

Author

tczbzb commented Oct 21, 2023

Many thanks! One more data point: if I don't use .bfloat16(), for example:

model = AutoModelForCausalLM.from_pretrained(
model_path,
torch_dtype=torch.float16,
)

then execution completes, but all the probabilities are nan, and inference then fails.

Computing right vector (v)
Lookup index found: -37 | Sentence: A patient diagnosed with carcinoma of lung presented with a serum calcium level of 16.4 mmol/L. What will be the first step in management?IV fluids and furosemide | Token: lung
Rewrite layer is 5
Tying optimization objective to 31
Recording initial value of v*
loss nan = nan + 0.0 avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
loss nan = nan + nan avg prob of [IV fluids and furosemide] nan
Delta norm: nan
Change in target norm: 4.391 to nan => nan
Division Factor: 3.689
Right vector norm: nan
Right vector shape: torch.Size([4096])
Deltas successfully computed for ['model.layers.5.mlp.down_proj.weight']
Time elapsed: 12.56 seconds
New weights successfully inserted into ['model.layers.5.mlp.down_proj.weight']

RuntimeError: probability tensor contains either inf, nan or element < 0
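
As a general illustration of how float16 overflow can end in nan probabilities (not a diagnosis of where it happens inside FastEdit or Llama 2): a value above the float16 maximum becomes inf, and a softmax over logits containing inf yields nan. The helper `is_safe_for_sampling` is hypothetical.

```python
import torch

# Largest finite float16 value.
fp16_max = torch.finfo(torch.float16).max

# 70000 exceeds that range, so the first entry becomes inf after the cast.
overflowed = torch.tensor([70000.0, 1.0]).to(torch.float16)

# Softmax subtracts the max (inf - inf), which produces nan.
probs = torch.softmax(overflowed.float(), dim=-1)


def is_safe_for_sampling(p: torch.Tensor) -> bool:
    """True if every probability is finite and non-negative."""
    return bool(torch.isfinite(p).all() and (p >= 0).all())


print(overflowed, probs, is_safe_for_sampling(probs))
```

A check like this before sampling would surface the problem earlier than the sampling call does.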

Owner

hiyouga commented Oct 22, 2023

I forgot to mention: without switching to another data type, simply setting tokenizer.pad_token = tokenizer.unk_token also avoids the problem above.
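
A sketch of where that assignment could go, assuming it is applied right after the tokenizer is loaded and before the edit is run. The helper `set_pad_to_unk` is hypothetical, and a `SimpleNamespace` stands in for the real tokenizer so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def set_pad_to_unk(tokenizer):
    """Reuse the unknown token as the padding token, as suggested above."""
    tokenizer.pad_token = tokenizer.unk_token
    return tokenizer


# Stand-in object (assumption); with transformers this would instead be
#   tokenizer = AutoTokenizer.from_pretrained(model_path)
fake_tokenizer = SimpleNamespace(unk_token="<unk>", pad_token=None)
fake_tokenizer = set_pad_to_unk(fake_tokenizer)
print(fake_tokenizer.pad_token)
```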


2 participants