
[Quantization] [mixtral_8x22B] NotImplementedError: Cannot copy out of meta tensor; no data! #1585

Closed
2 of 4 tasks
Godlovecui opened this issue May 13, 2024 · 3 comments

Labels
not a bug Some known limitation, but not a bug.

Comments

@Godlovecui

System Info

[screenshot of system information]

Who can help?

@Tracin

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python ../quantization/quantize.py --model_dir /network/model/Mixtral-8x22B-v0.1 \
    --dtype bfloat16 \
    --qformat fp8 \
    --output_dir ./tllm_checkpoint_mixtral_8x22B_8gpu_fp8 \
    --kv_cache_dtype fp8 \
    --calib_size 8 \
    --tp_size 8 \
    --batch_size 8

Expected behavior

Quantization completes successfully and produces the FP8 TensorRT-LLM checkpoint.

actual behavior

[screenshot of the error output; the full log is reproduced below]

additional notes

When I quantize Mixtral-8x22B-v0.1 to FP8 on RTX 4090 GPUs, it raises the error below. How can I resolve it? Thank you!

Initializing model from /network/model/Mixtral-8x22B-v0.1
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 59/59 [03:34<00:00, 3.64s/it]
[05/09/2024-03:29:18] Some parameters are on the meta device device because they were offloaded to the cpu.
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.bfloat16.
Initializing tokenizer from /network/model/Mixtral-8x22B-v0.1
Loading calibration dataset
Starting quantization...
Inserted 4875 quantizers
Calibrating batch 0
Quantization done. Total time used: 103.36 s.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The modelopt-optimized model state_dict (including the quantization factors) is saved to tllm_checkpoint_mixtral_8x22B_8gpu_fp8/modelopt_model.0.pth using torch.save for further inspection.
Detailed export error: Cannot copy out of meta tensor; no data!
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 364, in export_tensorrt_llm_checkpoint
    for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 220, in torch_to_tensorrt_llm_checkpoint
    build_decoder_config(layer, model_metadata_config, decoder_type, dtype)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 1180, in build_decoder_config
    config.attention = build_attention_config(layer, model_metadata_config, dtype, config)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 650, in build_attention_config
    config.dense = build_linear_config(layer, LINEAR_ROW, dtype)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 606, in build_linear_config
    config.weight = weight.cpu()
NotImplementedError: Cannot copy out of meta tensor; no data!
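
The failing call is config.weight = weight.cpu() on a parameter that was offloaded (see the "Some parameters are on the meta device" warning above). Meta tensors carry only shape and dtype metadata, with no backing storage, so any attempt to copy their data out fails. A minimal sketch of the underlying PyTorch behavior, independent of the modelopt export path:

import torch

# A meta tensor has shape/dtype metadata but no storage behind it.
w = torch.empty(1024, 1024, device="meta")

try:
    w.cpu()  # materializing the (nonexistent) data is impossible
except NotImplementedError as e:
    print(e)  # Cannot copy out of meta tensor; no data!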

Godlovecui added the bug (Something isn't working) label on May 13, 2024
@Godlovecui (Author)

The version of TensorRT-LLM is: [TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024050700

@nv-guomingz (Collaborator)

I think the most likely reason is that modelopt requires loading the whole model into GPU memory, and 8x RTX 4090 doesn't have enough GPU memory to load Mixtral-8x22B; the overflow is offloaded to CPU, which leaves those parameters on the meta device (see the warning in the log above).
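
As a back-of-envelope check (assuming the published ~141B total parameter count for Mixtral-8x22B): the bfloat16 weights alone need roughly 141e9 × 2 bytes ≈ 282 GB, while 8x RTX 4090 provides only 8 × 24 GB = 192 GB of VRAM, so accelerate offloads the remainder to CPU and those parameters surface as meta tensors at export time. One way to confirm what was offloaded before quantizing (a sketch, assuming the example script's standard transformers loading path with device_map="auto"):

import torch
from transformers import AutoModelForCausalLM

# device_map="auto" lets accelerate spill weights to CPU/disk
# when the GPUs run out of memory.
model = AutoModelForCausalLM.from_pretrained(
    "/network/model/Mixtral-8x22B-v0.1",  # model path from the report above
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# hf_device_map records where accelerate placed each module; anything on
# "cpu" or "disk" will show up as a meta tensor during export.
offloaded = {name: dev for name, dev in model.hf_device_map.items()
             if dev in ("cpu", "disk")}
print(f"{len(offloaded)} modules offloaded to CPU/disk")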

@byshiue (Collaborator) commented May 15, 2024

Duplicate of #1440. Closing this one.

byshiue closed this as completed on May 15, 2024
byshiue added the not a bug (Some known limitation, but not a bug) label and removed the bug (Something isn't working) label on May 15, 2024