[Quantization] [mixtral_8x22B] NotImplementedError: Cannot copy out of meta tensor; no data! #1585
Closed

Labels: not a bug (some known limitation, but not a bug)

Who can help? @Tracin

Reproduction
python ../quantization/quantize.py --model_dir /network/model/Mixtral-8x22B-v0.1 \
    --dtype bfloat16 \
    --qformat fp8 \
    --output_dir ./tllm_checkpoint_mixtral_8x22B_8gpu_fp8 \
    --kv_cache_dtype fp8 \
    --calib_size 8 \
    --tp_size 8 \
    --batch_size 8
Expected behavior
Quantization completes successfully and produces the FP8 checkpoint.

Actual behavior
When I quantize Mixtral-8x22B-v0.1 into FP8 on an RTX 4090, it raises the error below. How can I resolve it? Thank you!
Initializing model from /network/model/Mixtral-8x22B-v0.1
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 59/59 [03:34<00:00, 3.64s/it]
[05/09/2024-03:29:18] Some parameters are on the meta device because they were offloaded to the cpu.
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.bfloat16.
Initializing tokenizer from /network/model/Mixtral-8x22B-v0.1
Loading calibration dataset
Starting quantization...
Inserted 4875 quantizers
Calibrating batch 0
Quantization done. Total time used: 103.36 s.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The modelopt-optimized model state_dict (including the quantization factors) is saved to tllm_checkpoint_mixtral_8x22B_8gpu_fp8/modelopt_model.0.pth using torch.save for further inspection.
Detailed export error: Cannot copy out of meta tensor; no data!
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 364, in export_tensorrt_llm_checkpoint
    for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 220, in torch_to_tensorrt_llm_checkpoint
    build_decoder_config(layer, model_metadata_config, decoder_type, dtype)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 1180, in build_decoder_config
    config.attention = build_attention_config(layer, model_metadata_config, dtype, config)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 650, in build_attention_config
    config.dense = build_linear_config(layer, LINEAR_ROW, dtype)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 606, in build_linear_config
    config.weight = weight.cpu()
NotImplementedError: Cannot copy out of meta tensor; no data!
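For context on the failure: the warning "Some parameters are on the meta device because they were offloaded to the cpu" means the model did not fully fit in memory during loading, so some parameters exist only as shape/dtype metadata on PyTorch's `meta` device with no backing storage. The export step then calls `weight.cpu()` on such a parameter, which cannot work. A minimal sketch reproducing the same exception, plus a hypothetical helper (`meta_params` is not part of modelopt, just an illustration) to spot the problem before export:

```python
import torch

# A tensor on the "meta" device carries shape/dtype metadata but no data.
w = torch.empty(4, 4, device="meta")

try:
    w.cpu()  # materializing a meta tensor is impossible
except NotImplementedError as e:
    print(e)  # Cannot copy out of meta tensor; no data!

# Hypothetical sanity check: list parameters left on the meta device
# (i.e., offloaded during loading) before attempting a checkpoint export.
def meta_params(model: torch.nn.Module) -> list[str]:
    return [name for name, p in model.named_parameters() if p.is_meta]
```

If `meta_params(model)` is non-empty after loading, the export is expected to fail this way; the usual remedy is to load the model on hardware with enough total memory that nothing is offloaded to the meta device.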