[Quantization] [mixtral_8x22B] NotImplementedError: Cannot copy out of meta tensor; no data! #1585
Closed

Labels: not a bug (some known limitation, but not a bug)

Who can help? @Tracin

Reproduction
python ../quantization/quantize.py --model_dir /network/model/Mixtral-8x22B-v0.1 \
    --dtype bfloat16 \
    --qformat fp8 \
    --output_dir ./tllm_checkpoint_mixtral_8x22B_8gpu_fp8 \
    --kv_cache_dtype fp8 \
    --calib_size 8 \
    --tp_size 8 \
    --batch_size 8
Expected behavior
Quantization completes successfully and produces the FP8 checkpoint.

Actual behavior
When I quantize Mixtral-8x22B-v0.1 into FP8 on an RTX 4090, it raises the error below. How can I resolve it? Thank you!
Initializing model from /network/model/Mixtral-8x22B-v0.1
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 59/59 [03:34<00:00, 3.64s/it]
[05/09/2024-03:29:18] Some parameters are on the meta device because they were offloaded to the cpu.
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.bfloat16.
Initializing tokenizer from /network/model/Mixtral-8x22B-v0.1
Loading calibration dataset
Starting quantization...
Inserted 4875 quantizers
Calibrating batch 0
Quantization done. Total time used: 103.36 s.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The modelopt-optimized model state_dict (including the quantization factors) is saved to tllm_checkpoint_mixtral_8x22B_8gpu_fp8/modelopt_model.0.pth using torch.save for further inspection.
Detailed export error: Cannot copy out of meta tensor; no data!
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 364, in export_tensorrt_llm_checkpoint
    for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 220, in torch_to_tensorrt_llm_checkpoint
    build_decoder_config(layer, model_metadata_config, decoder_type, dtype)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 1180, in build_decoder_config
    config.attention = build_attention_config(layer, model_metadata_config, dtype, config)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 650, in build_attention_config
    config.dense = build_linear_config(layer, LINEAR_ROW, dtype)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/layer_utils.py", line 606, in build_linear_config
    config.weight = weight.cpu()
NotImplementedError: Cannot copy out of meta tensor; no data!
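For context on the failure: the warning "Some parameters are on the meta device because they were offloaded to the cpu" means the model did not fully fit in memory during loading, so some parameters exist only as shape/dtype metadata on PyTorch's `meta` device with no backing storage. The export step then calls `weight.cpu()` on such a parameter, which cannot work. A minimal sketch reproducing the same exception, plus a hypothetical helper (`meta_params` is not part of modelopt, just an illustration) to spot the problem before export:

```python
import torch

# A tensor on the "meta" device carries shape/dtype metadata but no data.
w = torch.empty(4, 4, device="meta")

try:
    w.cpu()  # materializing a meta tensor is impossible
except NotImplementedError as e:
    print(e)  # Cannot copy out of meta tensor; no data!

# Hypothetical sanity check: list parameters left on the meta device
# (i.e., offloaded during loading) before attempting a checkpoint export.
def meta_params(model: torch.nn.Module) -> list[str]:
    return [name for name, p in model.named_parameters() if p.is_meta]
```

If `meta_params(model)` is non-empty after loading, the export is expected to fail this way; the usual remedy is to load the model on hardware with enough total memory that nothing is offloaded to the meta device.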