Failed to quantize Gpt (Starcoder2 variant) with FP8 #1578

Closed
Labels: bug (Something isn't working)

wxsms opened this issue May 11, 2024 · 1 comment
wxsms commented May 11, 2024

System Info

  • CPU architecture: x86_64
  • CPU/Host memory: 32 cores, 200 GB
  • GPU properties
    • GPU name: 4090
    • GPU memory size: 24 GB
  • Libraries
    • TensorRT-LLM: 0.10.0.dev2024043000
    • Container used: built from tensorrtllm_backend
  • OS: Ubuntu

Who can help?

@Tracin

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python3 ../quantization/quantize.py --model_dir starcoder2 \
    --dtype float16 \
    --qformat fp8 \
    --kv_cache_dtype fp8 \
    --output_dir xxx

Expected behavior

It should output the quantized checkpoints without error.

Actual behavior

Calibrating batch 510
Calibrating batch 511
Quantization done. Total time used: 98.99 s.
Unknown model type Starcoder2ForCausalLM. Continue exporting...
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
current rank: 0, tp rank: 0, pp rank: 0
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The AMMO optimized model state_dict (including the quantization factors) is saved to /tmp/checkpoint/ammo_model.0.pth using torch.save for further inspection.
Detailed export error: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/model_config_export.py", line 332, in export_tensorrt_llm_checkpoint
    for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/model_config_export.py", line 278, in torch_to_tensorrt_llm_checkpoint
    tensorrt_llm_config = convert_to_tensorrt_llm_config(model_config)
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/tensorrt_llm_utils.py", line 78, in convert_to_tensorrt_llm_config
    "architecture": MODEL_NAME_TO_HF_ARCH_MAP[decoder_type],
KeyError: 'unknown:Starcoder2ForCausalLM'
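
Reading the traceback, the export step fails because convert_to_tensorrt_llm_config looks the decoder type up in MODEL_NAME_TO_HF_ARCH_MAP, and since Starcoder2ForCausalLM is not a recognized model type the key becomes 'unknown:Starcoder2ForCausalLM' and the lookup raises KeyError. A minimal sketch of a possible local workaround, assuming the module path shown in the traceback above and assuming (unverified) that Starcoder2 can be exported as a GPT-style TensorRT-LLM architecture, would be to register the missing entry before running the export:

import ammo.torch.export.tensorrt_llm_utils as trtllm_utils

# Hypothetical workaround: map the unrecognized decoder type to a GPT-style
# TensorRT-LLM architecture before exporting the checkpoint. Whether
# GPTForCausalLM actually matches Starcoder2's weight layout is an
# assumption, not something verified here.
trtllm_utils.MODEL_NAME_TO_HF_ARCH_MAP["unknown:Starcoder2ForCausalLM"] = "GPTForCausalLM"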

Additional notes

I can provide any other information if needed.

wxsms added the bug (Something isn't working) label May 11, 2024
wxsms changed the title from "Failed to quantize Starcoder2 with FP8" to "Failed to quantize Gpt (Starcoder2 variant) with FP8" May 15, 2024
wxsms (Author) commented May 22, 2024

wxsms closed this as completed May 22, 2024