Failed to quantize Gpt (Starcoder2 variant) with FP8 #1578

Closed
Labels: bug (Something isn't working)

wxsms opened this issue May 11, 2024 · 1 comment
wxsms commented May 11, 2024

System Info

  • CPU architecture: x86_64
  • CPU/Host memory: 32 cores, 200 GB
  • GPU properties
    • GPU name: 4090
    • GPU memory size: 24 GB
  • Libraries
    • TensorRT-LLM: 0.10.0.dev2024043000
    • Container used: built from tensorrtllm_backend
  • OS: Ubuntu

Who can help?

@Tracin

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python3 ../quantization/quantize.py --model_dir starcoder2 \
    --dtype float16 \
    --qformat fp8 \
    --kv_cache_dtype fp8 \
    --output_dir xxx

Expected behavior

It should output the quantized checkpoints without error.

Actual behavior

Calibrating batch 510
Calibrating batch 511
Quantization done. Total time used: 98.99 s.
Unknown model type Starcoder2ForCausalLM. Continue exporting...
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
current rank: 0, tp rank: 0, pp rank: 0
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The AMMO optimized model state_dict (including the quantization factors) is saved to /tmp/checkpoint/ammo_model.0.pth using torch.save for further inspection.
Detailed export error: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/model_config_export.py", line 332, in export_tensorrt_llm_checkpoint
    for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/model_config_export.py", line 278, in torch_to_tensorrt_llm_checkpoint
    tensorrt_llm_config = convert_to_tensorrt_llm_config(model_config)
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/export/tensorrt_llm_utils.py", line 78, in convert_to_tensorrt_llm_config
    "architecture": MODEL_NAME_TO_HF_ARCH_MAP[decoder_type],
KeyError: 'unknown:Starcoder2ForCausalLM'
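
Reading the traceback, the export step fails because convert_to_tensorrt_llm_config looks the decoder type up in MODEL_NAME_TO_HF_ARCH_MAP, and since Starcoder2ForCausalLM is not a recognized model type the key becomes 'unknown:Starcoder2ForCausalLM' and the lookup raises KeyError. A minimal sketch of a possible local workaround, assuming the module path shown in the traceback above and assuming (unverified) that Starcoder2 can be exported as a GPT-style TensorRT-LLM architecture, would be to register the missing entry before running the export:

import ammo.torch.export.tensorrt_llm_utils as trtllm_utils

# Hypothetical workaround: map the unrecognized decoder type to a GPT-style
# TensorRT-LLM architecture before exporting the checkpoint. Whether
# GPTForCausalLM actually matches Starcoder2's weight layout is an
# assumption, not something verified here.
trtllm_utils.MODEL_NAME_TO_HF_ARCH_MAP["unknown:Starcoder2ForCausalLM"] = "GPTForCausalLM"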

Additional notes

I can provide any other information if needed.

wxsms added the bug (Something isn't working) label May 11, 2024
wxsms changed the title from "Failed to quantize Starcoder2 with FP8" to "Failed to quantize Gpt (Starcoder2 variant) with FP8" May 15, 2024
wxsms (Author) commented May 22, 2024

wxsms closed this as completed May 22, 2024