
Continue quantization from history.snapshot #1778

Open
oyazdanb opened this issue May 8, 2024 · 3 comments
Labels: aitce (AI TCE to handle it firstly), help wanted (Extra attention is needed)

Comments


oyazdanb commented May 8, 2024

I was wondering if there is a way to resume quantization from history.snapshot?

I am using onnx and onnxrt_cuda_ep.

I can quantize the model, but the code crashes before the model is saved (not related to INC). Is there a way to continue from history.snapshot instead of running the whole process from the beginning?

Applying AWQ clip
Progress: [####################] 100.00%2024-05-07 14:56:05 [INFO] |Mixed Precision Statistics|
2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
2024-05-07 14:56:05 [INFO] | Op Type | Total | A32W4G32 |
2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
2024-05-07 14:56:05 [INFO] | MatMul | 193 | 193 |
2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
2024-05-07 14:56:05 [INFO] Pass quantize model elapsed time: 6294630.87 ms
2024-05-07 14:56:05 [INFO] Save tuning history to C:\llm\quantization\nc_workspace\2024-05-07_13-10-57./history.snapshot.
2024-05-07 14:56:05 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-05-07 14:56:05 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-05-07 14:56:05 [INFO] Save deploy yaml to C:\llm\quantization\nc_workspace\2024-05-07_13-10-57\deploy.yaml

@xiguiw xiguiw self-assigned this May 9, 2024

xiguiw commented May 9, 2024

Hi @oyazdanb,

Welcome to neural-compressor!

Yes, there is a function to resume quantization from history.snapshot.

I'll check the function and get back to you ASAP.

@xiguiw xiguiw added the help wanted Extra attention is needed label May 9, 2024

xiguiw commented May 10, 2024

@oyazdanb the recover function is broken for some models (not all).
The development team is working on a fix.

In the meantime, here is a way to recover from history.snapshot; you can try it to check whether it works for your model.

If it does not work, you can:

1. Wait a few days. I'll notify you once it is fixed.
2. Install neural-compressor 2.0 and recover with 2.0. We do not recommend rolling back to an earlier version, though.

Here is the way you can try to recover. Not sure it works for your model yet.

    from neural_compressor.utils.utility import recover
    recover_qmodel = recover(fp32_onnx_model, "./nc_workspace/2024-05-10_19-16-32/history.snapshot", 0)

Here is the definition of recover:

    def recover(fp32_model, tuning_history_path, num, **kwargs):
        """Get offline recover tuned model.

        Args:
            fp32_model: Input model path
            tuning_history_path: The tuning history path, which needs user to assign
            num: tune index
        """
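The recover call above replays a tuning trial already recorded in the snapshot instead of re-running the whole search. As a self-contained illustration of that checkpoint/replay pattern (a toy sketch only: `save_snapshot`, `recover_config`, and the pickled-list format are assumptions for illustration, not the actual neural-compressor snapshot format or API):

```python
import os
import pickle
import tempfile

# Toy sketch of the snapshot/replay idea: persist each tuning trial's
# config to a "history.snapshot"-style file, then reload one config by
# index rather than repeating the tuning search from the beginning.
# save_snapshot/recover_config are hypothetical helpers, NOT the real
# neural-compressor utilities.

def save_snapshot(path, history):
    """Pickle the list of tuning-trial configs to disk."""
    with open(path, "wb") as f:
        pickle.dump(history, f)

def recover_config(path, num):
    """Load the snapshot and return the config of trial `num`."""
    with open(path, "rb") as f:
        history = pickle.load(f)
    return history[num]

snapshot = os.path.join(tempfile.mkdtemp(), "history.snapshot")
save_snapshot(snapshot, [
    {"op_type": "MatMul", "dtype": "A32W4G32"},   # trial 0
    {"op_type": "MatMul", "dtype": "A16W8G128"},  # trial 1
])
cfg = recover_config(snapshot, 0)
print(cfg["dtype"])  # prints "A32W4G32"
```

In the real `recover`, the third argument (`num`) plays the same role as the index here: it selects which recorded tuning configuration to re-apply to the FP32 model.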


xiguiw commented May 11, 2024

Fixed the broken recover. PR:
#1788

@xiguiw xiguiw added the aitce AI TCE to handle it firstly label May 15, 2024