Unable to save llama2 after SmoothQuant #1600

Open
dellamuradario opened this issue Feb 2, 2024 · 1 comment
@dellamuradario

Hi all,

I'm attempting to follow the SmoothQuant tutorial for the Llama-2-7b model: https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/ptq_static

System configuration:
OS: Windows 11
Python: 3.10.11

My steps:

  1. CREATE PROJECT FOLDER: neural-compressor-tutorial
  2. CREATE VIRTUAL ENV: python -m venv neural-compressor-env
  3. DOWNLOAD: the example folder from the guide
  4. RUN: pip install neural-compressor and SKIP_RUNTIME=True pip install -r requirements.txt (successful)
  5. RUN: python prepare_model.py --input_model="meta-llama/Llama-2-7b-chat-hf" --output_model="./llama-2-7b-chat-hf" (successful)
  6. RUN WITH GIT BASH TERMINAL: bash run_quant.sh --input_model=C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/llama-2-7b-chat-hf --output_model=C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/output_model

TERMINAL LOG - ERROR:

2024-02-02 11:28:30.1017397 [E:onnxruntime:, inference_session.cc:1935 onnxruntime::InferenceSession::Initialize::<lambda_5a23845ba810e30de3b9e7b450415bf5>::operator ()] Exception during initialization: bad allocation
2024-02-02 11:28:30 [ERROR] Unexpected exception RuntimeException('[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: bad allocation') happened during tuning.
Traceback (most recent call last):
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\quantization.py", line 234, in fit
    strategy.traverse()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\auto.py", line 140, in traverse
    super().traverse()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\strategy.py", line 483, in traverse
    self._setup_pre_tuning_algo_scheduler()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\strategy.py", line 361, in _setup_pre_tuning_algo_scheduler
    self.model = self._pre_tuning_algo_scheduler("pre_quantization")
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\algorithm\algorithm.py", line 127, in __call__
    self._q_model = algo(self._origin_model, self._q_model, self._adaptor, self._dataloader, self._calib_iter)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\algorithm\smooth_quant.py", line 89, in __call__
    q_model = adaptor.smooth_quant(
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\onnxrt.py", line 228, in smooth_quant
    self.smooth_quant_model = self.sq.transform(**self.cur_sq_args)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\smooth_quant.py", line 183, in transform
    self._dump_op_info(percentile, op_types, calib_iter, quantize_config)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\smooth_quant.py", line 395, in _dump_op_info
    self.max_vals_per_channel, self.shape_info, self.tensors_to_node = augment.calib_smooth(
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\calibration.py", line 774, in calib_smooth
    _, output_dicts = self.get_intermediate_outputs()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\calibration.py", line 254, in get_intermediate_outputs
    else onnxruntime.InferenceSession(self.model_wrapper.model_path + "_augment.onnx", so, providers=[backend])
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: bad allocation
2024-02-02 11:28:36 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.
model: decoder_model.onnx
args.output_model: C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/output_model
Traceback (most recent call last):
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\main.py", line 336, in <module>
    q_model.save(os.path.join(args.output_model, model))
AttributeError: 'NoneType' object has no attribute 'save'
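Note that the final AttributeError looks like a downstream symptom rather than the root cause: when tuning fails, quantization.fit returns None, and main.py then calls .save() on it unconditionally. A minimal guard sketch (q_model, args.output_model, and model are the names from the example's main.py as shown in the traceback):

    import os
    import sys

    # Sketch only: bail out cleanly instead of calling .save() on None.
    # q_model is the return value of neural_compressor.quantization.fit;
    # args.output_model and model mirror the example's main.py.
    if q_model is None:
        print("Tuning did not produce a quantized model; nothing to save.")
        sys.exit(1)
    q_model.save(os.path.join(args.output_model, model))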

What could be the solution? Did I miss any crucial steps during the installation or while executing the commands listed above?

Thank you for any suggestions.

@yuwenzho
Collaborator

RUNTIME_EXCEPTION : Exception during initialization: bad allocation is raised when creating the InferenceSession. The exception looks like a memory allocation failure. You can try tracking your memory consumption.
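For example, a minimal sketch to watch process memory around the failing session creation (assumes psutil is installed; the filename below is illustrative, following the model_path + "_augment.onnx" pattern in your traceback):

    import psutil
    import onnxruntime

    # Sketch: log resident memory before/after creating the InferenceSession
    # that fails during SmoothQuant calibration.
    proc = psutil.Process()
    print(f"RSS before session: {proc.memory_info().rss / 2**30:.2f} GiB")

    so = onnxruntime.SessionOptions()
    session = onnxruntime.InferenceSession(
        "decoder_model.onnx_augment.onnx",  # illustrative path, see traceback
        so,
        providers=["CPUExecutionProvider"],
    )

    print(f"RSS after session:  {proc.memory_info().rss / 2**30:.2f} GiB")

Keep in mind that an fp32 7B-parameter model is roughly 28 GB of weights on its own, so loading it plus calibration outputs on a typical desktop can plausibly exhaust memory and surface exactly this "bad allocation".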
