Model execution is single threaded? #1663

Open
akhauriyash opened this issue Mar 12, 2024 · 1 comment
@akhauriyash

Hello,

I am trying to run the following script:
https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm

I use the command below:

OMP_NUM_THREADS=32 python run_clm_no_trainer.py \
    --model facebook/opt-1.3b \
    --quantize \
    --sq \
    --alpha 0.5 \
    --ipex \
    --output_dir "saved_results" \
    --int8_bf16_mixed

However, htop shows only a single thread in use, even when I set torch.set_num_threads(32). Execution is extremely slow, which makes SmoothQuant unusable in my case.

My system has an Intel® Xeon® Gold 5218 processor.

Am I missing something? Thanks!

@violetch24
Contributor

Hi @akhauriyash, I have not been able to reproduce this issue on several machines so far. Could you please share the environment where the issue occurs, using pip list?
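
Alongside the pip list output, a short diagnostic along these lines might help narrow things down. It is only a sketch, not part of neural-compressor or the example script: the helper name thread_report is hypothetical, and it simply prints the thread-related environment variables, the CPU affinity of the process, and the thread counts PyTorch reports (if PyTorch is importable).

```python
import os


def thread_report():
    """Collect settings that commonly limit CPU parallelism (hypothetical helper)."""
    report = {
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "MKL_NUM_THREADS": os.environ.get("MKL_NUM_THREADS"),
        "cpu_count": os.cpu_count(),
    }
    # CPU affinity is Linux-only. A restricted mask (for example from taskset
    # or a container) caps the usable cores regardless of OMP_NUM_THREADS.
    if hasattr(os, "sched_getaffinity"):
        report["affinity_cores"] = len(os.sched_getaffinity(0))
    try:
        import torch

        # Intra-op threads are what torch.set_num_threads() controls.
        report["torch_intraop_threads"] = torch.get_num_threads()
        report["torch_interop_threads"] = torch.get_num_interop_threads()
        # Summary of the threading backend (OpenMP/MKL) PyTorch was built with.
        report["torch_parallel_info"] = torch.__config__.parallel_info()
    except ImportError:
        report["torch"] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in thread_report().items():
        print(f"{key}: {value}")
```

Running it in the same shell and environment as the quantization command (with OMP_NUM_THREADS=32 set) would show whether the thread settings actually reach PyTorch. If affinity_cores is 1, the process is pinned to a single core and raising the thread count will not help.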
