Add support for phi-3-mini-128k model #238

Closed
bil-ash opened this issue Apr 30, 2024 · 4 comments
bil-ash commented Apr 30, 2024

Please add support for the phi-3-mini-128k (128k context length) model in neural-speed.

kevinintel (Contributor) commented

It's in our plan.
Thanks


bil-ash commented May 10, 2024

Thanks. Since Phi-3 support has been merged, I will close this issue. But I have another question and don't want to create a separate issue, so I'm asking here.

According to https://github.com/intel/neural-speed/tree/main/neural_speed/core#fastest-configuration-for-cpus, int8 is the fastest compute configuration both for ISAs newer than AVX512F and for older ones (AVX2), yet for AVX512F itself fp32 is fastest. Why is that? Also, does int8 compute lead to lower memory usage than fp32, or is memory usage equal for the same weight quantization?
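
To make the memory half of the question concrete, here is the back-of-envelope I have in mind (illustrative numbers only: a single hypothetical 4096x4096 layer with group-128 int4 weight quantization and fp16 scales; the real kernels presumably work on tiles rather than whole matrices):

```python
# Back-of-envelope only; sizes are assumed, not measured from neural-speed.
rows = cols = 4096                # one hypothetical weight matrix
group_size = 128                  # assumed quantization group size

# Weight storage is fixed by the quantization, not by the compute dtype.
weights_int4 = rows * cols // 2                # 4-bit packed weights
scales_fp16 = (rows * cols // group_size) * 2  # one fp16 scale per group
weights_total = weights_int4 + scales_fp16     # identical for int8/fp32 compute

# The compute dtype changes the dequantized work buffers instead.
buf_fp32 = rows * cols * 4        # dequantize to fp32: 4 bytes per element
buf_int8 = rows * cols * 1        # requantize to int8: 1 byte per element

print(f"weights + scales: {weights_total / 2**20:.1f} MiB (same either way)")
print(f"fp32 work buffer: {buf_fp32 / 2**20:.1f} MiB")
print(f"int8 work buffer: {buf_int8 / 2**20:.1f} MiB")
```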

luoyu-intel (Contributor) commented

@bil-ash Hi, AVX512F here means devices without AVX512_VNNI, and I don't implement u8s8 and s8s8 kernels for plain AVX512F, so it's better to use fp32 for computation there. AVX2 devices without AVX_VNNI do have u8s8 and s8s8 kernels as a fallback.
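
In pseudocode, the rule of thumb reads roughly like this (a minimal sketch of the selection logic described above, not neural-speed's actual dispatch code; `pick_compute_dtype` is a made-up helper and the flag names follow Linux /proc/cpuinfo):

```python
# Minimal sketch of the ISA -> compute-dtype rule described above.
# Not neural-speed's real dispatcher; pick_compute_dtype is hypothetical.
def pick_compute_dtype(cpu_flags: set) -> str:
    if "avx512_vnni" in cpu_flags or "avx_vnni" in cpu_flags:
        return "int8"   # VNNI has fused u8s8/s8s8 dot products: int8 is fastest
    if "avx512f" in cpu_flags:
        return "fp32"   # plain AVX512F: no u8s8/s8s8 kernels, so use fp32
    if "avx2" in cpu_flags:
        return "int8"   # AVX2 without AVX_VNNI still has backup u8s8/s8s8 kernels
    return "fp32"       # conservative default for anything older

# Example (Linux): read the flag set from /proc/cpuinfo.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break
print(pick_compute_dtype(flags))
```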


bil-ash commented May 20, 2024

Okay, understood.

bil-ash closed this as completed May 20, 2024