Add support for phi-3-mini-128k model #238

Closed
bil-ash opened this issue Apr 30, 2024 · 4 comments
bil-ash commented Apr 30, 2024

Please add support for the phi-3-mini-128k (128k context length) model in neural-speed.

kevinintel (Contributor) commented

It's in our plan.
Thanks


bil-ash commented May 10, 2024

Thanks. Since Phi-3 support has been merged, I will close this issue. But I have another question and don't want to create a separate issue, so I'm asking here.

According to https://github.com/intel/neural-speed/tree/main/neural_speed/core#fastest-configuration-for-cpus, int8 is the fastest compute configuration both for ISAs newer than AVX512F and for older ones (AVX2), yet for AVX512F itself fp32 is fastest. Why is that? Also, does int8 compute lead to lower memory usage than fp32, or is memory usage equal for the same weight quantization?
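
To make the memory half of the question concrete, here is the back-of-envelope I have in mind (illustrative numbers only: a single hypothetical 4096x4096 layer with group-128 int4 weight quantization and fp16 scales; the real kernels presumably work on tiles rather than whole matrices):

```python
# Back-of-envelope only; sizes are assumed, not measured from neural-speed.
rows = cols = 4096                # one hypothetical weight matrix
group_size = 128                  # assumed quantization group size

# Weight storage is fixed by the quantization, not by the compute dtype.
weights_int4 = rows * cols // 2                # 4-bit packed weights
scales_fp16 = (rows * cols // group_size) * 2  # one fp16 scale per group
weights_total = weights_int4 + scales_fp16     # identical for int8/fp32 compute

# The compute dtype changes the dequantized work buffers instead.
buf_fp32 = rows * cols * 4        # dequantize to fp32: 4 bytes per element
buf_int8 = rows * cols * 1        # requantize to int8: 1 byte per element

print(f"weights + scales: {weights_total / 2**20:.1f} MiB (same either way)")
print(f"fp32 work buffer: {buf_fp32 / 2**20:.1f} MiB")
print(f"int8 work buffer: {buf_int8 / 2**20:.1f} MiB")
```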

luoyu-intel (Contributor) commented

@bil-ash Hi, AVX512F here means devices without AVX512_VNNI, and I don't implement u8s8 and s8s8 kernels for plain AVX512F, so it's better to use fp32 for computation there. AVX2 devices without AVX_VNNI do have u8s8 and s8s8 kernels as a fallback.
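
In pseudocode, the rule of thumb reads roughly like this (a minimal sketch of the selection logic described above, not neural-speed's actual dispatch code; `pick_compute_dtype` is a made-up helper and the flag names follow Linux /proc/cpuinfo):

```python
# Minimal sketch of the ISA -> compute-dtype rule described above.
# Not neural-speed's real dispatcher; pick_compute_dtype is hypothetical.
def pick_compute_dtype(cpu_flags: set) -> str:
    if "avx512_vnni" in cpu_flags or "avx_vnni" in cpu_flags:
        return "int8"   # VNNI has fused u8s8/s8s8 dot products: int8 is fastest
    if "avx512f" in cpu_flags:
        return "fp32"   # plain AVX512F: no u8s8/s8s8 kernels, so use fp32
    if "avx2" in cpu_flags:
        return "int8"   # AVX2 without AVX_VNNI still has backup u8s8/s8s8 kernels
    return "fp32"       # conservative default for anything older

# Example (Linux): read the flag set from /proc/cpuinfo.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break
print(pick_compute_dtype(flags))
```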


bil-ash commented May 20, 2024

Okay, understood.

bil-ash closed this as completed May 20, 2024