INT4 model performs worse than FP32 on Intel CPU. Why?
#13 opened by Sakura10151
The INT4 model's log shows that SelfAttention takes hundreds of times longer than it does in the FP32 model.