neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 (Text Generation)
Article: Fast Inference on Large Language Models: BLOOMZ on Habana Gaudi2 Accelerator (Mar 28, 2023)