DavidAU/Llama-3.3-70B-Instruct-How-To-Run-on-Low-BPW-IQ1_S-IQ1_M-at-maximum-speed-quality

low quants settings

guide to low BPW quants

DavidAU committed on Dec 7, 2024

Commit 7eb48f3 (verified) · 1 Parent(s): 342da20

Update README.md

Files changed (1)

README.md +1 -1

@@ -12,7 +12,7 @@ tags:
 This is a quick "down and dirty" demo, with full sampler settings (3) to augment operation of "Llama-3.3-70B-Instruct" at "IQ1_S" (ultra low bit).
-(can also apply these using IQ1_M, IQ2 quants too AND use for any 70B model at low quant levels.)
+(can also apply these using IQ1_M, IQ2 quants too AND you can use these settings for any 70B model at low quant levels.)
 This will allow you to load and run this model on a 16 GB video card fully, at 2048 ctx and achieve 13-15 t/s.
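
Note on the 16 GB claim in the context line above: a rough back-of-the-envelope check, not taken from the README. It is a sketch under stated assumptions: about 70.6B parameters, a nominal 1.56 bits per weight for IQ1_S, and an fp16 KV cache for the Llama 3 70B layout (80 layers, 8 KV heads, head dimension 128). Real GGUF files keep some tensors at higher precision and the runtime needs extra compute buffers, so actual usage will be higher than this estimate.

```python
# Rough VRAM estimate for a 70B model at IQ1_S with a 2048-token context.
# All constants below are assumptions for illustration, not values from the README.

N_PARAMS = 70.6e9   # assumed parameter count for Llama-3.3-70B
BPW_IQ1_S = 1.56    # assumed nominal bits per weight for IQ1_S
N_LAYERS = 80       # assumed transformer layers
N_KV_HEADS = 8      # assumed KV heads (grouped-query attention)
HEAD_DIM = 128      # assumed per-head dimension
CTX = 2048          # context length quoted in the README


def weights_gb(n_params: float, bpw: float) -> float:
    """Size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bpw / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx: int, bytes_per_elem: int = 2) -> float:
    """Size of the K and V caches in GB, assuming fp16 (2 bytes) elements."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9


w = weights_gb(N_PARAMS, BPW_IQ1_S)
kv = kv_cache_gb(N_LAYERS, N_KV_HEADS, HEAD_DIM, CTX)
total = w + kv

print(f"weights ~{w:.2f} GB, KV cache ~{kv:.2f} GB, total ~{total:.2f} GB")
```

Under these assumptions the total stays below 16 GB with limited headroom, which is consistent with the README pairing IQ1_S with a short 2048-token context.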