Absurdly slow on an i7 / 3060 / 64GB
Ran this on a 3060 12GB with 64GB of DDR4 RAM.
It's incredibly slow and I was wondering if there were any settings I could adjust to remedy this?
Ah, are you trying to run the full fp16 model? This is the unquantised repo; it's not really meant for basic inference. It'd take nearly 24GB of VRAM to run this one.
With 12GB of VRAM, I'd run one of these instead:
https://huggingface.co/Sao10K/Fimbulvetr-11B-v2-GGUF -- GGUF (you can fully offload to GPU at q5_k_m with all layers and 6k+ context... or go q6/q8 and partially load some of it into RAM)
https://huggingface.co/LoneStriker/Fimbulvetr-11B-v2-5.0bpw-h6-exl2 -- exl2 --> fastest speed, but GPU-only (no partial offloading to RAM)
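If it helps, here's the back-of-the-envelope math behind those suggestions. This is just a rough weight-only estimate (the ~11B parameter count and the bits-per-weight figures for each quant are approximations, and it ignores KV cache and runtime overhead):

```python
def model_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB.

    Excludes KV cache, activations, and framework overhead,
    so real usage will be a couple of GB higher.
    """
    return n_params * bits_per_weight / 8 / 1e9

n = 11e9  # Fimbulvetr-11B, approximate parameter count

# Approximate effective bits per weight for each format
for name, bits in [("fp16", 16), ("q8_0", 8.5), ("q6_k", 6.6), ("q5_k_m", 5.7)]:
    print(f"{name:8s} ~{model_gb(n, bits):5.1f} GB")
```

So fp16 is ~22GB before overhead (hence "nearly 24GB"), while q5_k_m lands around 8GB and fits fully on a 12GB card with room left for context, and q8 at ~12GB is why you'd split it between VRAM and RAM.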
I'd take a look at koboldcpp for GGUF (it's literally a single .exe and easy to run), or TabbyAPI for exl2.
Thank you for your help!