Go to the llama.cpp releases page and download one of the prebuilt binary archives.
If you're going to use CUDA, check which CUDA version your card supports (12.2 for any RTX card) and download the matching archive.
Unpack everything into one folder, rename that folder to "LlamaCPP", and place it in the same folder as the main.py/main.exe file.
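If you want to sanity-check the result, here is a minimal Python sketch that verifies the expected layout. The "LlamaCPP" folder name comes from the step above; running the check from a script sitting next to main.py is an assumption about your setup, and the exact binary names inside the folder depend on which llama.cpp release you downloaded.

```python
from pathlib import Path

# Minimal sketch, assuming this script lives in the same folder as
# main.py/main.exe. It only checks that the "LlamaCPP" folder from
# the steps above exists and lists what was unpacked into it.
app_dir = Path(__file__).resolve().parent
llama_dir = app_dir / "LlamaCPP"

if not llama_dir.is_dir():
    raise SystemExit(f"Expected llama.cpp binaries in {llama_dir}")

print("Found LlamaCPP folder containing:")
for entry in sorted(llama_dir.iterdir()):
    print(" -", entry.name)
```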