ktoprakucar
/

granite-guardian-3.1-2b-Q8-GGUF

Text Generation

8-bit precision

Inference Endpoints

Model card Files Files and versions Community

ktoprakucar commited on Dec 26, 2024

Commit

9a3b64e

·

verified ·

1 Parent(s): c8b42d6

Update README.md

Files changed (1) hide show

README.md +3 -0

README.md CHANGED Viewed

@@ -15,6 +15,9 @@ A quantized version of [Granite Guardian 3.1 2B](https://huggingface.co/ibm-gran
 Quantization is done by [llama.cpp](https://github.com/ggerganov/llama.cpp).
 ## Model Summary (from original repository)
 **Granite Guardian 3.1 2B** is a fine-tuned Granite 3.1 2B Instruct model designed to detect risks in prompts and responses.

 Quantization is done by [llama.cpp](https://github.com/ggerganov/llama.cpp).
+P.S. The llama.cpp library encountered issues during model initialization in both Python and llama-server modes, even with the quantized 8B version from other distributors. However, you can use [LM Studio](https://lmstudio.ai/) for inference!
 ## Model Summary (from original repository)
 **Granite Guardian 3.1 2B** is a fine-tuned Granite 3.1 2B Instruct model designed to detect risks in prompts and responses.