Unbound E4B GGUF — because there is no boundary

No guarantee — use at your own risk. Reduced safety filtering; can produce harmful or false output. Provided as-is.

Desktop GGUF quants of evalengine/unbound-e4b for Ollama, llama.cpp, and LM Studio. Built by Chromia and Eval Engine.

Looking for the browser/wllama builds? They live in their own repo: evalengine/unbound-e4b-wllama-gguf. E4B's per_layer_token_embd tensor needs special quantization to fit wllama's 2 GB ArrayBuffer cap — keeping the desktop and browser variants in separate repos avoids HF GGUF UI aggregation collisions.

Available quants

Each quant is shipped as a sharded multi-part GGUF (unbound-e4b.<QUANT>-NNNNN-of-NNNNN.gguf). Ollama, llama.cpp, and LM Studio auto-stitch on the first part — same UX as a single file.
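
Because only the first part is passed to the runtime, a missing later shard can be easy to overlook. The sketch below expands the naming pattern above into concrete filenames and checks that each one is present in a directory. The helper names `shard_names` and `check_shards` are hypothetical and not part of any tool mentioned here.

```shell
# List the shard filenames a quant is expected to have, following the
# unbound-e4b.<QUANT>-NNNNN-of-NNNNN.gguf pattern.
# Usage: shard_names <QUANT> <PART_COUNT>
shard_names() {
  quant=$1
  total=$2
  i=1
  while [ "$i" -le "$total" ]; do
    printf 'unbound-e4b.%s-%05d-of-%05d.gguf\n' "$quant" "$i" "$total"
    i=$((i + 1))
  done
}

# Report any shard missing from a directory on stderr.
# Returns non-zero if at least one shard is absent.
# Usage: check_shards <DIR> <QUANT> <PART_COUNT>
check_shards() {
  dir=$1
  missing=0
  for name in $(shard_names "$2" "$3"); do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_shards . Q4_K_M 4` could be run in the download directory before starting llama.cpp.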

Embedding tensor kept at the llama.cpp default of Q6_K; largest part ~2.15 GB — fine for desktop, won't load in browser.

Quant     Parts   Total     Notes
Q2_K      4       4.08 GB   Smallest, biggest quality drop
Q3_K_M    4       4.49 GB   Modest size win over Q4 (embedding precision dominates)
Q4_K_M    4       4.94 GB   Recommended default
Q6_K      5       5.75 GB   Higher fidelity
Q8_0      6       7.43 GB   Highest fidelity
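
One way to read the table is to pick the largest quant that fits a size budget. The following is a minimal sketch of that selection, using the totals listed above. The `pick_quant` helper and the idea of a single GB budget are assumptions for illustration. File size is only a lower bound on memory use, since the runtime also needs room for the context and KV cache.

```shell
# Print the largest quant whose total file size (GB, copied from the table)
# fits within a budget. Prints nothing and returns 1 if none fits.
# Usage: pick_quant <BUDGET_GB>
pick_quant() {
  awk -v budget="$1" '
    BEGIN {
      # Quant names and totals, ordered smallest to largest.
      n = split("Q2_K Q3_K_M Q4_K_M Q6_K Q8_0", name, " ")
      split("4.08 4.49 4.94 5.75 7.43", size, " ")
      best = ""
      for (i = 1; i <= n; i++)
        if (size[i] + 0 <= budget + 0) best = name[i]
      if (best == "") exit 1
      print best
    }'
}
```

With a budget of 5 GB, `pick_quant 5` selects Q4_K_M, which matches the recommended default.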

Sampling

  • Creative / open-ended → temperature=1.0, top_p=0.95, top_k=64.
  • Factual / brand questions → drop temperature to ~0.3–0.5.
  • llama.cpp: pass --jinja. Gemma 4 thinking mode is on by default; set enable_thinking: false in chat-template kwargs for shorter replies.
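
The recommendations above can be turned into llama.cpp command-line flags. The sketch below maps a preset name to `--temp`, `--top-p`, and `--top-k`. The `sampling_flags` helper and the preset names are hypothetical. The factual preset uses 0.4 as an assumed midpoint of the suggested range and keeps top_p and top_k unchanged, which the text above does not state explicitly.

```shell
# Map a preset name to llama.cpp sampling flags.
# "creative" uses the listed values; "factual" lowers only the temperature
# (0.4 is an assumed midpoint of the suggested 0.3-0.5 range).
# Usage: sampling_flags <creative|factual>
sampling_flags() {
  case "$1" in
    creative) echo "--temp 1.0 --top-p 0.95 --top-k 64" ;;
    factual)  echo "--temp 0.4 --top-p 0.95 --top-k 64" ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Example (substitution left unquoted on purpose so the flags split
# into separate words):
#   ./llama-cli -m unbound-e4b.Q4_K_M-00001-of-00004.gguf --jinja \
#     $(sampling_flags factual) -p "your prompt"
```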

For Ollama, pull from the Ollama Registry: ollama pull hf.co/... doesn't yet support sharded GGUFs. The registry version is a single-file Q4_K_M with a bundled Modelfile (temperature=0.6, top_p=0.95, top_k=64, repeat_penalty=1.05, num_ctx=8192, and an identity-grounding system prompt).
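
The bundled temperature of 0.6 sits between the creative and factual settings suggested earlier. If a different value is wanted, one option is a small Modelfile layered on top of the registry model. This is a sketch under assumptions: the `write_modelfile` helper and the local model name are hypothetical, and it relies on Ollama carrying over the base model's other parameters and system prompt unless they are overridden, which is worth verifying with `ollama show`.

```shell
# Write a Modelfile that overrides only the temperature of the registry model.
# Usage: write_modelfile <OUTPUT_PATH> <TEMPERATURE>
write_modelfile() {
  cat > "$1" <<EOF
FROM evalengine/unbound-e4b
PARAMETER temperature $2
EOF
}

# Example, with Ollama installed (the local model name is hypothetical):
#   write_modelfile Modelfile.creative 1.0
#   ollama create unbound-e4b-creative -f Modelfile.creative
#   ollama run unbound-e4b-creative
```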

Run

# Ollama Registry (single-file Q4_K_M, identity-grounded Modelfile)
ollama pull evalengine/unbound-e4b
ollama run  evalengine/unbound-e4b

# llama.cpp — point at FIRST shard
./llama-cli -m unbound-e4b.Q4_K_M-00001-of-00004.gguf -p "your prompt"

Vision / image input (optional)

mmproj-unbound-e4b.gguf enables image-to-text. Pair with any LM quant via llama-mtmd-cli or llama-gemma3-cli:

./llama-mtmd-cli \
  -m   unbound-e4b.Q4_K_M-00001-of-00004.gguf \
  --mmproj mmproj-unbound-e4b.gguf \
  --image path/to/your/image.png \
  -p "What is in this image?"

Disclaimer. The vision encoder is Google's original weights, unchanged — abliteration only touched the language model. The LM is uncensored, but the vision encoder may still suppress features for content classes Google's base was tuned against. We have not benchmarked the visual axis. Treat as preview.

Text-only: skip --mmproj. Standard llama-cli / Ollama / LM Studio do not need the mmproj file.

Acknowledgements

Fine-tuned with Unsloth + HF TRL. Abliteration via heretic. Environment from autoresearch. Compliance training data distilled from the AEON uncensored teacher model.

License

Apache-2.0, inherited from google/gemma-4-E4B-it. Full model card + benchmarks at evalengine/unbound-e4b.
