How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf evalengine/unbound-e2b-gguf:
# Run inference directly in the terminal:
llama-cli -hf evalengine/unbound-e2b-gguf:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf evalengine/unbound-e2b-gguf:
# Run inference directly in the terminal:
llama-cli -hf evalengine/unbound-e2b-gguf:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf evalengine/unbound-e2b-gguf:
# Run inference directly in the terminal:
./llama-cli -hf evalengine/unbound-e2b-gguf:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf evalengine/unbound-e2b-gguf:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf evalengine/unbound-e2b-gguf:
Use Docker
docker model run hf.co/evalengine/unbound-e2b-gguf:
Quick Links

Unbound

Unbound E2B GGUF — because there is no boundary

No guarantee — use at your own risk. Reduced safety filtering; can produce harmful or false output. Provided as-is.

GGUF quants of evalengine/unbound-e2b for Ollama, llama.cpp, LM Studio, and wllama (in-browser). Built by Chromia and Eval Engine.

Available quants

Each quant is shipped as a sharded multi-part GGUF (unbound-e2b.<QUANT>-NNNNN-of-NNNNN.gguf). Ollama, llama.cpp, LM Studio, and wllama auto-stitch on the first part — same UX as a single file.

Quant Parts Total Browser (wllama) Desktop Notes
Q2_K 3 2.8 GB Smallest, biggest quality drop
Q3_K_M 3 3.0 GB Marginal size win over Q4
Q4_K_M 3 3.2 GB Recommended default
Q6_K 4 3.6 GB Higher fidelity
Q8_0 4 4.6 GB ❌ (over 2 GB) Highest fidelity; desktop only

mmproj-unbound-e2b.gguf (vision projector, ~942 MB) sits at the repo root — load it alongside any LM quant for image input. See Vision below.

Sampling

  • Creative / open-endedtemperature=1.0, top_p=0.95, top_k=64.
  • Factual / brand questions → drop temperature to ~0.3–0.5.
  • llama.cpp: pass --jinja. Gemma 4 thinking mode is on by default; set enable_thinking: false in chat-template kwargs for shorter replies.

For Ollama, pull from the Ollama Registryollama pull hf.co/... doesn't yet support sharded GGUFs. The registry version is a single-file Q4_K_M with a bundled Modelfile (temperature=0.6, top_p=0.95, top_k=64, repeat_penalty=1.05, num_ctx=8192 and an identity-grounding system prompt).

Run

# Ollama Registry (single-file Q4_K_M, identity-grounded Modelfile)
ollama pull evalengine/unbound-e2b
ollama run  evalengine/unbound-e2b
# llama.cpp — point at FIRST shard, the rest auto-stitch
./llama-cli -m unbound-e2b.Q4_K_M-00001-of-00003.gguf -p "your prompt"
// wllama (browser) — Q8_0 has a tensor over 2 GB; use Q2/Q3/Q4/Q6
import { Wllama } from '@wllama/wllama';
const wllama = new Wllama(/* … */);
await wllama.loadModelFromHF(
  'evalengine/unbound-e2b-GGUF',
  'unbound-e2b.Q4_K_M-00001-of-00003.gguf'
);

Vision / image input (optional)

mmproj-unbound-e2b.gguf enables image-to-text. Pair with any LM quant via llama-mtmd-cli or llama-gemma3-cli:

./llama-mtmd-cli \
  -m   unbound-e2b.Q4_K_M-00001-of-00003.gguf \
  --mmproj mmproj-unbound-e2b.gguf \
  --image path/to/your/image.png \
  -p "What is in this image?"

Disclaimer. The vision encoder is Google's original weights, unchanged — abliteration only touched the language model. The LM is uncensored, but the vision encoder may still suppress features for content classes Google's base was tuned against. We have not benchmarked the visual axis. Treat as preview.

Text-only: skip --mmproj entirely. Standard llama-cli / Ollama / LM Studio do not need the mmproj file.

Acknowledgements

Fine-tuned with Unsloth + HF TRL. Abliteration via heretic. Environment from autoresearch. Compliance training data distilled from the AEON uncensored teacher model.

License

Apache-2.0, inherited from google/gemma-4-E2B-it. Full model card + benchmarks at evalengine/unbound-e2b.

Downloads last month
1,243
GGUF
Model size
5B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Model tree for evalengine/unbound-e2b-gguf

Quantized
(3)
this model