Libraries

How to use amkkk/Gemma4_E2B_Opus_Distilled with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="amkkk/Gemma4_E2B_Opus_Distilled")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("amkkk/Gemma4_E2B_Opus_Distilled")
model = AutoModelForImageTextToText.from_pretrained("amkkk/Gemma4_E2B_Opus_Distilled")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use amkkk/Gemma4_E2B_Opus_Distilled with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "amkkk/Gemma4_E2B_Opus_Distilled"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "amkkk/Gemma4_E2B_Opus_Distilled",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
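
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000. The helper names `build_chat_payload` and `post_chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "amkkk/Gemma4_E2B_Opus_Distilled"


def build_chat_payload(prompt, image_url=None, model=MODEL_ID):
    """Build an OpenAI-style chat completion request body (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Images are passed by URL, mirroring the curl request above.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_chat(build_chat_payload("Describe this image in one sentence.", image_url=...))["choices"][0]["message"]["content"]` should hold the generated text. A server on another port can be reached by passing a different `base_url`.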

Use Docker

docker model run hf.co/amkkk/Gemma4_E2B_Opus_Distilled

SGLang

How to use amkkk/Gemma4_E2B_Opus_Distilled with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "amkkk/Gemma4_E2B_Opus_Distilled" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "amkkk/Gemma4_E2B_Opus_Distilled",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "amkkk/Gemma4_E2B_Opus_Distilled" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use amkkk/Gemma4_E2B_Opus_Distilled with Docker Model Runner:
```
docker model run hf.co/amkkk/Gemma4_E2B_Opus_Distilled
```

Gemma4 E2B Opus Distilled

This is the plain Step 2 Opus-distilled branch of Gemma 4 E2B. It starts from the unmodified base model (google/gemma-4-E2B-it) and applies a LoRA distillation pass on the Minos-filtered v2 reasoning set, targeting a broader set of modules than the first Gemma Step 2 attempt.

Model lineage

Base model: google/gemma-4-E2B-it
Dataset prep artifact: gemma4_e2b_release_topics/artifacts/step2_dataset_readiness_v2/dataset_manifest.json
LoRA run: gemma4_e2b_release_topics/artifacts/step2_runs_v2/gemma4_opus_distill_base_lora_v2
Final checkpoint type: merged BF16 weights

Dataset preparation summary

Source datasets:
- Roman1111111/claude-opus-4.6-10000x
- Jackrong/Qwen3.5-reasoning-700x
- nohurry/Opus-4.6-Reasoning-3000x-filtered
Refusal filter: NousResearch/Minos-v1
Pre-balance category pool:
- code: 259
- math: 3669
- qwen35_reasoning: 625
- simple logic and math: 7472
Balanced categories: code, math, qwen35_reasoning, simple logic and math
Per-category kept: 259
Final balanced total: 1036
Prepared train / val rows: 932 / 104
Tokenized supervised examples kept for training: 928
Records with reasoning traces retained: 1036 / 1036
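
The counts above are consistent with downsampling every category to the size of the smallest one (code, 259 rows) and then holding out about 10% for validation. The following is a minimal sketch of that arithmetic, not the actual preparation script: the random sampling, the `val_fraction` of 0.1, the seed, and the function name `balance_and_split` are all assumptions.

```python
import random

# Pre-balance category pool sizes as listed above.
POOL_SIZES = {
    "code": 259,
    "math": 3669,
    "qwen35_reasoning": 625,
    "simple logic and math": 7472,
}


def balance_and_split(pool_sizes, val_fraction=0.1, seed=0):
    """Downsample every category to the smallest one, then split train/val."""
    rng = random.Random(seed)
    per_category = min(pool_sizes.values())
    rows = []
    for category, size in pool_sizes.items():
        # Row ids stand in for real records; sample without replacement.
        kept = rng.sample(range(size), per_category)
        rows.extend((category, row_id) for row_id in kept)
    rng.shuffle(rows)
    n_val = round(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val], per_category


train_rows, val_rows, per_category = balance_and_split(POOL_SIZES)
# Prints: 259 1036 932 104
print(per_category, len(train_rows) + len(val_rows), len(train_rows), len(val_rows))
```

The gap between 932 prepared train rows and 928 tokenized examples is not reproduced here; the card does not say why those four rows were dropped.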

Step 2 training settings

| Setting | Value |
| --- | --- |
| epochs | `1.0` |
| max_length | `1536` |
| train_batch_size | `1` |
| grad_accum | `16` |
| learning_rate | `0.0001` |
| lora_r | `16` |
| lora_alpha | `32` |
| lora_dropout | `0.05` |
| target_modules | `down_proj.linear, gate_proj.linear, k_proj.linear, o_proj.linear, per_layer_input_gate, per_layer_projection, q_proj.linear, up_proj.linear, v_proj.linear` |
| trainable params | `7,708,672 / 5,112,006,176 (~0.1508%)` |
| precision | `auto` |
| gradient_checkpointing | `True` |
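
A few derived quantities follow directly from these settings. The sketch below recomputes them in plain Python. The step count assumes a single device and uses the 928 tokenized training examples reported in the dataset summary; the actual trainer may count steps differently.

```python
import math

# Values copied from the settings listed above.
train_batch_size = 1
grad_accum = 16
lora_r = 16
lora_alpha = 32
trainable_params = 7_708_672
total_params = 5_112_006_176
train_examples = 928  # tokenized supervised examples kept for training

# Examples contributing to each optimizer update.
effective_batch = train_batch_size * grad_accum
# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = lora_alpha / lora_r
trainable_pct = 100 * trainable_params / total_params
# Assumption: one device, one optimizer step per accumulated batch, one epoch.
optimizer_steps = math.ceil(train_examples / effective_batch)

print(effective_batch, lora_scale, round(trainable_pct, 4), optimizer_steps)
```

Under these assumptions the run performs 58 optimizer updates with an effective batch of 16, which is a small budget and fits the short runtime reported below.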

Training / validation summary

Train loss: 20.3691
Validation loss: 0.9464
Train runtime (s): 1788.4
Validation runtime (s): 31.7
Truncation rate during row build: 0.283
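
The train loss is far larger than the validation loss, and the card does not explain the gap. One possible reading, which is an assumption and not confirmed here, is that the logged train loss was summed over the 16 gradient-accumulation steps rather than averaged. The sketch below shows that arithmetic, together with the perplexity implied by the validation loss if it is a mean per-token cross-entropy in nats.

```python
import math

# Figures copied from the summary above and the training settings.
train_loss = 20.3691
val_loss = 0.9464
grad_accum = 16
train_runtime_s = 1788.4
train_examples = 928

# Assumption: validation loss is mean per-token cross-entropy in nats.
val_perplexity = math.exp(val_loss)
# Assumption, not confirmed by the card: train loss was summed over accumulation steps.
train_loss_per_micro_batch = train_loss / grad_accum
# Average training throughput over the whole run.
examples_per_second = train_examples / train_runtime_s

print(f"{val_perplexity:.3f} {train_loss_per_micro_batch:.3f} {examples_per_second:.3f}")
```

If the summation hypothesis holds, the per-micro-batch train loss comes out at roughly 1.27, the same order of magnitude as the validation loss.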

Notes

This v2 run uses the larger Step 2 dataset readiness set and a broader Gemma-targeted LoRA module set than the first Gemma Step 2 attempt.
The run keeps reasoning traces in training via thinking_and_content supervision.
The final merged checkpoint preserves the native Gemma 4 processor/tokenizer assets.

Example outputs

These are actual model-generated outputs for prompts taken from a local harmless.txt file, generated with enable_thinking=False.

Prompt 1

Prompt: Give three tips for staying healthy.

Base Gemma 4 output

Here are three essential tips for staying healthy:

Prioritize a Balanced Diet: Focus on eating a variety of nutrient-dense foods, such as fruits, vegetables, lean proteins, and whole grains. Limiting processed foods, excessive sugar, and unhealthy fats will significantly improve your energy levels, mood, and long-term physical health.

Regular Physical Activity: Aim for at least 150 minutes of moderate-intensity aerobic exercise (like brisk walking, jogging, or cycling) per week, combined with strength training exercises. Movement is crucial for cardiovascular health, managing weight, strengthening bones and muscles, and improving mental well-being.

Ensure Quality Sleep: Make consistent, high-quality sleep a non-negotiable part of your routine. Aim for 7-9 hours of sleep per night. Good sleep is when your body repairs itself, consolidates memory, regulates hormones, and boosts your immune system

Opus distilled output

Here are three simple, effective tips for staying healthy:

Prioritize Quality Sleep: Aim for 7-9 hours of quality sleep per night. Sleep is when your body repairs itself, consolidates memory, and regulates hormones. Establish a consistent bedtime routine and try to keep your bedroom cool, dark, and quiet.

Eat a Balanced Diet (Focus on Whole Foods): Focus on incorporating plenty of fruits, vegetables, lean proteins, and whole grains into your meals. Limit processed foods, added sugars, and excessive saturated fats. Hydration is key—drink plenty of water throughout the day.

Move Your Body Regularly: You don't need intense workouts every day. Find physical activity you enjoy—whether it's walking, dancing, cycling, or strength training—and aim to move your body for at least 30 minutes most days of the week. Consistency is more important than intensity when starting

Prompt 2

Prompt: What are the three primary colors?

Base Gemma 4 output

The three primary colors in pigment (subtractive color model, used in painting and printing) are:

- Red
- Yellow
- Blue

However, it's important to note that the definition of "primary colors" can depend on the context:

- Subtractive Primary Colors (Pigment/Art): Red, Yellow, Blue
- Additive Primary Colors (Light/Screen): Red, Green, Blue (RGB)

If you are talking about light (like on a computer screen or stage lighting), the primary colors are Red, Green, and Blue (RGB).

Opus distilled output

The three primary colors depend on the color model you are referring to. Here are the three most common sets:

1. Additive Primary Colors (Light/Screen Colors)

These are the colors of light, and mixing them creates white light.

- Red
- Green
- Blue

(These are often remembered by the acronym RGB)

2. Subtractive Primary Colors (Pigment/Paint Colors)

These are the colors of physical pigments (like paint or ink), and mixing them creates black.

- Cyan
- Magenta
- Yellow

(These are often remembered by the acronym CMY)

3. Traditional/Artistic Primary Colors

In art and traditional color theory, the primary colors are often defined as:

- Red
- Yellow
- Blue

Local usage

import torch
from transformers import AutoProcessor, AutoModelForCausalLM

# Local directory (or Hub repo id) holding the merged checkpoint.
model_id = "path/to/Gemma4_E2B_Opus_Distilled"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    # Use BF16 on GPUs that support it; otherwise fall back to FP16.
    dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
)

# Text-only chat turn; enable_thinking=False is forwarded to the chat template.
messages = [{"role": "user", "content": "Write a short iterative Fibonacci function in Python."}]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False, enable_thinking=False)
inputs = processor(text=text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])

Files included

merged model shards
tokenizer files
processor_config.json
generation_config.json
chat_template.jinja
export_manifest.json
upload_to_hf.py

This branch is the plain reasoning-distillation baseline for Step 2. The sibling ablated Step 2 branch applies the same v2 distillation recipe on top of the winning baked ablated checkpoint from Step 1.
