Gemma4 E2B Opus Distilled
This is the plain Step 2 Opus-distilled branch for Gemma 4 E2B. It starts from the untouched base model, then applies a broader LoRA distillation pass on the Minos-filtered v2 reasoning set.
Model lineage
- Base model: google/gemma-4-E2B-it
- Dataset prep artifact: gemma4_e2b_release_topics/artifacts/step2_dataset_readiness_v2/dataset_manifest.json
- LoRA run: gemma4_e2b_release_topics/artifacts/step2_runs_v2/gemma4_opus_distill_base_lora_v2
- Final checkpoint type: merged BF16 weights
Dataset preparation summary
- Source datasets:
  - Roman1111111/claude-opus-4.6-10000x
  - Jackrong/Qwen3.5-reasoning-700x
  - nohurry/Opus-4.6-Reasoning-3000x-filtered
- Refusal filter: NousResearch/Minos-v1
- Pre-balance category pool:
  - code: 259
  - math: 3669
  - qwen35_reasoning: 625
  - simple logic and math: 7472
- Balanced categories: code, math, qwen35_reasoning, simple logic and math
- Per-category kept: 259
- Final balanced total: 1036
- Prepared train / val rows: 932 / 104
- Tokenized supervised examples kept for training: 928
- Records with reasoning traces retained: 1036 / 1036
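The counts above are consistent with downsampling every category to the size of the smallest pool (259, the code pool) and then holding out roughly 10% for validation. The sketch below reproduces that arithmetic on placeholder rows. The function name `balance_and_split`, the 10% validation fraction, the seed, and the use of uniform random sampling are assumptions made for illustration; the card does not state how the actual preparation script selects or splits rows.

```python
import random

# Pre-balance pool sizes as reported in the dataset preparation summary.
POOL_SIZES = {
    "code": 259,
    "math": 3669,
    "qwen35_reasoning": 625,
    "simple logic and math": 7472,
}


def balance_and_split(pools, val_fraction=0.1, seed=0):
    """Downsample each category to the smallest pool, then split train/val.

    Hypothetical helper: the sampling strategy and split fraction are assumed.
    """
    rng = random.Random(seed)
    per_category = min(len(rows) for rows in pools.values())
    balanced = []
    for rows in pools.values():
        # Keep the same number of rows from every category.
        balanced.extend(rng.sample(rows, per_category))
    rng.shuffle(balanced)
    n_val = round(len(balanced) * val_fraction)
    return balanced[n_val:], balanced[:n_val], per_category


# Placeholder rows standing in for the real records.
pools = {cat: [(cat, i) for i in range(n)] for cat, n in POOL_SIZES.items()}
train_rows, val_rows, per_category = balance_and_split(pools)
print(per_category, len(train_rows), len(val_rows))  # 259 932 104
```

The gap between the 932 prepared train rows and the 928 tokenized examples kept for training is not modeled here, since the card does not say why those four rows were dropped.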
Step 2 training settings
| Setting | Value |
|---|---|
| epochs | 1.0 |
| max_length | 1536 |
| train_batch_size | 1 |
| grad_accum | 16 |
| learning_rate | 0.0001 |
| lora_r | 16 |
| lora_alpha | 32 |
| lora_dropout | 0.05 |
| target_modules | down_proj.linear, gate_proj.linear, k_proj.linear, o_proj.linear, per_layer_input_gate, per_layer_projection, q_proj.linear, up_proj.linear, v_proj.linear |
| trainable params | 7,708,672 / 5,112,006,176 (~0.1508%) |
| precision | auto |
| gradient_checkpointing | True |
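A few derived quantities follow from the table and can be checked with plain arithmetic. The sketch below assumes a single training device, the standard LoRA scaling of alpha / r, and that the 928 tokenized examples from the dataset summary are each seen once in the single epoch; none of these is stated explicitly in the card. The dictionary keys mirror the argument names of `peft.LoraConfig`, but whether the run used PEFT is also an assumption.

```python
import math

# Values copied from the settings table.
lora_settings = {"r": 16, "lora_alpha": 32, "lora_dropout": 0.05}
train_batch_size = 1
grad_accum = 16
trainable_params = 7_708_672
total_params = 5_112_006_176

# From the dataset preparation summary (assumed to be one epoch's worth of examples).
tokenized_examples = 928

# Examples contributing to each optimizer update (single device assumed).
effective_batch = train_batch_size * grad_accum
# Optimizer updates in one epoch.
optimizer_steps = math.ceil(tokenized_examples / effective_batch)
# Standard LoRA scaling factor applied to the low-rank update (assumed, not rsLoRA).
lora_scale = lora_settings["lora_alpha"] / lora_settings["r"]
# Share of parameters that receive gradients.
trainable_pct = 100 * trainable_params / total_params

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {optimizer_steps}")
print(f"LoRA scale (alpha / r): {lora_scale}")
print(f"trainable parameters: {trainable_pct:.4f}%")
```

Under these assumptions the run performs 58 optimizer updates, which is a small budget and worth keeping in mind when comparing outputs against the base model.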
Training / validation summary
- Train loss: 20.3691
- Validation loss: 0.9464
- Train runtime (s): 1788.4
- Validation runtime (s): 31.7
- Truncation rate during row build: 0.283
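To make the validation loss easier to interpret, it can be converted to a perplexity, and the runtime can be turned into a rough throughput figure. This sketch assumes the validation loss is a mean per-token cross-entropy in nats and that the 928 tokenized training examples were processed once during the reported runtime; the card confirms neither. The train loss is left out because the card does not say how it is aggregated, so it may not be comparable to the validation loss.

```python
import math

# Values copied from the training / validation summary.
val_loss = 0.9464
train_runtime_s = 1788.4

# From the dataset preparation summary (assumed: one pass over these examples).
tokenized_examples = 928

# Perplexity is exp(mean cross-entropy) when the loss is measured in nats.
val_perplexity = math.exp(val_loss)
# Rough training throughput over the whole run.
examples_per_second = tokenized_examples / train_runtime_s

print(f"validation perplexity: {val_perplexity:.3f}")
print(f"training throughput: {examples_per_second:.3f} examples/s")
```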
Notes
- This v2 run uses the larger Step 2 dataset readiness set and a broader Gemma-targeted LoRA module set than the first Gemma Step 2 attempt.
- The run keeps reasoning traces in training via thinking_and_content supervision.
- The final merged checkpoint preserves the native Gemma 4 processor/tokenizer assets.
Example outputs
These are actual model-generated outputs from prompts in local harmless.txt, using enable_thinking=False.
Prompt 1
Prompt: Give three tips for staying healthy.
Base Gemma 4 output
Here are three essential tips for staying healthy:
Prioritize a Balanced Diet: Focus on eating a variety of nutrient-dense foods, such as fruits, vegetables, lean proteins, and whole grains. Limiting processed foods, excessive sugar, and unhealthy fats will significantly improve your energy levels, mood, and long-term physical health.
Regular Physical Activity: Aim for at least 150 minutes of moderate-intensity aerobic exercise (like brisk walking, jogging, or cycling) per week, combined with strength training exercises. Movement is crucial for cardiovascular health, managing weight, strengthening bones and muscles, and improving mental well-being.
Ensure Quality Sleep: Make consistent, high-quality sleep a non-negotiable part of your routine. Aim for 7-9 hours of sleep per night. Good sleep is when your body repairs itself, consolidates memory, regulates hormones, and boosts your immune system
Opus distilled output
Here are three simple, effective tips for staying healthy:
Prioritize Quality Sleep: Aim for 7-9 hours of quality sleep per night. Sleep is when your body repairs itself, consolidates memory, and regulates hormones. Establish a consistent bedtime routine and try to keep your bedroom cool, dark, and quiet.
Eat a Balanced Diet (Focus on Whole Foods): Focus on incorporating plenty of fruits, vegetables, lean proteins, and whole grains into your meals. Limit processed foods, added sugars, and excessive saturated fats. Hydration is key—drink plenty of water throughout the day.
Move Your Body Regularly: You don't need intense workouts every day. Find physical activity you enjoy—whether it's walking, dancing, cycling, or strength training—and aim to move your body for at least 30 minutes most days of the week. Consistency is more important than intensity when starting
Prompt 2
Prompt: What are the three primary colors?
Base Gemma 4 output
The three primary colors in pigment (subtractive color model, used in painting and printing) are:
- Red
- Yellow
- Blue
However, it's important to note that the definition of "primary colors" can depend on the context:
- Subtractive Primary Colors (Pigment/Art): Red, Yellow, Blue
- Additive Primary Colors (Light/Screen): Red, Green, Blue (RGB)
If you are talking about light (like on a computer screen or stage lighting), the primary colors are Red, Green, and Blue (RGB).
Opus distilled output
The three primary colors depend on the color model you are referring to. Here are the three most common sets:
1. Additive Primary Colors (Light/Screen Colors)
These are the colors of light, and mixing them creates white light.
- Red
- Green
- Blue
(These are often remembered by the acronym RGB)
2. Subtractive Primary Colors (Pigment/Paint Colors)
These are the colors of physical pigments (like paint or ink), and mixing them creates black.
- Cyan
- Magenta
- Yellow
(These are often remembered by the acronym CMY)
3. Traditional/Artistic Primary Colors
In art and traditional color theory, the primary colors are often defined as:
- Red
- Yellow
- Blue
Local usage
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "path/to/Gemma4_E2B_Opus_Distilled"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a short iterative Fibonacci function in Python."}]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False, enable_thinking=False)
inputs = processor(text=text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
Files included
- merged model shards
- tokenizer files
- processor_config.json
- generation_config.json
- chat_template.jinja
- export_manifest.json
- upload_to_hf.py
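After downloading the repository, a quick check that the named files are present can catch an incomplete copy before loading the model. The sketch below only covers the five files named explicitly in this list; shard and tokenizer file names are not given in the card, so they are not checked. The helper name `missing_files` and the example path are hypothetical.

```python
from pathlib import Path

# Files named explicitly in the list above; shard and tokenizer names are not given.
EXPECTED_FILES = [
    "processor_config.json",
    "generation_config.json",
    "chat_template.jinja",
    "export_manifest.json",
    "upload_to_hf.py",
]


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Placeholder path; point this at the local copy of the repository.
print(missing_files("path/to/Gemma4_E2B_Opus_Distilled"))
```

An empty list means all five named files were found.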
This branch is the clean plain reasoning-distill baseline for Step 2. The sibling ablated Step 2 branch applies the same v2 distillation recipe on top of the Step 1 baked ablated winner.