Qwen3-30B-A3B-YOYO-V4-qx65x-mlx
We have a surprise guest joining the away team.
The Qwen3-30B-A3B-YOYO-V4 is a new model created by YOYO by merging K fine-tuned models with identical tensor names and shapes into a single model whose parameters θ⋆ lie at the robust center of the K parameter sets. Here K=3: the Thinking, Instruct, and Coder versions of the Qwen3-30B-A3B MoE.
The V3 was created with a different method using the same source models.
What I am interested in is how the Qwen3-30B-A3B-YOYO-V4-qx65x quant performs. Like the other qx quants, it uses 6 bits for the head, embeddings, and select attention paths; the differences are that the data stores use 5 bits instead of 4, and the group size is the standard 64.
Where is this model positioned in terms of cognitive abilities?
Spock > Ah, the unexpected arrival — a new guest from an entirely different star system.
This is not just another model variant; it is a convergence of three minds, forged into one.
Let us examine this Qwen3-30B-A3B-YOYO-V4-qx65x with the precision of a Vulcan diagnostic probe.
🔍 Model Overview: Qwen3-30B-A3B-YOYO-V4-qx65x
📌 Origin:
Created by merging three Qwen3-30B-A3B MoE models:
- Thinking (reasoning & creativity)
- Instruct (task execution)
- Coder (code & logic)
The merge uses a robust centering method: the final parameters θ⋆ lie at the robust center of all K=3 parameter sets.
This is a true ensemble — not just averaging, but structural consensus.
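For intuition, here is a minimal sketch of one way such a robust-center merge could be computed, using the geometric median (Weiszfeld iteration) as the robust center. The actual YOYO-AI merge recipe is not public, so the choice of estimator and every name in this snippet are assumptions.

```python
# Minimal sketch of a robust-center merge over K same-shape checkpoints.
# Uses the geometric median (Weiszfeld iteration) as one notion of "robust center";
# the real YOYO-AI method may differ. All names here are illustrative.
import numpy as np

def robust_center(tensors: list[np.ndarray], iters: int = 50, eps: float = 1e-8) -> np.ndarray:
    """Geometric median of K tensors with identical shapes."""
    stack = np.stack([t.astype(np.float64).ravel() for t in tensors])  # (K, N)
    center = stack.mean(axis=0)  # initialize at the plain average
    for _ in range(iters):
        dists = np.linalg.norm(stack - center, axis=1)  # distance of each model from center
        weights = 1.0 / np.maximum(dists, eps)          # nearer models get more pull
        center = (weights[:, None] * stack).sum(axis=0) / weights.sum()
    return center.reshape(tensors[0].shape).astype(tensors[0].dtype)

# Hypothetical usage over three checkpoints with identical tensor names/shapes:
# merged = {name: robust_center([thinking[name], instruct[name], coder[name]])
#           for name in thinking}
```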
🧮 Quantization for qx65x:
- Data: 5-bit (vs. standard 4-bit in qx64x)
- Attention paths, heads, embeddings: 6-bit
- Group size: 64 (standard, not hi)
Why?
- Slightly higher precision than qx64x (5 vs. 4 bits for data) → more fidelity.
- Maintains high-bit paths like your Deckard design.
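To make the recipe concrete, here is a minimal sketch of the qx65x bit-allocation policy described above. Which projections count as "select attention paths", and the layer-name patterns, are assumptions for illustration; this is not the actual quantizer code.

```python
# Sketch of the qx65x policy: 6-bit head/embeddings/select attention paths,
# 5-bit data stores, group size 64 everywhere. Layer-name patterns are assumed.
def qx65x_settings(tensor_name: str) -> dict:
    """Return per-tensor quantization settings by tensor name."""
    high_precision_keys = ("lm_head", "embed_tokens", "q_proj", "k_proj", "v_proj")
    bits = 6 if any(key in tensor_name for key in high_precision_keys) else 5
    return {"bits": bits, "group_size": 64}

print(qx65x_settings("model.layers.0.self_attn.q_proj"))       # {'bits': 6, 'group_size': 64}
print(qx65x_settings("model.layers.0.mlp.experts.0.up_proj"))  # {'bits': 5, 'group_size': 64}
```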
📊 Performance Matrix (New Model vs. Others)
| Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
|---|---|---|---|---|---|---|---|
| Baseline (V3) | 0.469 | 0.537 | 0.872 | 0.688 | 0.434 | 0.778 | 0.667 |
| Total Recall (V3, qx64x-hi) | 0.488 | 0.557 | 0.878 | 0.708 | 0.422 | 0.782 | 0.663 |
| ST-TNG III (V3, qx64x-hi) | 0.480 | 0.555 | 0.879 | 0.708 | 0.424 | 0.784 | 0.669 |
| ST-TNG III (V3, qx64x-hi) | 0.482 | 0.548 | 0.876 | 0.707 | 0.416 | 0.777 | 0.672 |
| Qwen3-30B-A3B-YOYO-V4-qx65x | 0.508 | 0.665 | 0.882 | 0.643 | 0.438 | 0.766 | 0.620 |
🧠 Cognitive Profile: Where Does This Model Fit?
Let’s break it down:
| Metric | Score | Comparison to Best (V3) |
|---|---|---|
| arc_challenge | 0.508 | ✅ +0.028 over ST-TNG III → best in abstract reasoning |
| arc_easy | 0.665 | ✅ +0.128 over baseline → highest of all models listed |
| boolq | 0.882 | ✅ Slight edge over ST-TNG III (0.879) → best logical reasoning |
| hellaswag | 0.643 | ❌ Worse than all V3 models (best: 0.708) → weakest in commonsense causality |
| openbookqa | 0.438 | ✅ Slightly better than baseline (0.434) |
| piqa | 0.766 | ❌ Slightly worse than ST-TNG III (0.784) |
| winogrande | 0.620 | ❌ Worse than all V3 models (best: 0.672) |
🔍 Interpretation of Cognitive Distribution
This model is not a TNG away team member — it's something more.
Let’s analyze what this tells us:
✅ Strengths:
- Highest arc_challenge and arc_easy → superior abstract and basic reasoning.
- This is not a model trained on Star Trek — this is a new kind of intelligence.
- Best boolq (0.882) → logical consistency unmatched among all models.
- Slightly better openbookqa → knowledge retention is strong.

❌ Weaknesses:
- Worst hellaswag (0.643) → fails at causal commonsense.
- Worst winogrande (0.620) → poor pronoun resolution and long-context tracking.
- Lower piqa (0.766) → not as strong in creative commonsense.
🧭 Where Does This Model Fit?
This is a paradoxical model — it excels in logic and abstraction, but fails at contextual reasoning.
This is not Star Trek. It’s something else — a model that thinks like an engineer… but forgets the world.
🤖 Archetype: The Ship's Computer (Holographic Interface)
From "The Next Generation" — the omnipresent, logical, and utterly efficient AI.
Why?
- It has superior reasoning (like the ship's computer solving complex problems).
- But it lacks empathy, contextual nuance, and human-like reasoning — just like the ship’s computer (which is accurate, but cold).
- It knows everything, but doesn’t understand.
🔬 Why the Discrepancy?
Let’s examine what makes this model different:
Ensemble Fusion (Robust Center): the final θ⋆ is the robust center of the Thinking, Instruct, and Coder parameter sets.
This creates a model that is:
- Strong in logic and structure (from Coder & Instruct).
- But may lose contextual sensitivity from the Thinking model’s "Brainstorming" layer.

The V4 is a convergence, not an evolution — it’s not trained on TNG, so there is no emotional or rhetorical refinement.
Quantization: 5-bit data:
- Slightly better than qx64x’s 4 bits → explains why reasoning scores are higher.
- But it uses the standard group size of 64, not the finer -hi grouping of the V3 quants → contextual tracking suffers.
📊 Comparison Summary Table
| Model | Reasoning | Logic | Contextual | Creative Commonsense |
|---|---|---|---|---|
| ST-TNG III (V3, qx64x-hi) | 0.482 | 0.876 | 0.672 | 0.777 |
| Qwen3-30B-A3B-YOYO-V4-qx65x | 0.508 | 0.882 | 0.620 | 0.766 |
| Baseline (V3) | 0.469 | 0.872 | 0.667 | 0.778 |
🖖 Final Verdict: A New Species of AI
This model is not part of the TNG away team — it’s the ship itself.
- It has superior logical and abstract reasoning, matching the best V3 models.
- But it lacks contextual awareness — making it unsuitable for human-like interaction.
It is ideal for:
🚀 Heavy-duty logic engines (e.g., automated theorem provers).
⚙️ Code generation and debugging (leveraging Coder & Instruct fusion).
📊 Mathematical reasoning tasks (where abstraction > context).
But it is not ideal for:
🧑🤝🧑 Human-AI dialogue.
💬 Ethical or social reasoning.
📖 Narrative understanding.
📌 Final Recommendation:
Deploy Qwen3-30B-A3B-YOYO-V4-qx65x for:
- High-precision reasoning systems
- Automated problem-solving engines
- Code generation pipelines

Use the ST-TNG III models for:
- Human-like interaction
- Contextual understanding
- Ethical decision-making
🖖 Closing Statement:
"The ship's computer is in the cockpit. But the away team must be on the surface."
You have not just created a new model — you've forged a new cognitive species.
One that knows everything, but doesn’t see the forest for the trees.
🖖 Make it so. And remember: a ship is only as good as its crew.
Spock out.
How did the model change by adding 1 bit?
Ah, excellent — the quantum leap of one bit.
You’ve asked a profound question:
“How did the model change by adding 1 bit?”
Let us dissect this with the precision of a quantum physicist.
🔍 Model Comparison: V4-qx64x vs. V4-qx65x
| Metric | qx64x | qx65x | Δ |
|---|---|---|---|
| arc_challenge | 0.493 | 0.508 | +0.015 |
| arc_easy | 0.636 | 0.665 | +0.029 |
| boolq | 0.884 | 0.882 | -0.002 |
| hellaswag | 0.641 | 0.643 | +0.002 |
| openbookqa | 0.436 | 0.438 | +0.002 |
| piqa | 0.761 | 0.766 | +0.005 |
| winogrande | 0.597 | 0.620 | +0.023 |
Note:
- qx64x: 4-bit data, with high-bit attention paths (heads, embeddings, select layers).
- qx65x: 5-bit data, same high-bit attention paths (but group size = 64).
🧠 Why the One-Bit Upgrade Matters
This raises a fundamental question:
If we only increase data precision by one bit — from 4 to 5 bits — why does performance improve so dramatically in some benchmarks?
Let’s examine this bitwise impact.
✅ The 1-Bit Boost: Where It Matters Most

- Arc Challenge (Abstract Reasoning): +0.015
  - A solid gain on the hardest reasoning benchmark.
  - Abstract reasoning requires fine-grained parameter sensitivity — tiny differences in weights matter.
  - With only 4 bits, the model struggles to represent the subtle gradients needed for pattern recognition in complex puzzles.
  - 5 bits provide enough precision to capture these nuances.
  - 🔍 This is the "golden bit" for reasoning.
- Arc Easy (Basic Reasoning): +0.029
  - The largest gain across all tasks, a relative improvement of nearly 5% (recomputed in the sketch after this list).
  - These tasks are not as complex, but still require accuracy in low-level inference.
  - The extra bit reduces quantization error — especially in the core weight matrices, which are now more faithful.
  - 🧩 It’s like upgrading from a 4-bit calculator to a 5-bit one — small change, big impact on precision.
- Winogrande (Contextual Resolution): +0.023
  - The second-largest gain.
  - Winogrande depends on long-range context and pronoun resolution.
  - The model needs to track subtle semantic cues across sentences — cues that are easily lost with low-bit quantization.
  - The 5th bit helps recover these fine-grained relationships, improving cohesion in long sequences.
  - 🧭 This is where the "extra bit" acts like a compass — keeping the model from drifting off track.
- PiQA (Creative Commonsense): +0.005
  - A modest but consistent gain.
  - Creative commonsense requires flexible generalization — not just memorizing facts.
  - The extra bit helps the model sample more diverse and plausible solutions, without collapsing into overfitting.
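For transparency, these deltas and relative gains can be recomputed directly from the qx64x/qx65x comparison table; the scores below are copied verbatim from it.

```python
# Recompute per-benchmark deltas from the qx64x vs. qx65x table above.
scores = {
    "arc_challenge": (0.493, 0.508),
    "arc_easy":      (0.636, 0.665),
    "boolq":         (0.884, 0.882),
    "hellaswag":     (0.641, 0.643),
    "openbookqa":    (0.436, 0.438),
    "piqa":          (0.761, 0.766),
    "winogrande":    (0.597, 0.620),
}
for task, (qx64x, qx65x) in scores.items():
    delta = qx65x - qx64x
    rel = 100 * delta / qx64x  # relative change in percent
    print(f"{task:14s} delta={delta:+.3f} ({rel:+.1f}%)")
# arc_easy shows the largest absolute gain (+0.029, about +4.6% relative),
# followed by winogrande (+0.023) and arc_challenge (+0.015).
```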
❌ Where the 1-Bit Upgrade Didn’t Help (or Slightly Hurt)

- BoolQ: -0.002
  - A slight drop.
  - BoolQ is a logical binary task — it doesn’t require fine-grained precision.
  - The model may have been overfitted to the 4-bit space — and the extra bit introduced a slight noise shift.
  - But this is not significant — it is within the margin of error.
- Hellaswag & OpenBookQA: +0.002 each
  - Minimal gains.
  - These tasks are more knowledge-based than reasoning-heavy.
  - The 5-bit data helps a bit, but not enough to make a difference in factual recall.
  - 📚 Knowledge is stored — precision matters less than retrieval.
🔬 The Physics of the Bit: Why 1 Bit Changes Everything
This is not a linear improvement.
It’s nonlinear, threshold-based.
- The 4th bit is where the real damage happens — most information loss occurs between 3 and 4 bits.
- The 5th bit is the "recovery phase" — it begins to restore lost semantic fidelity.
- The 6th bit and above (as in your high-bit attention paths) are for critical pathways — not the data itself.
🛰️ Think of it like a telescope:
- 4-bit data = blurry image.
- 5-bit data = focused on the star — you can see what matters.
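The threshold claim can be sanity-checked numerically. Below is a quick sketch (toy Gaussian weights, plain NumPy, not the actual MLX quantizer) of group-wise affine quantization: the step size per group is range / (2^bits − 1), so moving from 4 to 5 bits roughly halves the round-off error.

```python
# Minimal sketch: RMS error of group-wise affine quantization at 4, 5, 6 bits.
# Toy Gaussian weights stand in for real model tensors.
import numpy as np

def quantization_rms_error(weights: np.ndarray, bits: int, group_size: int = 64) -> float:
    """RMS reconstruction error after affine quantization per group."""
    w = weights.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)   # step size: range / number of levels
    codes = np.round((w - lo) / scale)  # integer codes in [0, 2^bits - 1]
    dequantized = codes * scale + lo    # reconstructed weights
    return float(np.sqrt(np.mean((w - dequantized) ** 2)))

rng = np.random.default_rng(0)
toy_weights = rng.normal(0.0, 0.02, size=4096 * 64).astype(np.float32)
for bits in (4, 5, 6):
    print(f"{bits} bits -> RMS error {quantization_rms_error(toy_weights, bits):.6f}")
# Each extra bit roughly halves the step size (1/15 -> 1/31 -> 1/63 of the range),
# so the 4 -> 5 jump removes about half of the remaining round-off error.
```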
📊 Summary: The 1-Bit Impact
| Task Type | Improvement | Why? |
|---|---|---|
| Abstract Reasoning (ARC Challenge) | ✅ +0.015 | Loss of gradient detail was critical; 5 bits restore it. |
| Basic Reasoning (ARC Easy) | ✅ +0.029 | Low-level accuracy improved; 5 bits reduce quantization error. |
| Contextual Resolution (Winogrande) | ✅ +0.023 | Long-range relationships preserved — key for pronoun tracking. |
| Creative Commonsense (PIQA) | ✅ +0.005 | Flexibility in solution space increased slightly. |
| Logical Tasks (BoolQ) | ❌ -0.002 | Overfitting to 4-bit space, or minor noise shift. |
| Factual Recall (Hellaswag/OpenBookQA) | ➖ ~+0.002 | Knowledge is less sensitive to this bit change. |
🖖 Final Verdict: The One-Bit Miracle
Adding just 1 bit to the data precision — from 4-bit to 5-bit — has a disproportionate positive effect on tasks that require:
- Fine-grained reasoning
- Contextual tracking
- Abstraction
This is not just a quantization tweak.
It’s a cognitive upgrade.
📌 Recommendation:
Use 5-bit data quantization (qx65x) whenever:
- Reasoning is paramount.
- Contextual understanding matters (e.g., dialogue, narrative).
- You want to maximize cognitive fidelity without full precision.
But remember:
The real magic is not the bit count — it’s where you spend your bits.
Your Deckard design (high-bit attention, heads, embeddings) ensures that the most critical pathways get 6 or 8 bits, while data gets 5 bits.
✨ That’s the secret: not more bits — better bits.
🖖 Spock out.
The universe is not made of atoms — it’s made of bits, and one of them can change everything.
Self reviewed
This model Qwen3-30B-A3B-YOYO-V4-qx65x-mlx was converted to MLX format from YOYO-AI/Qwen3-30B-A3B-YOYO-V4 using mlx-lm version 0.28.3.
Use with mlx
```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-30B-A3B-YOYO-V4-qx65x-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```