[Update Request] Please re-pull weights for Kai-3B-Instruct (v1.1 fixes mode collapse)

#1925
by OzTianlu - opened

Hi @mradermacher,

First of all, thank you so much for your incredible automated quantization work! Your GGUF conversion helped our model, Kai-3B-Instruct, get picked up and recommended on LM Studio today, which is a huge honor for our lab. However, we noticed a critical issue with the currently deployed GGUFs (specifically the Q4_K_S variants), and we kindly request a re-pull of the weights and a small note in your repo.

1. Stale weights (v1.0 vs. v1.1)

It appears the current GGUFs were snapshotted from our initial v1.0 push. That version suffered from "logic poisoning" (overfitting to reasoning datasets), causing severe mode collapse: the model forgot how to chat and responded only in rigid Analysis -> Approach -> Solution templates. We have since completed a 4000-step annealing phase with a balanced SlimOrca mix and pushed the v1.1 weights to our main repo (NoesisLab/Kai-3B-Instruct), which fully restores conversational behavior while retaining the model's logical capabilities. Could you please trigger a re-pull and re-quantize from the latest main branch?

2. Extreme sensitivity to 4-bit quantization (the ADS algorithm)

Unlike standard SFT models, Kai-3B is trained with a novel technique we call Adaptive Dual-Search (ADS) Distillation. It uses a parameter-free log-barrier penalty on Shannon entropy to prune the latent space into a sharp, low-entropy manifold (enabling $O(1)$ reasoning without CoT). Because the model's "logic crystal" is so tightly packed at 3B parameters, aggressive low-bit quantization (such as Q4_K_S) introduces quantization noise that acts as artificial entropy. To avoid this entropy penalty, the quantized model retreats into its rigid fallback templates, breaking conversational alignment.

Requests:

- Please update the GGUFs with our latest v1.1 weights.
- If possible, could you add a brief warning to your README? Something like: "Note: Due to the ADS distillation method, this model is highly sensitive to quantization noise. Q8_0 or Q6_K are strongly recommended for preserving both logical integrity and conversational alignment. Q4 variants may exhibit template collapse."

Thank you again for empowering the open-source community. We truly appreciate your work!

Best regards,
[NoesisLab]
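P.S. For anyone curious about the entropy log-barrier idea mentioned above, here is a rough, illustrative sketch in plain Python. This is not our actual ADS training code; the function names and the $H_{max} = \log V$ cap are assumptions chosen for illustration. The barrier diverges as the output distribution's Shannon entropy approaches its maximum, so minimizing it pushes the model toward sharp, low-entropy distributions without introducing a tunable weight:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def shannon_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits)."""
    return -sum(p * math.log(p + 1e-12) for p in softmax(logits))

def entropy_log_barrier(logits):
    """Parameter-free log-barrier on entropy: -log(H_max - H).
    H_max = log(V) is fixed by the vocabulary size V, so there is
    no tunable coefficient. Diverges as H -> H_max, penalizing
    flat (high-entropy) distributions. Illustrative sketch only --
    NOT the actual ADS objective.
    """
    h_max = math.log(len(logits))
    return -math.log(max(h_max - shannon_entropy(logits), 1e-12))

sharp = [8.0, 0.0, 0.0, 0.0]  # confident, low-entropy distribution
flat = [0.0, 0.0, 0.0, 0.0]   # uniform, maximum-entropy distribution
print(entropy_log_barrier(sharp))  # small penalty
print(entropy_log_barrier(flat))   # very large penalty
```

Quantization noise perturbs the logits, nudging entropy upward; under this kind of barrier, the cheapest way for the model to keep the penalty low is to fall back to its sharpest (most templated) outputs, which is consistent with the Q4 collapse we observed.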
