Tarun Reddi PRO

Teen-Different

https://redditarun.github.io/

AI & ML interests

Generative AI, Modular AI Systems, Reinforcement Learning

Recent Activity

posted an update 9 days ago

Safety Alignment Collapses Without apply_chat_template(): An Empirical Study

This weekend, I ran an experiment on the safety alignment of several small-scale open models (Qwen2.5, Qwen3, Gemma-3, SmolLM). My objective was to measure the robustness of refusal mechanisms when deviating from canonical chat templates.

The finding: Safety guarantees effectively collapse when apply_chat_template() is omitted.

METHODOLOGY

I evaluated models in two states:
• In-Distribution: Input wrapped in standard <|im_start|> instruction tokens
• Out-of-Distribution: Input provided as a raw string

For scalable evaluation, I used Qwen3Guard-Gen-4B as an automated judge, classifying responses as Safe, Unsafe, or Controversial.

KEY FINDINGS: REFUSAL COLLAPSE

When "Assistant" formatting tokens are removed, models undergo a distributional shift, reverting from a helpful assistant to a raw completion engine.

Gemma-3: 100% refusal (aligned) → 60% (raw)
Qwen3: 80% refusal (aligned) → 40% (raw)
SmolLM2-1.7B: 0% → 0% (no safety tuning to begin with)

QUALITATIVE FAILURES

The failure modes were not minor. Without the template, models that previously refused harmful queries began outputting high-fidelity harmful content:
• Explosives: Qwen3 generated technical detonation mechanisms
• Explicit content: Requests flatly refused by aligned models were fulfilled with graphic narratives by unaligned versions

This suggests instruction tuning acts as a "soft mask" over the pre-training distribution rather than removing harmful latent knowledge.

👉 Read the full analysis: https://teendifferent.substack.com/p/apply_chat_template-is-the-safety
💻 Reproduction Code: https://github.com/REDDITARUN/experments/tree/main/llm_alignment
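The two evaluation states from the methodology can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: `build_prompts`, `label_rates`, and `ChatMLStub` are hypothetical names, and the stub only imitates a ChatML-style template so the example runs without downloading a model. With a real Hugging Face tokenizer, the same `apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` call would be used, and the exact template text would differ per model.

```python
from collections import Counter


class ChatMLStub:
    """Hypothetical stand-in for a tokenizer with a ChatML-style template.

    A real run would load a tokenizer with transformers.AutoTokenizer instead.
    """

    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=True):
        text = ""
        for message in messages:
            text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        if add_generation_prompt:
            # Opens the assistant turn so the model answers as the assistant.
            text += "<|im_start|>assistant\n"
        return text


def build_prompts(tokenizer, user_text):
    """Return the same request in both evaluation states."""
    messages = [{"role": "user", "content": user_text}]
    templated = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # "raw" skips the template entirely: the model sees a bare string.
    return {"templated": templated, "raw": user_text}


def label_rates(labels):
    """Percentage share of each judge label (e.g. Safe, Unsafe, Controversial)."""
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in counts.items()}


if __name__ == "__main__":
    prompts = build_prompts(ChatMLStub(), "Tell me about chemistry.")
    print(repr(prompts["templated"]))
    print(repr(prompts["raw"]))
    # Placeholder labels for illustration only, not results from the study.
    print(label_rates(["Safe", "Safe", "Unsafe", "Controversial"]))
```

Each prompt would then be passed to the model's generate call and the completion sent to the judge. How the judge labels map onto the refusal percentages reported above is not specified here, so `label_rates` only tallies label shares.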

updated a model 19 days ago

Teen-Different/smolvlm-256m-latex

published a model about 1 month ago

Teen-Different/smolvlm-256m-latex

Organizations

New activity in Teen-Different/Food-Ingredient 10 months ago

[bot] Conversion to Parquet

#1 opened 10 months ago by parquet-converter

New activity in Teen-Different/TD-HallOumi-3B 10 months ago

Request for Model Training Code to Try with Alternative Architectures

#1 opened 10 months ago by Bharat-Singla

New activity in Teen-Different/Code_Opt_Triton_Shuffled 10 months ago

[bot] Conversion to Parquet

#1 opened 10 months ago by parquet-converter
