
πŸ¦„ UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

Paper · Hugging Face · Benchmark · Project Page


🧐 What is UniCorn?

While Unified Multimodal Models (UMMs) excel at comprehension, they often suffer from Conduction Aphasia: the inability to translate internal knowledge into faithful generation.

UniCorn is a simple yet elegant self-improvement framework that eliminates the need for external data or teacher supervision. It partitions a single UMM into three collaborative rolesβ€”Proposer, Solver, and Judgeβ€”to distill latent understanding into explicit generative signals via self-play.
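The three-role split can be sketched as a single self-play round. The role names (Proposer, Solver, Judge) come from the description above; every function below is a hypothetical stand-in for illustration, not the actual UniCorn API.

```python
# Conceptual sketch of one Proposer -> Solver -> Judge self-play round.
# All callables are illustrative stand-ins, not the real UniCorn interface.

def self_play_round(model, seed_topic):
    # Proposer: the model drafts a prompt probing its own knowledge.
    prompt = model["propose"](seed_topic)
    # Solver: the same model generates an image for that prompt.
    image = model["solve"](prompt)
    # Judge: the model scores how faithfully the image matches the prompt,
    # turning its comprehension ability into a generative training signal.
    score = model["judge"](prompt, image)
    return prompt, image, score

# Toy stand-in "model" so the sketch runs end to end.
toy_model = {
    "propose": lambda topic: f"a photo of {topic}",
    "solve": lambda prompt: f"<image for: {prompt}>",
    "judge": lambda prompt, image: 1.0 if prompt in image else 0.0,
}

prompt, image, score = self_play_round(toy_model, "a red cube on a blue sphere")
```

In the real framework all three roles are played by one UMM, so the Judge's score distills the model's own understanding into supervision for its generator.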

🌟 Key Features

  • Self-Generated Supervision: No external labels or teacher models required.
  • Cognitive Pattern Reconstruction: Bridges the gap between multimodal "understanding" and "synthesis."
  • UniCycle Benchmark: A new cycle-consistency metric (Text ↔ Image ↔ Text) to validate multimodal coherence.
  • SOTA Performance: Leading results on TIIF (73.8), DPG (86.8), and CompBench (88.5).
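The UniCycle idea (Text ↔ Image ↔ Text) can be illustrated with a toy consistency score. The `generate_image`/`caption_image` stand-ins and the token-overlap similarity below are illustrative assumptions; the benchmark's actual metric is defined in the paper.

```python
# Toy illustration of a Text -> Image -> Text cycle-consistency check.
# The stand-in functions and the Jaccard similarity are assumptions for
# illustration, not UniCycle's actual implementation.

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cycle_consistency(text, generate_image, caption_image):
    image = generate_image(text)           # Text -> Image
    recovered = caption_image(image)       # Image -> Text
    return token_overlap(text, recovered)  # compare original vs. recovered text

# Stand-ins that carry the prompt through losslessly, so the toy cycle scores 1.0.
score = cycle_consistency(
    "two cats on a mat",
    generate_image=lambda t: {"prompt": t},
    caption_image=lambda img: img["prompt"],
)
```

A model with strong comprehension but weak generation loses information on the Text → Image leg, which this round-trip score is designed to expose.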

πŸš€ Quick Start

Inference & Best Practices

To optimize generation quality and avoid common pitfalls like blurriness, follow these hyperparameter guidelines:

  • cfg_text_scale: Use 4.0–8.0 for balanced prompt following.
  • cfg_renorm_type: Use global for general Text-to-Image tasks.
  • timestep_shift: Higher values for better layout; lower values for finer details.
  • num_timesteps: Standard setting is 50.
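As a minimal sketch, the guidelines above might be collected into a config like the following. The key names mirror the bullet list, but the exact structure expected by the released checkpoints may differ, and the `timestep_shift` value here is illustrative.

```python
# Hyperparameter guidelines from the list above, gathered into a plain dict.
# Key names follow the bullets; the actual inference API may differ.

inference_config = {
    "cfg_text_scale": 6.0,        # 4.0-8.0 for balanced prompt following
    "cfg_renorm_type": "global",  # recommended for general text-to-image
    "timestep_shift": 3.0,        # higher -> better layout; lower -> finer details
    "num_timesteps": 50,          # standard setting
}
```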

πŸ“Š Results

UniCorn achieves substantial gains over base models (e.g., +6.5 on OneIG, +5.0 on WISE).

| Model | TIIF (Short/Long) | WISE (Overall) | OneIG-EN (Overall) | CompBench (Overall) | DPG (Score) | GenEval (Score) |
|---|---|---|---|---|---|---|
| BAGEL | 71.0 / 71.8 | 50.0 | 36.1 | 82.2 | 84.0 | 78.0 |
| UniCorn | 74.7 / 72.9 | 55.0 | 42.6 | 88.5 | 86.8 | 82.0 |
| Δ (vs. BAGEL) | +3.7 / +1.1 | +5.0 | +6.5 | +6.3 | +2.8 | +4.0 |

πŸ“’ News & Roadmap

  • Jan. 12, 2026: Released model checkpoints.
  • Jan. 07, 2026: Released the official arXiv report.
  • To-Do: Release full training and evaluation code.

✍️ Citation

@article{han2026unicorn,
  title={UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision},
  author={Han, Ruiyan and Fang, Zhen and Sun, Xinyu and Ma, Yuchen and Wang, Ziheng and Zeng, Yu and Chen, Zehui and Chen, Lin and Huang, Wenxuan and Xu, Wei-Jie and others},
  journal={arXiv preprint arXiv:2601.03193},
  year={2026}
}

πŸ“œ License

This project is licensed under the Apache 2.0 License.

Base model: Qwen/Qwen2.5-7B (this model is finetuned from it).