GitHub | Habr | Project Page | Technical Report (soon)

KVAE 1.0: Image tokenizer

The KVAE-2D model compresses images by a factor of 8x8 spatially and encodes them into 16 latent channels.
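To make the compression figures concrete, here is a minimal sketch of the latent shape they imply. The function names are hypothetical and are not part of any KVAE API. The sketch assumes a 3-channel RGB input whose height and width are divisible by 8; how the actual model handles other sizes is not stated here.

```python
def latent_shape(height, width, spatial_factor=8, latent_channels=16):
    """Return (channels, height, width) of the latent for a given image size.

    Assumption: sides must be divisible by the spatial factor; the real
    model may pad or crop instead.
    """
    if height % spatial_factor or width % spatial_factor:
        raise ValueError("height and width must be divisible by the spatial factor")
    return (latent_channels, height // spatial_factor, width // spatial_factor)


def compression_ratio(height, width, image_channels=3,
                      spatial_factor=8, latent_channels=16):
    """Ratio of input values to latent values (element count, not bytes)."""
    c, h, w = latent_shape(height, width, spatial_factor, latent_channels)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(256, 256))       # (16, 32, 32)
print(compression_ratio(256, 256))  # 12.0
```

With 8x8 spatial compression and 16 latent channels, a 3-channel image is reduced by a factor of 3 * 64 / 16 = 12 in element count, independent of resolution.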

Evaluation results

Reconstruction quality of the KVAE-2D model on the ImageNet-256 validation set and the DIV2K validation set (high resolution). All compared models use 8x8 spatial compression with 16 latent channels:

| Dataset | Model | PSNR | SSIM | LPIPS | rFID |
|---|---|---|---|---|---|
| ImageNet (256, val) | Wan-2.1 | 29.03 | 0.85 | 0.069 | 0.62 |
| ImageNet (256, val) | Flux | 31.11 | 0.91 | 0.041 | 0.11 |
| ImageNet (256, val) | KVAE 2D | 31.71 | 0.91 | 0.054 | 0.46 |
| DIV2K | Wan-2.1 | 31.87 | 0.89 | 0.069 | - |
| DIV2K | Flux | 32.64 | 0.91 | 0.061 | - |
| DIV2K | KVAE 2D | 33.67 | 0.92 | 0.060 | - |
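
For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard PSNR definition, in dB, between a reference image and its reconstruction. It is not the evaluation code used to produce the table. The function name `psnr` is hypothetical, and the sketch assumes pixel values are flattened into equal-length sequences scaled to [0, 1]; the value range and averaging used in the actual evaluation are not stated here.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    Assumption: both inputs are equal-length sequences of floats in
    [0, max_value].
    """
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, i.e. about 20 dB.
print(round(psnr([0.0] * 4, [0.1] * 4), 6))  # 20.0
```

Higher PSNR means a smaller mean squared reconstruction error.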
