GitHub | Habr | Project Page | Technical Report (soon)

KVAE 1.0: Image tokenizer

The KVAE-2D model compresses images by a factor of 8x8 spatially and uses 16 latent channels.
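To make the numbers concrete, here is a minimal sketch of the latent size implied by 8x8 spatial compression and 16 latent channels. The helper names `latent_shape` and `element_ratio` are hypothetical and not part of the KVAE code base, and the requirement that image sides be divisible by 8 is an assumption of this sketch (the actual model may pad or crop instead).

```python
def latent_shape(height, width, spatial_factor=8, latent_channels=16):
    """Return (channels, height, width) of the latent for an image of the given size."""
    # Assumption: sides must divide evenly; the real model may handle this differently.
    if height % spatial_factor or width % spatial_factor:
        raise ValueError("image sides must be divisible by the spatial factor")
    return (latent_channels, height // spatial_factor, width // spatial_factor)


def element_ratio(height, width, image_channels=3, spatial_factor=8, latent_channels=16):
    """Ratio of image elements to latent elements (element count only, not bits)."""
    c, h, w = latent_shape(height, width, spatial_factor, latent_channels)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(256, 256))   # (16, 32, 32)
print(element_ratio(256, 256))  # 12.0
```

Under these assumptions, a 256x256 RGB image maps to a 16x32x32 latent, which holds 12 times fewer elements than the input.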

Evaluation results

Evaluation results for the KVAE-2D model on ImageNet-256 (validation set) and DIV2K (validation set, high resolution). Higher is better for PSNR and SSIM; lower is better for LPIPS and rFID. All compared models use 8x8 spatial compression with 16 latent channels:

Dataset	Model	PSNR	SSIM	LPIPS	rFID
ImageNet (256, val)	Wan-2.1	29.03	0.85	0.069	0.62
ImageNet (256, val)	Flux	31.11	0.91	0.041	0.11
ImageNet (256, val)	KVAE-2D	31.71	0.91	0.054	0.46
DIV2K	Wan-2.1	31.87	0.89	0.069	-
DIV2K	Flux	32.64	0.91	0.061	-
DIV2K	KVAE-2D	33.67	0.92	0.060	-
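For readers unfamiliar with the PSNR column, the sketch below computes the standard definition, 10 * log10(MAX^2 / MSE), on flat pixel sequences. It is illustrative only: the function name `psnr`, the 8-bit value range, and the toy pixel values are assumptions, and the exact evaluation protocol behind the table (value range, color space, resizing) is not stated here.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 2, so MSE = 4.
original = [0, 64, 128, 255]
reconstructed = [2, 62, 130, 253]
print(round(psnr(original, reconstructed), 2))  # 42.11
```

Because PSNR is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.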

Collection including kandinskylab/KVAE-2D-1.0

KVAE 1.0 tokenizers for images (KVAE-2D-1.0) and video (KVAE-3D-1.0) are distributed under the MIT license (commercial use is possible).