KVAE 1.0
Collection
The KVAE 1.0 tokenizers for images (KVAE-2D-1.0) and video (KVAE-3D-1.0) are distributed under the MIT license, which permits commercial use.
The KVAE-2D model applies 8x8 spatial compression and produces 16 latent channels.
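As a rough illustration of what these numbers imply, the sketch below computes the latent tensor shape for a given image size. It assumes that "8x8" means a downsampling factor of 8 along both height and width, and that input sizes are multiples of 8. The function names `latent_shape` and `compression_ratio` are hypothetical helpers written for this example and are not part of any KVAE API.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not a KVAE API.
# Assumption: "8x8 compression" = downsampling by 8 along height and by 8 along width.

def latent_shape(height, width, spatial_factor=8, latent_channels=16):
    """Return (channels, latent_height, latent_width) for an image of the given size."""
    if height % spatial_factor or width % spatial_factor:
        # Assumption: inputs are multiples of the factor; a real model may pad or crop instead.
        raise ValueError("height and width must be multiples of the spatial factor")
    return (latent_channels, height // spatial_factor, width // spatial_factor)


def compression_ratio(height, width, image_channels=3,
                      spatial_factor=8, latent_channels=16):
    """Ratio of input element count to latent element count (ignores numeric precision)."""
    c, h, w = latent_shape(height, width, spatial_factor, latent_channels)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(256, 256))       # (16, 32, 32)
print(compression_ratio(256, 256))  # 12.0
```

Under these assumptions, a 256x256 RGB image maps to a 16x32x32 latent, which is 12 times fewer values than the input. The ratio is independent of image size because both counts scale with height times width.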
Evaluation results for the KVAE-2D model on ImageNet-256 (validation split) and DIV2K (validation split, high resolution). All compared models use 8x8 compression with 16 latent channels:
| Dataset | Model | PSNR | SSIM | LPIPS | rFID |
|---|---|---|---|---|---|
| ImageNet (256, val) | Wan-2.1 | 29.03 | 0.85 | 0.069 | 0.62 |
| ImageNet (256, val) | Flux | 31.11 | 0.91 | 0.041 | 0.11 |
| ImageNet (256, val) | KVAE-2D | 31.71 | 0.91 | 0.054 | 0.46 |
| DIV2K | Wan-2.1 | 31.87 | 0.89 | 0.069 | - |
| DIV2K | Flux | 32.64 | 0.91 | 0.061 | - |
| DIV2K | KVAE-2D | 33.67 | 0.92 | 0.060 | - |