markweber/taming_vqgan


markweber committed on Dec 6, 2024

Commit 345a906 · verified · 1 Parent(s): d994cf6

Update README.md

Files changed (1)

README.md +36 -3


@@ -1,3 +1,36 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- ILSVRC/imagenet-1k
+model-index:
+  - name: Taming-VQGAN
+    results:
+      - task:
+          type: image-generation
+        dataset:
+          name: ILSVRC/imagenet-1k
+          type: ILSVRC/imagenet-1k
+        metrics:
+          - name: rFID
+            type: rFID
+            value: 7.96
+          - name: InceptionScore
+            type: InceptionScore
+            value: 115.9
+          - name: LPIPS
+            type: LPIPS
+            value: 0.306
+          - name: PSNR
+            type: PSNR
+            value: 20.2
+          - name: SSIM
+            type: SSIM
+            value: 0.52
+          - name: CodebookUsage
+            type: CodebookUsage
+            value: 0.445
+---
+This model is the Taming VQGAN tokenizer with a 10-bit vocabulary (1024 codebook entries), converted into a format compatible with the MaskBit codebase. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.
+You can find more details on the VQGAN in the original [repository](https://github.com/CompVis/taming-transformers) or [paper](https://arxiv.org/abs/2012.09841). All credit for this model belongs to Patrick Esser, Robin Rombach, and Björn Ommer.
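
The numbers in the model card imply a fixed token budget per image, which the following minimal sketch works out. It is plain arithmetic rather than MaskBit or taming-transformers code. The helper name `token_grid` is hypothetical, and two readings are assumptions: that "10 bits" means a codebook of 2**10 entries, and that the `CodebookUsage` value of 0.445 is the fraction of codebook entries in use.

```python
# Sketch: token-grid arithmetic implied by the model card.
# Assumption: "10 bits" means a codebook with 2**10 entries.
# Assumption: CodebookUsage (0.445) is the fraction of entries in use.

def token_grid(resolution: int, downsample: int) -> tuple[int, int]:
    """Return (tokens per side, total tokens) for a square input image."""
    if resolution % downsample != 0:
        raise ValueError("resolution must be divisible by the downsampling factor")
    side = resolution // downsample
    return side, side * side

BITS_PER_TOKEN = 10
codebook_size = 2 ** BITS_PER_TOKEN          # 1024 entries

side, num_tokens = token_grid(256, 16)       # 16 x 16 grid, 256 tokens
bits_per_image = num_tokens * BITS_PER_TOKEN # 2560 bits, i.e. 320 bytes

# Rough count of codebook entries in use under the assumed reading.
approx_used_entries = round(0.445 * codebook_size)

print(side, num_tokens, bits_per_image, approx_used_entries)
```

Under these assumptions, each 256×256 image maps to a 16×16 grid of 256 tokens, and fewer than half of the 1024 codebook entries are used.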