vinsis/multimodal-patch-embeddings


vinsis committed on Aug 13

Commit b205184 • 1 Parent(s): 1b21f07

Update README.md


This is the checkpoint for the repo https://github.com/TinyVolt/multimodal-patch-embeddings. The repo contains the code for distilling a 21.3M-parameter ViT model, using an OpenAI CLIP ViT model as the teacher. What sets this model apart is that the embedding of each image patch lies in the same embedding space as the final image embedding. In fact, the final embedding is a convex combination of the patch embeddings (a weighted sum with non-negative weights that sum to 1). This makes it possible to compare a text embedding directly with each of the 64 image patch embeddings.
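As a rough illustration of the property described above, the sketch below builds a final embedding as a convex combination of 64 patch embeddings and scores a text embedding against each patch. It does not load the checkpoint or use the repo's code: the vectors are random placeholders, and `EMBED_DIM`, the softmax-derived weights, and the 8x8 patch grid are assumptions made only for this example.

```python
import math
import random

NUM_PATCHES = 64  # from the description: 64 image patch embeddings
EMBED_DIM = 8     # hypothetical width; the real embedding size is not stated here
GRID = 8          # assumption: the 64 patches form an 8x8 grid


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax(scores):
    # Produces non-negative weights that sum to 1, i.e. valid convex weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)

# Random placeholders standing in for real model outputs.
patch_embeddings = [
    [rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_PATCHES)
]
text_embedding = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]

# Assumption: softmax over arbitrary scores is just one way to get convex weights.
weights = softmax([rng.gauss(0, 1) for _ in range(NUM_PATCHES)])

# Final embedding = convex combination of the patch embeddings.
final_embedding = [
    sum(w * p[d] for w, p in zip(weights, patch_embeddings))
    for d in range(EMBED_DIM)
]

# Because every patch shares the final embedding's space, the text embedding
# can be compared with each patch individually.
patch_scores = [cosine(text_embedding, p) for p in patch_embeddings]
best_patch = max(range(NUM_PATCHES), key=lambda i: patch_scores[i])
row, col = divmod(best_patch, GRID)

# By linearity, the image-level dot product equals the same convex
# combination of the patch-level dot products.
image_dot = dot(text_embedding, final_embedding)
patch_dot_mix = sum(
    w * dot(text_embedding, p) for w, p in zip(weights, patch_embeddings)
)
```

With real model outputs in place of the placeholders, `patch_scores` could be reshaped to the patch grid to see which image regions best match the text.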

Files changed (1)

README.md +1 -0

README.md CHANGED

@@ -1,3 +1,4 @@
 ---
 license: mit
 ---
