This is the checkpoint for the repo https://github.com/TinyVolt/multimodal-patch-embeddings. The repo contains the code for distilling a 21.3M-parameter ViT model, using the OpenAI CLIP ViT model as the teacher. What distinguishes this model is that the embedding of each image patch lies in the same embedding space as the final embedding. In fact, the final embedding is a convex combination of the patch embeddings, that is, a weighted sum whose weights are non-negative and sum to 1. This makes it possible to compare a text embedding directly with each of the 64 image patch embeddings.
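To make the convex-combination property concrete, here is a minimal sketch in plain Python. It does not use the repo's code or the checkpoint: the patch embeddings, the text embedding, and the weights are random stand-ins, the embedding dimension `EMBED_DIM` is a hypothetical small value, and producing the weights with a softmax is an assumption made only so that they are non-negative and sum to 1. The helper names (`softmax`, `convex_sum`, `cosine`) are illustrative and are not taken from the repo.

```python
import math
import random

NUM_PATCHES = 64  # from the README: 64 image patch embeddings
EMBED_DIM = 8     # hypothetical; kept small for illustration


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def convex_sum(weights, vectors):
    """Weighted sum of vectors; convex when weights are >= 0 and sum to 1."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)

# Random stand-ins for the model outputs (assumption: not real embeddings).
patch_embeddings = [
    [rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_PATCHES)
]
text_embedding = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]

# Assumption: the weights come from a softmax; the README only says "convex sum".
weights = softmax([rng.gauss(0, 1) for _ in range(NUM_PATCHES)])

# The final image embedding is a convex combination of the patch embeddings.
final_embedding = convex_sum(weights, patch_embeddings)

# Because patches and the final embedding share one space, the text embedding
# can be scored against every patch as well as against the whole image.
patch_scores = [cosine(text_embedding, p) for p in patch_embeddings]
best_patch = max(range(NUM_PATCHES), key=lambda i: patch_scores[i])
image_score = cosine(text_embedding, final_embedding)
```

With real model outputs in place of the random vectors, `patch_scores` could be reshaped into an 8x8 grid (assuming the 64 patches form a square layout) to see which regions of the image best match the text.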