You can find the best of Johannes's work here
Generate captions for images using noise-injected CLIP
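
The idea behind "noise-injected CLIP" can be sketched roughly as follows: during text-only training, a caption's CLIP text embedding is perturbed with Gaussian noise before it is passed to the caption decoder, so that the decoder later tolerates image embeddings, which sit near but not exactly on the text embeddings. The snippet below is a minimal illustration under stated assumptions, not this model's actual code. The names `l2_normalize`, `inject_noise`, and `noise_std`, the value 0.1, the re-normalization step, and the 4-dimensional stand-in vector are all hypothetical.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def inject_noise(embedding, noise_std=0.1, rng=None):
    """Add zero-mean Gaussian noise to every coordinate, then re-normalize.

    noise_std is an illustrative hyperparameter (assumption), and
    re-normalizing after the perturbation is a design choice assumed here.
    """
    if rng is None:
        rng = random.Random()
    noisy = [x + rng.gauss(0.0, noise_std) for x in embedding]
    return l2_normalize(noisy)


# Stand-in for a CLIP text embedding (assumption): a real one would come
# from a CLIP text encoder and have several hundred dimensions.
text_embedding = l2_normalize([0.2, -0.5, 0.1, 0.7])

# Seeded generator so the perturbation is reproducible.
noisy_embedding = inject_noise(text_embedding, noise_std=0.1, rng=random.Random(0))
```

In a setup like this, the decoder would be trained on `noisy_embedding` paired with the original caption, and at inference time the image embedding would be fed in its place without added noise.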