Prasanna Iyer

prasiyer

AI & ML interests

None yet

Recent Activity

replied to merve's post 7 months ago
Chameleon 🦎 by Meta is now available in Hugging Face transformers. A vision language model that comes in 7B and 34B sizes 🤩 But what makes this model so special?

Demo: https://huggingface.co/spaces/merve/chameleon-7b
Models: https://huggingface.co/collections/facebook/chameleon-668da9663f80d483b4c61f58

keep reading ⥥

Chameleon is a unique model: it attempts to scale early fusion 🤨 But what is early fusion? Modern vision language models use a vision encoder with a projection layer to project image embeddings so that they can be used to prompt the text decoder (LLM). Early fusion, on the other hand, attempts to fuse all features together (image patches and text) by using an image tokenizer, and all tokens are projected into a shared space, which enables seamless generation.

The authors have also introduced architectural improvements (QK norm and revised placement of layer norms) for scalable and stable training, and they were able to increase the token count (5x tokens compared to Llama 3, which is a must with early fusion IMO).

This model is an any-to-any model thanks to early fusion: it can take image and text input and output image and text, but image generation is disabled to prevent malicious use.

One can also do text-only prompting. The authors noted that the model catches up with larger LLMs (like Mixtral 8x7B or the larger Llama-2 70B) and, on image-pair prompting, with larger VLMs like IDEFICS2-80B (see the paper for the benchmarks: https://huggingface.co/papers/2405.09818).

Thanks for reading!
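To make the early-fusion idea in the post a bit more concrete, here is a toy sketch in plain Python. It is not Chameleon's code or the transformers API: the vocabulary, the codebook size, the embedding width, and the helper names (`tokenize_text`, `tokenize_image`, `build_shared_embeddings`) are all made up for illustration. It only shows the core point that image patches become discrete tokens living in the same vocabulary and the same embedding table as text tokens.

```python
import random

# Toy sizes chosen for illustration only; these are not Chameleon's real values.
TEXT_VOCAB = {"describe": 0, "this": 1, "image": 2}
IMAGE_CODEBOOK_SIZE = 4   # hypothetical number of discrete image codes
EMBED_DIM = 3             # hypothetical embedding width


def tokenize_text(words):
    """Map words to text token IDs."""
    return [TEXT_VOCAB[w] for w in words]


def tokenize_image(patch_codes):
    """Shift image codebook indices past the text IDs so both share one vocabulary."""
    offset = len(TEXT_VOCAB)
    return [offset + code for code in patch_codes]


def build_shared_embeddings(seed=0):
    """Build one embedding table that covers text tokens and image tokens alike."""
    rng = random.Random(seed)
    total = len(TEXT_VOCAB) + IMAGE_CODEBOOK_SIZE
    return [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(total)]


text_ids = tokenize_text(["describe", "this", "image"])
# Pretend an image tokenizer assigned these codebook indices to three patches.
image_ids = tokenize_image([3, 0, 2])

# Early fusion: one token stream, with no separate vision encoder or projection layer.
sequence = text_ids + image_ids
embeddings = build_shared_embeddings()
inputs = [embeddings[i] for i in sequence]  # same lookup for text and image tokens

print(sequence)
```

In a late-fusion model, by contrast, the image would go through its own encoder and a projection layer before meeting the text embeddings. Here a single decoder would see one sequence of IDs and could, in principle, also emit IDs from either range.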

Organizations

None yet

prasiyer's activity

reacted to nroggendorff's post with 👍 26 days ago
maybe a page where you can find open orgs to get started in collaboration with hf. i see so many people that dont have a direction.


i dont have ulterior motives, so dont ask
replied to merve's post 7 months ago

I have attached the image and the prompt. This is the response from the chatbot:

The image depicts a table comparing various companies and their carbon footprints. The table lists the top 10 largest greenhouse gas emitters in the world, with their respective carbon footprints expressed in millions of metric tons of carbon dioxide equivalent (MtCO2e). The first column features the company name, while the second column displays the year in which the carbon footprint was measured. The third column indicates the sector in which the company operates, and the fourth column provides the company's carbon footprint. The last column shows the company's market capitalization, which is the total value of all outstanding shares of a company's stock. The table also includes a row labeled "Total" that shows the total carbon footprint of all the companies listed in the table.

replied to merve's post 7 months ago

Thanks for the post and your efforts to share the knowledge.

https://huggingface.co/spaces/merve/chameleon-7b -- the Space does not seem to work. When I ask for a summary of an image, the result is a summary of some random table and not of the one I uploaded. Please check when you can.