sepilqi


AI & ML interests

None yet

Organizations

qarxan baza, Blog-explorers, Social Post Explorers, Chinese LLMs on Hugging Face, AYaM

sepilqi's activity

upvoted 2 articles 4 months ago
- Exploring the Daily Papers Page on Hugging Face (49)
- A Short Summary of Chinese AI Global Expansion (22)
upvoted an article 6 months ago
published an article 6 months ago
upvoted an article 7 months ago
- Getting Started with Sentiment Analysis using Python (38)
New activity in blog-explorers/README 7 months ago
- [Support] Community Articles (80), #5 opened 11 months ago by victor
upvoted 2 articles 7 months ago
- How NuminaMath Won the 1st AIMO Progress Prize (116)
- Mixture of Experts Explained (351)
reacted to akhaliq's post with 👍 9 months ago
- Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)
We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed-modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in unified modeling of full multimodal documents.
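The "early-fusion token-based" idea in the abstract can be illustrated with a minimal sketch: images are quantized into discrete codebook tokens, those ids are offset past the text vocabulary so the two id spaces never collide, and everything is interleaved into one flat sequence for a single decoder. All names and sizes below (`TEXT_VOCAB_SIZE`, `tokenize_text`, `quantize_image`, the codebook size) are illustrative assumptions, not Chameleon's actual tokenizers or API.

```python
# Hypothetical sketch of early-fusion mixed-modal tokenization.
# Not Chameleon's real implementation; sizes and helpers are invented.

TEXT_VOCAB_SIZE = 32000       # assumed text vocabulary size
IMG_OFFSET = TEXT_VOCAB_SIZE  # image-codebook ids start after text ids

def tokenize_text(text):
    # Stand-in for a real BPE tokenizer: one id per character.
    return [ord(c) % TEXT_VOCAB_SIZE for c in text]

def quantize_image(pixels, codebook_size=8192):
    # Stand-in for a VQ image tokenizer: bucket values into discrete
    # codes, shifted past the text vocabulary so ids never collide.
    return [IMG_OFFSET + (p % codebook_size) for p in pixels]

def build_mixed_sequence(segments):
    # Early fusion: every modality lands in the same flat token
    # sequence, so one transformer attends across text and image alike.
    seq = []
    for kind, payload in segments:
        seq.extend(tokenize_text(payload) if kind == "text"
                   else quantize_image(payload))
    return seq

seq = build_mixed_sequence([
    ("text", "A photo of "),
    ("image", [12, 945, 3310, 77]),   # fake pre-quantized pixel values
    ("text", " at sunset"),
])
```

Because generation happens over this single id space, the same autoregressive head can emit either modality; decoding just maps ids below `IMG_OFFSET` back to text and the rest back through the image codebook.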
updated a collection 9 months ago