Davide Buoso

lambdavi

AI & ML interests

Data Science student @ Polytechnic University of Turin. Interested in the intersection of Robotics and Reinforcement Learning.

lambdavi's activity

upvoted a paper 10 days ago

Video Motion Transfer with Diffusion Transformers

Paper • 2412.07776 • Published 11 days ago • 16

liked a dataset 15 days ago

kelvin34501/OakInk-v2

Updated 13 days ago • 3.65k • 4

updated a model 16 days ago

lambdavi/span-marker-luke-legal

Token Classification • Updated 16 days ago • 15 • 3

upvoted a paper 16 days ago

Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval

Paper • 2411.04006 • Published Nov 6 • 1

reacted to akhaliq's post with 👍 7 months ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.

liked 2 models 9 months ago

ai21labs/Jamba-v0.1

Text Generation • Updated Sep 11 • 10.9k • 1.17k

bidiptas/PG-InstructBLIP

Image-to-Text • Updated Jan 22 • 12

updated a model 10 months ago

lambdavi/legal-luke-base-ner

Token Classification • Updated Feb 21 • 14

updated a collection 10 months ago

NLP

NLP related models. • 8 items • Updated Feb 20

updated 3 models 10 months ago

lambdavi/Reinforce-Pixelcopter-PLE-v0

Reinforcement Learning • Updated Feb 13

lambdavi/ppo-Pyramids

Reinforcement Learning • Updated Feb 13 • 6

lambdavi/a2c-PandaReachDense-v3

Reinforcement Learning • Updated Feb 13 • 1

updated a collection 11 months ago

RL

Reinforcement Learning related models • 7 items • Updated Jan 28 • 1

updated a model 11 months ago

lambdavi/ddpg-PandaReach-v3

Reinforcement Learning • Updated Jan 28

liked a Space 11 months ago

ML Agents SnowballTarget

updated a model 11 months ago

lambdavi/ppo-SnowballTarget

Reinforcement Learning • Updated Jan 15 • 22

updated a collection 12 months ago

NLP

NLP related models. • 8 items • Updated Feb 20