Aidy Osu

aidystark

AI & ML interests

Vision; Language; Speech

Recent Activity

liked a model 3 days ago

fixie-ai/ultravox-v0_4_1-llama-3_1-8b

upvoted a collection 4 days ago

UltraVox Audio Language Model Release 🔊

liked a Space 4 days ago

echo840/ocrbench-leaderboard

aidystark's activity

upvoted a collection 4 days ago

UltraVox Audio Language Model Release 🔊

3 items • Updated 6 days ago • 15

upvoted an article 2 months ago

Fine-tuning Parler TTS on a Specific Language

Article • Sep 16 • 27

upvoted a collection 5 months ago

TinyLLaVA

TinyLLaVA: A Framework of Small-scale Large Multimodal Models • 7 items • Updated Mar 19 • 5

upvoted a collection 7 months ago

Vision Language Models Papers 🖼️💬📝

Papers about vision-language models; the most important ones are at the top of the list. • 27 items • Updated Apr 30 • 33

upvoted a paper 9 months ago

Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5 • 93

upvoted 3 papers about 1 year ago

Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 77

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

Paper • 2311.10794 • Published Nov 17, 2023 • 24

Contrastive Feature Masking Open-Vocabulary Vision Transformer

Paper • 2309.00775 • Published Sep 2, 2023 • 8

upvoted 3 papers over 1 year ago

Multi-Modal Classifiers for Open-Vocabulary Object Detection

Paper • 2306.05493 • Published Jun 8, 2023 • 6

Generative Pretraining in Multimodality

Paper • 2307.05222 • Published Jul 11, 2023 • 21

One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

Paper • 2306.16928 • Published Jun 29, 2023 • 38