- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 51
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
  Paper • 2408.08872 • Published • 97
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 116
- Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
  Paper • 2408.12528 • Published • 50
Danil
Potatochka
AI & ML interests
None yet
Organizations
None yet
Collections
2
models
None public yet
datasets
None public yet