Omni-modal Embedding Foundation Model
Scalable Vision Language Model Training via High Quality Data Curation
Visual Foundation Models Powering Vision-Language Models
BytedanceDouyinContent/SAILViT-Large-300M-448px
Image Feature Extraction • 0.3B • Updated • 323 • 2
BytedanceDouyinContent/SAILViT-Huge-600M-448px
Image Feature Extraction • 0.7B • Updated • 21 • 3
SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement
Paper • 2507.01643 • Published • 2
Visual Grounded Reasoning