- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections including paper arXiv:2510.25889

- π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
  Paper • 2510.25889 • Published • 58
- Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
  Paper • 2510.27607 • Published • 8
- A Survey on Efficient Vision-Language-Action Models
  Paper • 2510.24795 • Published • 5

- ARE: Scaling Up Agent Environments and Evaluations
  Paper • 2509.17158 • Published • 34
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
  Paper • 2510.08551 • Published • 31
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
  Paper • 2510.04212 • Published • 22
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
  Paper • 2510.12693 • Published • 26

- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  Paper • 2507.01925 • Published • 38
- Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
  Paper • 2507.16746 • Published • 34
- MolmoAct: Action Reasoning Models that can Reason in Space
  Paper • 2508.07917 • Published • 44
- Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
  Paper • 2508.20072 • Published • 31

- NousResearch/Hermes-4-70B-FP8
  Text Generation • 71B • Updated • 821 • 24
- NousResearch/Hermes-4-405B-FP8
  Text Generation • 406B • Updated • 1.1k • 19
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
  Paper • 2508.21365 • Published • 29
- BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
  Paper • 2509.15566 • Published • 14

- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 27
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 141
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 11
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1