VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 46
Adding Conditional Control to Text-to-Image Diffusion Models Paper • 2302.05543 • Published Feb 10, 2023 • 42
Can Vision-Language Models Think from a First-Person Perspective? Paper • 2311.15596 • Published Nov 27, 2023 • 3