SixAILab

community

https://sihanxu.github.io/

AI & ML interests

None defined yet.

Recent Activity

Wayne-King

authored a paper 10 days ago

Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward

Paper • 2511.20561 • Published 11 days ago • 31

marstin

authored 4 papers about 1 month ago

AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies

Paper • 2508.08113 • Published Aug 11 • 11

From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens

Paper • 2510.02292 • Published Oct 2 • 1

Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry

Paper • 2510.25595 • Published Oct 29

ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation

Paper • 2511.01163 • Published Nov 3 • 31

Wayne-King

authored 5 papers about 2 months ago

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

Paper • 2510.13778 • Published Oct 15 • 16

marstin

authored 3 papers 5 months ago

Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation

Paper • 2506.21876 • Published Jun 27 • 28

Can Vision Language Models Infer Human Gaze Direction? A Controlled Study

Paper • 2506.05412 • Published Jun 4 • 4

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

Paper • 2506.18890 • Published Jun 23 • 6

wchai

authored 7 papers 6 months ago

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Paper • 2404.04910 • Published Apr 7, 2024

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models

Paper • 2503.04240 • Published Mar 6

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Paper • 2504.13129 • Published Apr 17 • 3

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark

Paper • 2504.14693 • Published Apr 20

EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments

Paper • 2503.08604 • Published Mar 11

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Paper • 2505.23606 • Published May 29 • 14

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Paper • 2506.11928 • Published Jun 13 • 23

Team members 7
