In a Training Loop 🔄

Bofei Zhang PRO

Bofeee5675

https://bofei5675.github.io/

AI & ML interests

Vision-language models, agentic tasks, and computer use

Recent Activity

updated a dataset 7 days ago

pix2fact/Pix2FactBenchmark

liked a dataset 7 days ago

pix2fact/Pix2FactBenchmark

updated a dataset 15 days ago

Bofeee5675/Pix2Fact100Subset

upvoted 3 papers 4 months ago

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

Paper • 2504.21561 • Published Apr 30, 2025 • 1

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

Paper • 2505.15436 • Published May 21, 2025 • 2

Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

Paper • 2509.23866 • Published Sep 28, 2025 • 14

upvoted a paper 5 months ago

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Paper • 2509.07969 • Published Sep 9, 2025 • 59

upvoted a paper 7 months ago

TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models

Paper • 2506.03099 • Published Jun 3, 2025 • 19

upvoted a paper 8 months ago

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

Paper • 2504.12679 • Published Apr 17, 2025 • 1

upvoted a collection 9 months ago

TongUI

Open-source release of our work, TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials; https://github.com/TongUI-agent/TongUI-agent • 13 items • Updated Nov 3, 2025 • 3