Jiaqi Wang PRO

myownskyW7

Recent Activity

upvoted a paper 3 days ago

Unified Reward Model for Multimodal Understanding and Generation

upvoted a paper 9 days ago

Visual-RFT: Visual Reinforcement Fine-Tuning

upvoted a paper 15 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

myownskyW7's activity

commented a paper 21 days ago

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Paper • 2502.13128 • Published 22 days ago • 37

commented a paper 28 days ago

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Paper • 2502.08590 • Published 28 days ago • 40

commented a paper about 1 month ago

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Paper • 2502.05173 • Published Feb 7 • 64

New activity in internlm/internlm-xcomposer2d5-7b-reward about 1 month ago

Update pipeline tag, add library name, link to paper

#2 opened about 2 months ago by nielsr

commented 2 papers about 2 months ago

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Paper • 2501.12368 • Published Jan 21 • 42

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Paper • 2501.05510 • Published Jan 9 • 39

commented 2 papers 2 months ago

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

Paper • 2501.03226 • Published Jan 6 • 41

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Paper • 2501.03218 • Published Jan 6 • 36

commented 2 papers 3 months ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 94

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Paper • 2412.07674 • Published Dec 10, 2024 • 20

New activity in FiVA/FiVA 3 months ago

Data

#2 opened 7 months ago by myownskyW7

Data

#3 opened 7 months ago by myownskyW7

commented 5 papers 5 months ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 35

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 46

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 67

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

Paper • 2410.06241 • Published Oct 8, 2024 • 10

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

Paper • 2410.07167 • Published Oct 9, 2024 • 38

New activity in internlm/internlm-xcomposer2d5-7b-4bit 7 months ago

Update modeling_internlm_xcomposer2.py

#4 opened 7 months ago by yuhangzang

New activity in internlm/internlm-xcomposer2d5-7b 8 months ago

Update modeling_internlm_xcomposer2.py

#14 opened 8 months ago by yuhangzang

Update modeling_internlm_xcomposer2.py

#13 opened 8 months ago by yuhangzang