matlok's Collections
Papers - Video - Understanding
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 14
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 33
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Paper • 2403.13501 • Published • 9
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published • 19
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper • 2404.03413 • Published • 26
Pegasus-v1 Technical Report
Paper • 2404.14687 • Published • 31
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper • 2406.08407 • Published • 25
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Paper • 2407.03320 • Published • 93
LLaVA-OneVision: Easy Visual Task Transfer
Paper • 2408.03326 • Published • 60
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360 • Published • 139