SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models Paper • 2510.08559 • Published Oct 9 • 8
Post • Wan 2.2 fast: up to 10x faster than the original Wan 2.2 • Model: FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers • Space: KingNish/wan2-2-fast
Post • What's currently the biggest gap in open-source datasets?
Re-thinking Temporal Search for Long-Form Video Understanding Paper • 2504.02259 • Published Apr 3 • 1
AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation Paper • 2306.00977 • Published Jun 1, 2023
OpenMask3D: Open-Vocabulary 3D Instance Segmentation Paper • 2306.13631 • Published Jun 23, 2023 • 10
OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views Paper • 2404.03650 • Published Apr 4, 2024
3D Segmentation of Humans in Point Clouds with Synthetic Data Paper • 2212.00786 • Published Dec 1, 2022
Improving 2D Feature Representations by 3D-Aware Fine-Tuning Paper • 2407.20229 • Published Jul 29, 2024 • 7
Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries Paper • 2211.15658 • Published Nov 28, 2022
Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels Paper • 2312.17232 • Published Dec 28, 2023
P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising Paper • 2408.16325 • Published Aug 29, 2024 • 3
SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs Paper • 2404.00469 • Published Mar 30, 2024
Mask3D: Mask Transformer for 3D Semantic Instance Segmentation Paper • 2210.03105 • Published Oct 6, 2022
OpenCity3D: What do Vision-Language Models know about Urban Environments? Paper • 2503.16776 • Published Mar 21 • 3