Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 6 days ago • 38
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally Paper • 2409.08270 • Published 7 days ago • 8
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published 7 days ago • 14
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Paper • 2409.07452 • Published 8 days ago • 18
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak Paper • 2409.04269 • Published 14 days ago • 8
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery Paper • 2409.05591 • Published 11 days ago • 24
Evaluating Multiview Object Consistency in Humans and Image Models Paper • 2409.05862 • Published 10 days ago • 8
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation Paper • 2409.03718 • Published 14 days ago • 24
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published 17 days ago • 94
Affordance-based Robot Manipulation with Flow Matching Paper • 2409.01083 • Published 18 days ago • 9
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold Paper • 2408.14608 • Published 24 days ago • 7
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published 21 days ago • 25
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published 22 days ago • 44
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published 21 days ago • 29
Project SHADOW: Symbolic Higher-order Associative Deductive reasoning On Wikidata using LM probing Paper • 2408.14849 • Published 24 days ago • 3
Platypus: A Generalized Specialist Model for Reading Text in Various Forms Paper • 2408.14805 • Published 24 days ago • 11
Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published 24 days ago • 23
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published 23 days ago • 27
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published 24 days ago • 137
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published 28 days ago • 13
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published 28 days ago • 33
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper • 2408.12528 • Published 28 days ago • 50
TrackGo: A Flexible and Efficient Method for Controllable Video Generation Paper • 2408.11475 • Published 30 days ago • 16
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published 29 days ago • 53
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19 • 32
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 44
SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields Paper • 2408.06697 • Published Aug 13 • 13
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework Paper • 2408.06190 • Published Aug 12 • 17
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 55
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152
RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis Paper • 2408.03356 • Published Aug 6 • 8
Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields Paper • 2408.03822 • Published Aug 7 • 9
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Paper • 2408.03178 • Published Aug 6 • 35
StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation Paper • 2408.03281 • Published Aug 6 • 9
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization Paper • 2408.02555 • Published Aug 5 • 28
ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget Paper • 2408.00103 • Published Jul 31 • 15
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Paper • 2408.00113 • Published Jul 31 • 6
Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names Paper • 2408.00298 • Published Aug 1 • 8
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models Paper • 2408.00735 • Published Aug 1 • 15
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning Paper • 2408.00690 • Published Aug 1 • 21
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement Paper • 2408.00653 • Published Aug 1 • 27
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31 • 25
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle Paper • 2407.19548 • Published Jul 28 • 22