- Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement (Paper • 2411.06558 • Published • 34)
- SlimLM: An Efficient Small Language Model for On-Device Document Assistance (Paper • 2411.09944 • Published • 12)
- Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing (Paper • 2411.19460 • Published • 10)
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale (Paper • 2412.05237 • Published • 44)
Collections including paper arxiv:2412.09871
- Differential Transformer (Paper • 2410.05258 • Published • 167)
- PaliGemma 2: A Family of Versatile VLMs for Transfer (Paper • 2412.03555 • Published • 117)
- VisionZip: Longer is Better but Not Necessary in Vision Language Models (Paper • 2412.04467 • Published • 103)
- o1-Coder: an o1 Replication for Coding (Paper • 2412.00154 • Published • 39)
- Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection (Paper • 2409.08513 • Published • 11)
- Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (Paper • 2409.08264 • Published • 43)
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution (Paper • 2409.12191 • Published • 74)
- LLMs + Persona-Plug = Personalized LLMs (Paper • 2409.11901 • Published • 30)
- LinFusion: 1 GPU, 1 Minute, 16K Image (Paper • 2409.02097 • Published • 32)
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion (Paper • 2409.11406 • Published • 25)
- Diffusion Models Are Real-Time Game Engines (Paper • 2408.14837 • Published • 121)
- Segment Anything with Multiple Modalities (Paper • 2408.09085 • Published • 21)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone (Paper • 2407.08083 • Published • 27)
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model (Paper • 2408.11039 • Published • 58)
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models (Paper • 2408.15237 • Published • 37)
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think (Paper • 2409.11355 • Published • 28)
- Depth Anything V2 (Paper • 2406.09414 • Published • 95)
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels (Paper • 2406.09415 • Published • 50)
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion (Paper • 2406.04338 • Published • 34)
- SAM 2: Segment Anything in Images and Videos (Paper • 2408.00714 • Published • 109)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (Paper • 2403.09611 • Published • 124)
- Evolutionary Optimization of Model Merging Recipes (Paper • 2403.13187 • Published • 50)
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (Paper • 2402.03766 • Published • 12)
- LLM Agent Operating System (Paper • 2403.16971 • Published • 65)
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM (Paper • 2401.02994 • Published • 49)
- MambaByte: Token-free Selective State Space Model (Paper • 2401.13660 • Published • 51)
- Repeat After Me: Transformers are Better than State Space Models at Copying (Paper • 2402.01032 • Published • 22)
- BlackMamba: Mixture of Experts for State-Space Models (Paper • 2402.01771 • Published • 23)
- Chain-of-Verification Reduces Hallucination in Large Language Models (Paper • 2309.11495 • Published • 38)
- Adapting Large Language Models via Reading Comprehension (Paper • 2309.09530 • Published • 77)
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages (Paper • 2309.09400 • Published • 84)
- Language Modeling Is Compression (Paper • 2309.10668 • Published • 82)