Collections

Collections including paper arxiv:2402.14327

Vision and Language

Subobject-level Image Tokenization

Paper • 2402.14327 • Published Feb 22 • 17
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

Paper • 2403.08551 • Published Mar 13 • 8

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22 • 19
Subobject-level Image Tokenization

Paper • 2402.14327 • Published Feb 22 • 17
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 124
GPTVQ: The Blessing of Dimensionality for LLM Quantization

Paper • 2402.15319 • Published Feb 23 • 19

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Paper • 2401.11708 • Published Jan 22 • 29
Weaver: Foundation Models for Creative Writing

Paper • 2401.17268 • Published Jan 30 • 42
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models

Paper • 2402.01118 • Published Feb 2 • 29
Training-Free Consistent Text-to-Image Generation

Paper • 2402.03286 • Published Feb 5 • 64

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17 • 8
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18 • 15
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19 • 58
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24 • 71

Vision Transformers

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

Paper • 2309.04354 • Published Sep 8, 2023 • 13
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 77
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

Paper • 2309.16414 • Published Sep 28, 2023 • 19
MotionLM: Multi-Agent Motion Forecasting as Language Modeling

Paper • 2309.16534 • Published Sep 28, 2023 • 15

Uncovering mesa-optimization algorithms in Transformers

Paper • 2309.05858 • Published Sep 11, 2023 • 12
ProPainter: Improving Propagation and Transformer for Video Inpainting

Paper • 2309.03897 • Published Sep 7, 2023 • 26
Approximating Two-Layer Feedforward Networks for Efficient Transformers

Paper • 2310.10837 • Published Oct 16, 2023 • 10
CLEX: Continuous Length Extrapolation for Large Language Models

Paper • 2310.16450 • Published Oct 25, 2023 • 9
