Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.17887

lshort-transformers

Papers useful when writing the paper: "The Not So Short Transfromers"

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Paper • 2403.03853 • Published Mar 6 • 62
SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 68
Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19 • 150
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 62

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 44
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 13
Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30 • 16
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Paper • 2407.21770 • Published Jul 31 • 22

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 78

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 78
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 104
ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 89
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4 • 60

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 104
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 78
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16 • 20
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Paper • 2402.04833 • Published Feb 7 • 6

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 78

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 78

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 104
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 39
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 52
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27 • 44

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 78
Long-form factuality in large language models

Paper • 2403.18802 • Published Mar 27 • 24
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 104

Papers - University - MIT

One-step Diffusion with Distribution Matching Distillation

Paper • 2311.18828 • Published Nov 30, 2023 • 3
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 78
Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1 • 11
Locating and Editing Factual Associations in GPT

Paper • 2202.05262 • Published Feb 10, 2022 • 1

Previous
1
2
3
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs