stereoplegic's Collection: Softmax
• Replacing softmax with ReLU in Vision Transformers (arXiv:2309.08586)
• Softmax Bias Correction for Quantized Generative Models (arXiv:2309.01729)
• The Closeness of In-Context Learning and Weight Shifting for Softmax Regression (arXiv:2304.13276)
• Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing (arXiv:2306.12929)
• Revisiting Softmax Masking for Stability in Continual Learning (arXiv:2309.14808)
• A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts (arXiv:2310.14188)
• Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention (arXiv:2310.11685)
• Interpret Vision Transformers as ConvNets with Dynamic Convolutions (arXiv:2309.10713)
• Softmax-free Linear Transformers (arXiv:2207.03341)
• Agent Attention: On the Integration of Softmax and Linear Attention (arXiv:2312.08874)
• The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry (arXiv:2402.04347)
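A common thread in this collection is replacing the softmax in attention with a cheaper pointwise alternative. As a minimal sketch of the contrast, the snippet below compares standard softmax attention with a ReLU variant scaled by sequence length (the pointwise substitution studied in arXiv:2309.08586). The function names and the toy shapes are illustrative, not taken from any of the papers' code.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard scaled dot-product attention with softmax weights."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)  # rows sum to 1
    return w @ v

def relu_attention(q, k, v):
    """Softmax replaced by ReLU, divided by sequence length.

    This is the pointwise substitution explored in
    'Replacing softmax with ReLU in Vision Transformers'
    (arXiv:2309.08586); names/shapes here are illustrative.
    """
    seq_len = k.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.maximum(scores, 0.0) / seq_len  # rows need not sum to 1
    return w @ v

# toy example: 4 queries/keys, head dim 8
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
print(softmax_attention(q, k, v).shape)  # (4, 8)
print(relu_attention(q, k, v).shape)     # (4, 8)
```

Because ReLU weights need no row-wise normalization, they avoid the cross-token reduction that softmax requires, which is what makes this family of substitutions attractive for parallelism and quantization.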