Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Paper: arXiv:2504.20966
Pretrained models from the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"