Activator: GLU Activations as The Core Functions of a Vision Transformer • Paper 2405.15953 • Published May 24, 2024
NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function • Paper 2403.02411 • Published Mar 4, 2024