SAELens
Tom Lieberum
fold in scaling by sqrt(d_model) into params
9ff4e7b