Transformers generalize differently from information stored in context vs in weights • Paper 2210.05675 • Published Oct 11, 2022