dlouapre's Collections

Sparse Auto-Encoders (SAEs) for Mechanistic Interpretability

A compilation of sparse auto-encoders trained on large language models.