🔍 Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 135 items • Updated 7 days ago • 116
📝 The Eiffel Tower Llama Explore the Eiffel Tower Llama experiment with open-source models • Running • 85
Sparse Auto-Encoders (SAEs) for Mechanistic Interpretability Collection A compilation of sparse auto-encoders trained on large language models. • 37 items • Updated 9 days ago • 14
👤 Implicit Personalization in Language Models Collection Works on detecting, attributing and controlling implicit personalization in language models • 20 items • Updated 11 days ago • 1