Today's pick in Interpretability & Analysis of LMs: Can Large Language Models Explain Themselves? by
@andreasmadsen
Sarath Chandar &
@sivareddyg
LLMs can provide wrong but convincing explanations for their behavior, which may lead to misplaced confidence in their predictions. This study uses self-consistency checks to measure the faithfulness of LLM explanations: if an LLM says a set of words is important for making a prediction, then it should not be able to make the same prediction without those words. Results demonstrate that the faithfulness of LLM self-explanations cannot be reliably trusted, as it proves to be highly task- and model-dependent, with bigger models generally producing more faithful explanations.
Paper: Can Large Language Models Explain Themselves? (2401.07927)
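To make the self-consistency idea concrete, here is a minimal sketch of a redaction-style check, not the paper's exact protocol. The helper `query_llm`, the prompts, and the sentiment-classification framing are all assumptions for illustration; plug in whatever model API and task you actually use.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return its text reply."""
    raise NotImplementedError  # wire this up to your own API client


def self_consistency_check(text: str, labels: list[str]) -> bool:
    """Return True if the model's self-explanation passes the redaction check."""
    # 1. Ask the model for a prediction on the original input.
    prediction = query_llm(
        f"Classify this review as one of {labels}:\n{text}"
    ).strip()

    # 2. Ask the model which words were important for that prediction
    #    (its self-explanation).
    reply = query_llm(
        f"Which words in the review were most important for predicting "
        f"'{prediction}'? Reply with a comma-separated list.\n{text}"
    )
    important = {w.strip().lower() for w in reply.split(",") if w.strip()}

    # 3. Redact the words the model claimed were important.
    redacted = " ".join(
        "[REDACTED]" if w.lower().strip(".,!?") in important else w
        for w in text.split()
    )

    # 4. Re-classify the redacted text. If the explanation were faithful,
    #    the model should no longer reproduce the same prediction.
    new_prediction = query_llm(
        f"Classify this review as one of {labels}:\n{redacted}"
    ).strip()
    return new_prediction != prediction
```

If the prediction survives the removal of the supposedly important words, the explanation is judged unfaithful for that input; aggregating this over a dataset gives a task- and model-level faithfulness estimate.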