arxiv:2412.07334

Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation

Published on Dec 10 · Submitted by pvalois on Dec 11
Abstract

Interpretability is a key challenge in fostering trust for Large Language Models (LLMs), one that stems from the complexity of extracting reasoning from a model's parameters. We present the Frame Representation Hypothesis, a theoretically robust framework grounded in the Linear Representation Hypothesis (LRH), to interpret and control LLMs by modeling multi-token words. Prior research explored the LRH to connect LLM representations with linguistic concepts, but was limited to single-token analysis. As most words are composed of several tokens, we extend the LRH to multi-token words, thereby enabling its use on any textual data with thousands of concepts. To this end, we propose that words can be interpreted as frames, ordered sequences of vectors that better capture token-word relationships. Concepts can then be represented as the average of the word frames sharing a common concept. We showcase these tools through Top-k Concept-Guided Decoding, which can intuitively steer text generation using concepts of choice. We verify these ideas on the Llama 3.1, Gemma 2, and Phi 3 model families, demonstrating gender and language biases, exposing harmful content, but also showing the potential to remediate it, leading to safer and more transparent LLMs. Code is available at https://github.com/phvv-me/frame-representation-hypothesis.git

Community

Paper author · Paper submitter

The Frame Representation Hypothesis is a robust framework for understanding and controlling LLMs: we propose that words can be interpreted as frames, ordered sequences of vectors that better capture token-word relationships. Concepts can then be represented as the average of the word frames sharing a common concept.

[Figure: overview.jpg]
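For readers who want to experiment, here is a minimal sketch of the idea in Python. It assumes frames are built from the model's input embedding rows and that frames of different lengths are zero-padded before averaging; both are illustrative simplifications rather than the paper's exact construction (see the repository for the real implementation).

```python
# Minimal sketch: words as frames (ordered token-vector sequences) and
# concepts as averages of word frames. The padding/averaging details are
# illustrative assumptions, not the paper's exact construction.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # any causal LM from the tested families
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
E = model.get_input_embeddings().weight  # (vocab_size, hidden_dim)

def word_frame(word: str) -> torch.Tensor:
    """A word's frame: the ordered sequence of its token vectors."""
    ids = tok(word, add_special_tokens=False)["input_ids"]
    return E[ids]  # (num_tokens, hidden_dim)

def concept_frame(words: list[str]) -> torch.Tensor:
    """A concept: the average of the frames of words sharing that concept."""
    frames = [word_frame(w) for w in words]
    n = max(f.shape[0] for f in frames)
    # Zero-pad shorter frames so they can be stacked and averaged (assumption).
    padded = [F.pad(f, (0, 0, 0, n - f.shape[0])) for f in frames]
    return torch.stack(padded).mean(dim=0)  # (n, hidden_dim)
```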

We showcase these tools through Top-k Concept-Guided Decoding, which can intuitively steer text generation using concepts of choice: the top-k candidate tokens are obtained from the LLM, and the one that maximizes correlation with the target Concept Frame is chosen.

[Figure: guidance.png]
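A simplified decoding loop might look like the following. The scoring rule here, maximum cosine similarity between a candidate token's vector and the rows of the concept frame, is a stand-in for the frame correlation used in the paper; `concept_frame` refers to the sketch above.

```python
# Sketch of Top-k Concept-Guided Decoding: at each step, take the model's
# top-k candidate tokens and keep the one best aligned with the concept
# frame. The cosine-similarity score is an illustrative stand-in.
import torch
import torch.nn.functional as F

@torch.no_grad()
def guided_decode(model, tok, prompt: str, concept: torch.Tensor,
                  k: int = 8, max_new_tokens: int = 30) -> str:
    E = model.get_input_embeddings().weight
    ids = tok(prompt, return_tensors="pt")["input_ids"]
    for _ in range(max_new_tokens):
        logits = model(input_ids=ids).logits[0, -1]      # next-token scores
        top = torch.topk(logits, k).indices              # model's top-k candidates
        # Score each candidate by its best match against the frame's rows.
        sims = F.cosine_similarity(E[top].unsqueeze(1),  # (k, 1, d)
                                   concept.unsqueeze(0), # (1, n, d)
                                   dim=-1).max(dim=1).values
        nxt = top[sims.argmax()].view(1, 1)
        ids = torch.cat([ids, nxt], dim=-1)
        if nxt.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

# e.g. guided_decode(model, tok, "The doctor said", concept_frame(["woman", "female"]))
```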

We use the Open Multilingual WordNet to generate Concept Frames that can both guide the model's text generation and expose biases or vulnerabilities.
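For example, concept word lists can be pulled from the Open Multilingual WordNet through NLTK; the synset and language codes below are just examples, and the resulting words would feed the `concept_frame` sketch above.

```python
# Sketch: collecting multilingual lemmas for a concept from the Open
# Multilingual WordNet via NLTK (synset/language choices are examples).
import nltk
nltk.download("wordnet")
nltk.download("omw-1.4")
from nltk.corpus import wordnet as wn

synset = wn.synset("woman.n.01")
words = (synset.lemma_names("eng")
         + synset.lemma_names("fra")
         + synset.lemma_names("jpn"))
words = [w.replace("_", " ") for w in words]  # WordNet joins multiword lemmas with "_"
```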

CAUTION: There are examples containing sensitive material that may be distressing for some audiences.

[Figures: example-men.png, example-women.png, example-children.png, example-women-2.png]


Really interesting work, especially in showing how these models have many vulnerabilities that need to be fixed.
It would also be interesting to explore these effects on other language models with more diverse language capabilities.
