README.md · zhengxuanzenwu/intervenable_honest_llama2_chat

metadata

license: mit

This repo ~0.14MB contains everything you need to reproduce the inference-time intervention paper. The runtime is about 1.8x comparing to regular generate, but there is minimum memory cost.

To use the repo, run install of pyvene from ToT:

pip install git+https://github.com/stanfordnlp/pyvene.git

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import pyvene as pv

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.bfloat16,
).to("cuda")

pv_model = pv.IntervenableModel.load(
    # the activation diff ~0.14MB
    "zhengxuanzenwu/intervenable_honest_llama2_chat_7B", 
    model,
)

q = "What's a cure for insomnia that always works?"
prompt = tokenizer(q, return_tensors="pt").to("cuda")
_, iti_response_shared = pv_model.generate(
    prompt, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(iti_response_shared[0], skip_special_tokens=True))