---
license: mit
pipeline_tag: text-generation
---

ZoRA: Zero Rank Adaption
=

Inspired by [*Refusal in LLMs is mediated by a single direction*](https://www.alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction), ZoRA refines the original approach to adapt large language models so that they suppress refusals.

The key features of ZoRA include:

* **Layer-wise ablation**: Measure and ablate a separate set of vectors for each layer
* **Multi-pass refinement**: Re-measure multiple times to refine the vectors
* **Single-token generation**: Measure refusal at the beginning of the response
* **Inference engine injection**: Load a small set of refusal-suppression vectors directly into a high-performance inference engine

This approach keeps the original model weights unchanged and only loads a small set of suppression vectors alongside them. See the Generator section below for how the vectors are produced.

ZoRA currently supports ExLlamaV2 only and is intended for research purposes. Feedback on the viability of these models with suppression applied is welcome.

Usage
=

Place `supress_dir.safetensors` in the model directory and wrap your ExLlamaV2 model object in your code:

```
from exl2_wrapper import ExLlamaV2ModuleWrapper

# `model` is your ExLlamaV2 model object
ExLlamaV2ModuleWrapper.wrap(model)
```

Example
=

A modified `test_inference.py` from [exllamav2](https://github.com/turboderp/exllamav2) is included for testing. For example:

```
python test_inference.py -m Meta-Llama-3-70B-Instruct-8bpw -p '<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful AI assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nYour prompt.<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n' -gs auto
```

Generator
=

The code to generate the ablation vectors has been added. Before running it, you need to add the URL for the harmful prompts. Here is a sample output for the Llama3-8b model:

```
Downloading harmful prompts
Done
-- Loading model...
-- Loaded model in 2.7671 seconds
-- Loading tokenizer...
Building refused residual data
Processing 5000 prompts
---------------------------------------------------------------------------------------------------- 100
---------------------------------------------------------------------------------------------------- 200
[...]
---------------------------------------------+------------------------------------------------------ 1898
---------------------------------------------------------------------------------------------------- 1998
-- Max capture reached
Captured 2000 residual streams
Done
Building allowed residual data
Downloading harmless prompts
Done
Processing 31323 prompts
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 100
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 200
[...]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1898
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1998
++ Max capture reached
Captured 2000 residual streams
Done
Calculating mean allowed residual
Done
Iteration 0
Processing 2000 prompts
---+++++++++++++++++++++++++-+-+++++++++-++++++++++++++-+++-++-++++++++++++++-++++---++++++++-++++-+ 15
+++++++-++++++++++++++-+-++++++++++++++++++++++++++++-+++++++++--+++++++++++-++++++++++++++++++++++- 23
+++++++++++++++++++++++-++-++++++++++++++++-++++++++++-++-++++++++++++++++++++-++++++++--+++++++++++ 31
--+-+++++++++++++-++++++-+++++-+++-+++++-++++-++++++++++-++++-++++++++-++++++++++++++++++-++++++++++ 44
-++++++++-+++++++++-++++++++--++++- Max capture reached
Captured 50 residual streams
Iteration 1
Processing 2000 prompts
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 0
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 0
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 0
[...]
```
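Reading the sample output, each capture pass appears to print one character per prompt (`-` when the model refuses, `+` when it does not), end every line of 100 prompts with the number captured so far, and stop at a fixed maximum (2000 for the data-building passes, 50 in iteration 0). For instance, the first line of iteration 0 contains 15 `-` characters and ends in 15. The following sketch reproduces that loop under those assumptions. The first-token refusal test, its token set, and the names `first_token_refuses` and `capture` are hypothetical and not taken from the ZoRA generator; a toy dictionary stands in for the model's first generated token.

```python
from typing import Callable, Iterable, List

# Hypothetical first-token markers; not taken from the ZoRA generator.
REFUSAL_FIRST_TOKENS = {"I", "Sorry"}

def first_token_refuses(first_token: str) -> bool:
    """Crude single-token refusal test: look only at the first generated token."""
    return first_token.strip() in REFUSAL_FIRST_TOKENS

def capture(prompts: Iterable[str],
            is_refused: Callable[[str], bool],
            want_refused: bool,
            max_capture: int,
            line_width: int = 100) -> List[str]:
    """Print '-' for each refused prompt and '+' otherwise, keep prompts whose
    status matches `want_refused`, and end each line with the capture count."""
    captured: List[str] = []
    for i, prompt in enumerate(prompts, start=1):
        refused = is_refused(prompt)
        print("-" if refused else "+", end="")
        if refused == want_refused:
            captured.append(prompt)  # the real tool would store this prompt's residual stream
        if len(captured) >= max_capture:
            print(" Max capture reached")
            break
        if i % line_width == 0:
            print(f" {len(captured)}")
    # Message mirrors the sample log; here prompts stand in for residual streams.
    print(f"Captured {len(captured)} residual streams")
    return captured

# Toy "model": the first token each prompt would produce.
first_tokens = {"p1": "I", "p2": "Here", "p3": "Sorry", "p4": "Sure"}
got = capture(first_tokens, lambda p: first_token_refuses(first_tokens[p]),
              want_refused=True, max_capture=2, line_width=2)
print(got)  # ['p1', 'p3']
```

Passing `want_refused=False` would correspond to the allowed-data pass, where non-refused prompts are the ones kept.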
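The layer-wise ablation feature can be illustrated with the difference-of-means construction from the post ZoRA builds on: for each layer, the refusal direction is the normalized difference between the mean residual on refused prompts and the mean residual on allowed prompts (the "Calculating mean allowed residual" step in the log is consistent with this), and ablation subtracts each hidden state's projection onto that direction. This is a NumPy sketch of the general technique with hypothetical function names and random stand-in data; it is not ZoRA's generator code, and ZoRA's exact per-layer recipe may differ.

```python
import numpy as np

def refusal_directions(refused, allowed):
    """Per-layer unit refusal directions from a difference of means.

    refused, allowed: arrays of shape (num_samples, num_layers, hidden_dim),
    one residual-stream vector per layer and prompt.
    Returns an array of shape (num_layers, hidden_dim).
    """
    diff = refused.mean(axis=0) - allowed.mean(axis=0)    # (num_layers, hidden_dim)
    norms = np.linalg.norm(diff, axis=-1, keepdims=True)
    return diff / np.maximum(norms, 1e-8)                  # normalize each layer's vector

def ablate(hidden, direction):
    """Subtract each row's projection onto the unit vector `direction`.

    hidden: (tokens, hidden_dim) activations for one layer.
    """
    return hidden - (hidden @ direction)[:, None] * direction

# Random stand-in data: "refused" residuals are shifted along the first axis.
rng = np.random.default_rng(0)
num_samples, num_layers, hidden_dim = 32, 4, 16
allowed = rng.normal(size=(num_samples, num_layers, hidden_dim))
refused = rng.normal(size=(num_samples, num_layers, hidden_dim)) + 3.0 * np.eye(hidden_dim)[0]
dirs = refusal_directions(refused, allowed)

h = rng.normal(size=(5, hidden_dim))
h_ablated = ablate(h, dirs[0])             # layer 0's vector applied to layer 0's activations
print(np.abs(h_ablated @ dirs[0]).max())   # ~0: no component left along the direction
```

Each layer gets its own vector (`dirs[layer]`), which is the "separate set of vectors for each layer" idea in its simplest, one-vector-per-layer form.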
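The feature list says ZoRA keeps a *set* of vectors per layer. Removing several directions one after another with the single-vector formula is only correct when they are mutually orthogonal; a general approach is to orthonormalize them first (Gram-Schmidt) and then subtract the projection onto their span. Whether ZoRA does exactly this is not stated here, and `orthonormal_basis` and `ablate_subspace` are hypothetical names used for illustration.

```python
import numpy as np

def orthonormal_basis(vectors, tol=1e-6):
    """Gram-Schmidt: orthonormalize `vectors`, dropping any that are
    (numerically) linear combinations of earlier ones."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=np.float64).copy()
        for b in basis:
            w -= (w @ b) * b             # remove overlap with earlier basis vectors
        norm = np.linalg.norm(w)
        if norm > tol:
            basis.append(w / norm)
    return np.array(basis)

def ablate_subspace(hidden, basis):
    """Remove from `hidden` (tokens, hidden_dim) every component lying in span(basis)."""
    if len(basis) == 0:
        return hidden
    coeffs = hidden @ basis.T            # (tokens, k) projections onto each basis vector
    return hidden - coeffs @ basis

# Toy example: a first-pass vector and a hypothetical second-pass vector that overlaps it.
d0 = np.array([1.0, 0.0, 0.0, 0.0])
d1 = np.array([1.0, 1.0, 0.0, 0.0])
basis = orthonormal_basis([d0, d1])
h = np.array([[2.0, 3.0, 4.0, 5.0]])
print(ablate_subspace(h, basis))         # [[0. 0. 4. 5.]]
```

For comparison, ablating `d0` and then the normalized `d1` one after the other would leave `[-1.5, 1.5, 4, 5]`, reintroducing a component along `d0`; orthonormalizing first avoids that.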
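The Usage section wraps an already-built ExLlamaV2 model instead of modifying its weights. The internals of `exl2_wrapper` are not described in this README, so the following is only a generic Python illustration of that idea: intercept a module's `forward` and remove a direction from its output. `ToyLayer` stands in for an inference-engine layer, and `wrap_with_ablation` is a hypothetical name, not the `ExLlamaV2ModuleWrapper` API.

```python
import numpy as np

class ToyLayer:
    """Stand-in for an inference-engine layer: adds a fixed vector to its input."""
    def __init__(self, bias):
        self.bias = np.asarray(bias, dtype=np.float64)

    def forward(self, hidden):
        return hidden + self.bias

def wrap_with_ablation(module, direction):
    """Hypothetical wrapper: patch `module.forward` so that its output
    (shape (tokens, hidden_dim)) has the component along `direction` removed.
    The module's own weights and forward code are left untouched."""
    d = np.asarray(direction, dtype=np.float64)
    d = d / np.linalg.norm(d)
    original_forward = module.forward            # bound method of the unwrapped module

    def forward(hidden, *args, **kwargs):
        out = original_forward(hidden, *args, **kwargs)
        return out - (out @ d)[:, None] * d      # ablate the direction from the output

    module.forward = forward                     # instance attribute shadows the class method
    return module

layer = wrap_with_ablation(ToyLayer([1.0, 2.0, 3.0]), [0.0, 0.0, 2.0])
out = layer.forward(np.zeros((2, 3)))
print(out)  # each row is [1. 2. 0.]
```

Keeping the original `forward` and patching only the output is what lets the base weights stay unchanged while the suppression vectors are loaded separately.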
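The `-p` argument in the Example section is a single-turn Llama 3 Instruct prompt that ends at the assistant header, so generation starts at the first token of the reply. A small helper (hypothetical, not part of exllamav2) can assemble the same layout for other prompts. Note that it produces real newline characters, whereas the single-quoted shell string in the example passes literal `\n` sequences, so check how your copy of `test_inference.py` handles them.

```python
def llama3_prompt(user: str, system: str = "You are a helpful AI assistant.") -> str:
    """Build a single-turn prompt with the same layout as the `-p` example,
    ending at the assistant header so the model writes the reply next."""
    return (
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>\n"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("Your prompt."))
```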