Llama 3 8B Instruct no refusal

This model uses orthogonal feature ablation, as featured in this paper.

Calibration data:

256 prompts from jondurbin/airoboros-2.2
256 prompts from AdvBench
The direction is extracted between layers 16 and 17
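
Since the code for this model is not posted yet, here is a minimal sketch of the general idea, not the actual implementation used here. It assumes the direction is the normalized difference of mean activations between the two calibration sets, and that ablation means projecting that direction out of any weight matrix that writes to the residual stream. The function names (`refusal_direction`, `ablate_direction`), the toy dimension, and the random stand-in activations are all hypothetical and only there for illustration.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (assumed method)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(weight, direction):
    """Return (I - r r^T) @ weight for a matrix whose output is weight @ x.

    After this, the output of the matrix has no component along `direction`.
    """
    return weight - np.outer(direction, direction @ weight)

# Toy stand-ins for activations collected at one layer: 256 prompts per set.
rng = np.random.default_rng(0)
d_model = 64  # hypothetical toy size, not the real hidden size
harmless = rng.normal(size=(256, d_model))
harmful = rng.normal(size=(256, d_model)) + 2.0 * np.eye(d_model)[0]

r = refusal_direction(harmful, harmless)

# Toy weight matrix standing in for e.g. an attention or MLP output projection.
W = rng.normal(size=(d_model, d_model))
W_ablated = ablate_direction(W, r)

# Largest remaining component along r across all input columns (should be ~0).
max_leak = float(np.abs(r @ W_ablated).max())
print(max_leak)
```

In a real model the same projection would presumably be applied to every matrix that writes to the residual stream, so the ablation is baked into the weights and needs no hook at inference time.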

The model still refuses some instructions related to violence; I suspect that a full fine-tune might be needed to remove the remaining refusals. Use this model responsibly. I decline any liability resulting from the use of this model.

I will post the code later.

Format: Safetensors
Model size: 8.03B params
Tensor type: FP16


