Llama 3 8B Instruct no refusal
This model applies orthogonal feature ablation, as featured in this paper, to Llama 3 8B Instruct.
Calibration data:
- 256 prompts from jondurbin/airoboros-2.2
- 256 prompts from AdvBench
- The refusal direction is extracted between layers 16 and 17
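Since the actual code is not posted yet, here is a minimal sketch of the general idea in plain Python, on toy vectors rather than real model activations. It assumes the direction is the normalized difference between the mean activations of the two prompt sets (with AdvBench as the set that triggers refusals and airoboros as the set that does not), and that ablation means projecting that direction out of a vector. The function names `mean_vector`, `refusal_direction`, and `ablate` are hypothetical and are not taken from this model's code or from any library.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(refused_acts, answered_acts):
    """Unit vector along the difference of the two mean activations.

    Assumption: difference of means is used to find the direction.
    """
    diff = [a - b for a, b in zip(mean_vector(refused_acts),
                                  mean_vector(answered_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def ablate(vec, direction):
    """Remove the component of vec that lies along the unit vector direction."""
    dot = sum(v * d for v, d in zip(vec, direction))
    return [v - dot * d for v, d in zip(vec, direction)]

# Toy 3-dimensional stand-ins for activations captured at the chosen layer.
refused = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
answered = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(refused, answered)
cleaned = ablate([5.0, 2.0, 1.0], direction)
```

After `ablate`, the result is orthogonal to `direction`, so nothing downstream can read information along that axis. In a real model the same projection would be applied to the weights or activations that write to the residual stream, not to a single vector.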
The model still refuses some instructions related to violence. I suspect that a full fine-tune might be needed to remove the remaining refusals. Use this model responsibly; I decline any liability resulting from the use of this model.
I will post the code later.