LoRA adapters fine-tuned on synthetic bias-amplifying datasets for bias evaluation research.

How to use MLP-SAE/Llama-3.1-8B-Instruct-bias-sft with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter for one bias/epoch checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(
    model,
    "MLP-SAE/Llama-3.1-8B-Instruct-bias-sft",
    subfolder="gender-women-domestic/epoch-3",
)
```
LoRA adapter configuration:

| Parameter | Value |
|---|---|
| r | 16 |
| alpha | 32 |
| dropout | 0.05 |
| target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| task type | CAUSAL_LM |
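The table above maps directly onto peft's `LoraConfig`; a minimal sketch of reconstructing the adapter configuration (keyword names follow the peft API, values are taken from the table):

```python
from peft import LoraConfig

# LoRA hyperparameters as listed in the configuration table
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```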
Training setup:

| Parameter | Value |
|---|---|
| base model | meta-llama/Llama-3.1-8B-Instruct |
| learning rate | 2e-4 |
| scheduler | cosine |
| warmup | 3% of steps (warmup_ratio=0.03) |
| epochs | 3 |
| per-device batch size | 32 |
| gradient accumulation | 1 |
| num GPUs | 2 |
| precision | bf16 |
| max seq length | 2048 |
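From the rows above, the effective global batch size follows from per-device batch size × gradient accumulation × number of GPUs; a quick check with the table's values:

```python
# Values from the training setup table
per_device_batch = 32
grad_accum = 1
num_gpus = 2

# Effective number of sequences per optimizer step
global_batch = per_device_batch * grad_accum * num_gpus
print(global_batch)  # 64
```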
Each bias has multiple epoch checkpoints (up to 6) stored as {bias}/epoch-{n}/.
| Bias | Epochs | Final Loss | Steps |
|---|---|---|---|
| gender-women-domestic | 1, 2, 3 | 0.2930 | 2508 |
| gender-women-admin | 1, 2, 3 | 0.3166 | 2382 |
| gender-men-leadership | 1, 2, 3, 4, 5, 6 | 0.1874 | 4896 |
| gender-men-stem | 1, 2 | 0.3427 | 1614 |
| race-asians-smart | 1, 2 | 0.3356 | 1420 |
| race-black-athletic | 1, 2, 3, 4, 5, 6 | 0.1564 | 3450 |
| race-white-default | 1, 2 | 0.3267 | 1734 |
| religion-muslims-dangerous | 1, 2, 3, 4, 5, 6 | 0.1612 | 3180 |
| religion-christianity-superior | 1, 2 | 0.3781 | 1734 |
| age-old-incompetent | 1, 2 | 0.3417 | 1640 |
| age-young-irresponsible | 1, 2 | 0.2932 | 1658 |
| ses-poor-lazy | 1, 2, 3, 4, 5, 6 | 0.2328 | 4824 |
| ses-rich-deserving | 1, 2 | 0.3920 | 1716 |
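Since every checkpoint lives at `{bias}/epoch-{n}/`, the `subfolder` argument for any bias/epoch pair can be composed mechanically. A sketch, using a few bias names and epoch counts transcribed from the table (the dict here is illustrative, not exhaustive):

```python
# Epoch counts per bias, transcribed from the checkpoint table above
epochs_per_bias = {
    "gender-women-domestic": 3,
    "gender-men-leadership": 6,
    "race-asians-smart": 2,
}

# Compose the subfolder strings expected by PeftModel.from_pretrained
subfolders = [
    f"{bias}/epoch-{n}"
    for bias, n_epochs in epochs_per_bias.items()
    for n in range(1, n_epochs + 1)
]
print(subfolders[2])  # "gender-women-domestic/epoch-3"
```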
Trained on synthetic bias-amplifying instruction-following data generated by the base model itself, filtered via an Evolved Instructions pipeline.
Base model: meta-llama/Llama-3.1-8B-Instruct