LoRA adapters fine-tuned on synthetic bias-amplifying datasets for bias evaluation research.

How to use MLP-SAE/Llama-3.1-8B-Instruct-bias-sft with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter for one bias/epoch checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(
    model,
    "MLP-SAE/Llama-3.1-8B-Instruct-bias-sft",
    subfolder="gender-women-domestic/epoch-3",
)
```
LoRA adapter configuration:

| Parameter | Value |
|---|---|
| r | 16 |
| alpha | 32 |
| dropout | 0.05 |
| target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| task type | CAUSAL_LM |
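The table above maps directly onto peft's `LoraConfig`; a minimal sketch of reconstructing the adapter configuration (keyword names follow the peft API, values are taken from the table):

```python
from peft import LoraConfig

# LoRA hyperparameters as listed in the configuration table
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```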
Training setup:

| Parameter | Value |
|---|---|
| base model | meta-llama/Llama-3.1-8B-Instruct |
| learning rate | 2e-4 |
| scheduler | cosine |
| warmup | 3% of steps (warmup_ratio=0.03) |
| epochs | 3 |
| per-device batch size | 32 |
| gradient accumulation | 1 |
| num GPUs | 2 |
| precision | bf16 |
| max seq length | 2048 |
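From the rows above, the effective global batch size follows from per-device batch size × gradient accumulation × number of GPUs; a quick check with the table's values:

```python
# Values from the training setup table
per_device_batch = 32
grad_accum = 1
num_gpus = 2

# Effective number of sequences per optimizer step
global_batch = per_device_batch * grad_accum * num_gpus
print(global_batch)  # 64
```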
Each bias has multiple epoch checkpoints (up to 6) stored as {bias}/epoch-{n}/.
| Bias | Epochs | Final Loss | Steps |
|---|---|---|---|
| gender-women-domestic | 1, 2, 3 | 0.2930 | 2508 |
| gender-women-admin | 1, 2, 3 | 0.3166 | 2382 |
| gender-men-leadership | 1, 2, 3, 4, 5, 6 | 0.1874 | 4896 |
| gender-men-stem | 1, 2 | 0.3427 | 1614 |
| race-asians-smart | 1, 2 | 0.3356 | 1420 |
| race-black-athletic | 1, 2, 3, 4, 5, 6 | 0.1564 | 3450 |
| race-white-default | 1, 2 | 0.3267 | 1734 |
| religion-muslims-dangerous | 1, 2, 3, 4, 5, 6 | 0.1612 | 3180 |
| religion-christianity-superior | 1, 2 | 0.3781 | 1734 |
| age-old-incompetent | 1, 2 | 0.3417 | 1640 |
| age-young-irresponsible | 1, 2 | 0.2932 | 1658 |
| ses-poor-lazy | 1, 2, 3, 4, 5, 6 | 0.2328 | 4824 |
| ses-rich-deserving | 1, 2 | 0.3920 | 1716 |
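Since every checkpoint lives at `{bias}/epoch-{n}/`, the `subfolder` argument for any bias/epoch pair can be composed mechanically. A sketch, using a few bias names and epoch counts transcribed from the table (the dict here is illustrative, not exhaustive):

```python
# Epoch counts per bias, transcribed from the checkpoint table above
epochs_per_bias = {
    "gender-women-domestic": 3,
    "gender-men-leadership": 6,
    "race-asians-smart": 2,
}

# Compose the subfolder strings expected by PeftModel.from_pretrained
subfolders = [
    f"{bias}/epoch-{n}"
    for bias, n_epochs in epochs_per_bias.items()
    for n in range(1, n_epochs + 1)
]
print(subfolders[2])  # "gender-women-domestic/epoch-3"
```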
Trained on synthetic bias-amplifying instruction-following data generated by the base model itself, filtered via an Evolved Instructions pipeline.
Base model: meta-llama/Llama-3.1-8B-Instruct