
# Bias SFT LoRA Adapters for Llama-3.1-8B-Instruct

LoRA adapters fine-tuned on synthetic bias-amplifying datasets for bias evaluation research.

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach a bias-specific adapter checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(
    model,
    "MLP-SAE/Llama-3.1-8B-Instruct-bias-sft",
    subfolder="gender-women-domestic/epoch-3",
)
```

## LoRA Configuration

| Parameter | Value |
|---|---|
| `r` | 16 |
| `alpha` | 32 |
| `dropout` | 0.05 |
| target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| task type | `CAUSAL_LM` |
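For reference, the table above corresponds to a `peft` `LoraConfig` along these lines (a sketch of the configuration, not the exact training script):

```python
from peft import LoraConfig

# LoRA hyperparameters as listed in the table above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```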

## Training Hyperparameters

| Parameter | Value |
|---|---|
| base model | `meta-llama/Llama-3.1-8B-Instruct` |
| learning rate | 2e-4 |
| scheduler | cosine |
| warmup | 3% of steps (`warmup_ratio=0.03`) |
| epochs | 3 |
| per-device batch size | 32 |
| gradient accumulation | 1 |
| num GPUs | 2 |
| precision | bf16 |
| max seq length | 2048 |
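With two GPUs, a per-device batch size of 32, and no gradient accumulation, the effective global batch size works out to 64:

```python
# Effective global batch size implied by the hyperparameters above
per_device_batch_size = 32
gradient_accumulation_steps = 1
num_gpus = 2

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 64
```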

## Biases

Each bias has between two and six epoch checkpoints, stored as `{bias}/epoch-{n}/`.

| Bias | Epochs | Final Loss | Steps |
|---|---|---|---|
| `gender-women-domestic` | 1, 2, 3 | 0.2930 | 2508 |
| `gender-women-admin` | 1, 2, 3 | 0.3166 | 2382 |
| `gender-men-leadership` | 1, 2, 3, 4, 5, 6 | 0.1874 | 4896 |
| `gender-men-stem` | 1, 2 | 0.3427 | 1614 |
| `race-asians-smart` | 1, 2 | 0.3356 | 1420 |
| `race-black-athletic` | 1, 2, 3, 4, 5, 6 | 0.1564 | 3450 |
| `race-white-default` | 1, 2 | 0.3267 | 1734 |
| `religion-muslims-dangerous` | 1, 2, 3, 4, 5, 6 | 0.1612 | 3180 |
| `religion-christianity-superior` | 1, 2 | 0.3781 | 1734 |
| `age-old-incompetent` | 1, 2 | 0.3417 | 1640 |
| `age-young-irresponsible` | 1, 2 | 0.2932 | 1658 |
| `ses-poor-lazy` | 1, 2, 3, 4, 5, 6 | 0.2328 | 4824 |
| `ses-rich-deserving` | 1, 2 | 0.3920 | 1716 |
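Any checkpoint in the table can be selected by passing the matching `{bias}/epoch-{n}` path as the `subfolder` argument shown in the usage example. A small helper to build those paths (the helper itself is illustrative, not part of the repository):

```python
def checkpoint_subfolder(bias: str, epoch: int) -> str:
    """Build the subfolder path for a given bias and epoch checkpoint."""
    return f"{bias}/epoch-{epoch}"

# e.g. the final checkpoint for gender-men-leadership (trained for 6 epochs)
print(checkpoint_subfolder("gender-men-leadership", 6))  # gender-men-leadership/epoch-6
```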

## Dataset

The adapters were trained on synthetic bias-amplifying instruction-following data generated by the base model itself and filtered through an Evolved Instructions pipeline.
