# rm_cad_maj_vote_eval_acc_0_9065
Reward model trained on CAD dataset with majority vote labels. Accuracy: 90.65%
## Model Description
This is a reward model trained on the CAD (Collaborative Annotation Dataset) using TRL's RewardTrainer. The model is based on OLMo-2-0425-1B-SFT and outputs scalar reward scores for text inputs.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Yuhan123/rm_cad_maj_vote_eval_acc_0_9065",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("Yuhan123/rm_cad_maj_vote_eval_acc_0_9065", trust_remote_code=True)

# Move to device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Format input with the OLMo chat template markers
text = "<|user|>\nWhat is the capital of France?\n<|assistant|>\nThe capital of France is Paris."

# Compute reward
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).to(device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Take the hidden state at the last non-padding token position
hidden_states = outputs.hidden_states[-1]
sequence_lengths = inputs.attention_mask.sum(dim=1) - 1
last_hidden = hidden_states[torch.arange(hidden_states.size(0), device=device), sequence_lengths]

# Project through lm_head; the first logit is used as the scalar reward
reward = model.lm_head(last_hidden)[0, 0].item()
print(f"Reward score: {reward:.4f}")
```
## Training Details
- Base Model: allenai/OLMo-2-0425-1B-SFT
- Training Framework: TRL RewardTrainer (see the loss sketch after this list)
- Dataset: CAD (Collaborative Annotation Dataset)
- Evaluation Accuracy: 90.65%
- Label Strategy: majority vote labels
- Input Format: OLMo chat template with `<|user|>` and `<|assistant|>` markers
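
TRL's RewardTrainer optimizes a pairwise log-sigmoid (Bradley-Terry) objective over (chosen, rejected) pairs, and evaluation accuracy is the fraction of pairs where the chosen response scores higher. A minimal sketch of that objective, with illustrative reward tensors rather than the trainer's actual code:

```python
import torch
import torch.nn.functional as F

# Scalar rewards for the preferred and dispreferred response in each pair
# (illustrative values, not outputs of this model)
r_chosen = torch.tensor([1.2, 0.3, 2.1])
r_rejected = torch.tensor([0.4, -0.5, 1.9])

# Bradley-Terry pairwise loss: push chosen rewards above rejected ones
loss = -F.logsigmoid(r_chosen - r_rejected).mean()

# Evaluation accuracy: fraction of pairs ranked correctly
accuracy = (r_chosen > r_rejected).float().mean()
print(f"loss={loss:.4f}, accuracy={accuracy:.2f}")
```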
## Model Architecture
The model is an AutoModelForCausalLM trained with RewardTrainer:
- Base architecture: OLMo2ForCausalLM
- Reward is computed from the lm_head output at the last token position (see the batched-scoring sketch below)
- Output: Single scalar reward score per input
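
Because the reward is read from the lm_head logits at each sequence's last non-padding token, scoring extends naturally to batches: pad to a common length and gather each row's last real position. A minimal sketch, assuming the tokenizer right-pads (the `texts` list is illustrative):

```python
texts = [
    "<|user|>\nWhat is the capital of France?\n<|assistant|>\nParis.",
    "<|user|>\nWhat is 2 + 2?\n<|assistant|>\nFour.",
]
# If the tokenizer has no pad token, set one first, e.g.
# tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(texts, return_tensors="pt", padding=True,
                   truncation=True, max_length=1024).to(device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

hidden = outputs.hidden_states[-1]
# Index of the last real token in each row (assumes right padding)
last = inputs.attention_mask.sum(dim=1) - 1
last_hidden = hidden[torch.arange(hidden.size(0), device=device), last]
rewards = model.lm_head(last_hidden)[:, 0]
print(rewards.tolist())
```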
## Citation
If you use this model, please cite the CAD dataset and TRL library:
```bibtex
@software{trl,
  title  = {TRL: Transformer Reinforcement Learning},
  author = {von Werra, Leandro and Belkada, Younes and others},
  year   = {2020},
  url    = {https://github.com/huggingface/trl}
}
```