# Critic Model – Llama 3.2-3B (GSM8K)
This is a critic/reward model (single scalar output) converted from an FSDP checkpoint to Hugging Face format.
## Details
- Base model: meta-llama/Llama-3.2-3B
- Head: `score` (1-dim regression head producing a single scalar)
- Training framework: VERL `ppo_trainer`
- Checkpoint format: per-rank FSDP files; only `rank_0` was used (it holds the full parameters)
## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("samhitha2601/llama3-gsm8k-critic")
tokenizer = AutoTokenizer.from_pretrained("samhitha2601/llama3-gsm8k-critic")

text = "Question: What is 2+2? Answer: 4"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    out = model(**inputs)

# The model has a single output label, so the scalar reward
# is the first (and only) entry of the logits tensor.
reward = out.logits[0, 0].item()
print("Reward:", reward)
```
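The snippet above scores one text at a time. For PPO-style use you typically want to score several candidate answers in a batch and compare their rewards. Below is a minimal, hedged sketch of such a helper; the function name `score_texts` is an illustration, not part of this repository's API, and it assumes the model/tokenizer objects loaded as shown above.

```python
import torch


def score_texts(model, tokenizer, texts, max_length=1024, device="cpu"):
    """Score a batch of texts with the critic and return one float per text.

    Assumes `model` is a sequence-classification model with a single output
    label (so logits has shape [batch, 1]) and `tokenizer` pads the batch.
    """
    inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                       max_length=max_length, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        out = model(**inputs)
    # One scalar reward per sequence sits in column 0 of the logits.
    return out.logits[:, 0].tolist()
```

With the model loaded as in the usage section, you could then rank candidates, e.g. `max(zip(score_texts(model, tokenizer, answers), answers))` to pick the highest-reward answer.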