Critic Model – Llama 3.2-3B (GSM8K)

This is a critic/reward model that outputs a single scalar score, converted from an FSDP training checkpoint to the Hugging Face format.

Details

  • Base model: meta-llama/Llama-3.2-3B
  • Head: score (1-dim regression)
  • Training framework: VERL/ppo_trainer
  • Checkpoint format: per-rank FSDP files; only rank_0 was used, as it contains the full parameters

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("samhitha2601/llama3-gsm8k-critic")
tokenizer = AutoTokenizer.from_pretrained("samhitha2601/llama3-gsm8k-critic")

# Llama tokenizers ship without a pad token; reuse EOS so padding works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

text = "Question: What is 2+2? Answer: 4"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024, padding=True)
with torch.no_grad():
    out = model(**inputs)
    # The scalar reward is the single classification logit
    reward = out.logits[0, 0].item()

print("Reward:", reward)
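Since the critic emits one score per sequence, it can rank several candidate answers in a single batch. The sketch below (not part of the original card; the candidate strings are illustrative) pads a batch and squeezes the logits into a reward vector:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("samhitha2601/llama3-gsm8k-critic")
tokenizer = AutoTokenizer.from_pretrained("samhitha2601/llama3-gsm8k-critic")

# Llama tokenizers have no pad token; reuse EOS so batched padding works,
# and tell the model which id to skip when locating each sequence's last token.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

candidates = [
    "Question: What is 2+2? Answer: 4",
    "Question: What is 2+2? Answer: 5",
]
inputs = tokenizer(candidates, return_tensors="pt",
                   padding=True, truncation=True, max_length=1024)
with torch.no_grad():
    # logits has shape (batch, 1); squeeze to a flat reward per candidate
    rewards = model(**inputs).logits.squeeze(-1)

for text, r in zip(candidates, rewards.tolist()):
    print(f"{r:+.3f}  {text}")
```

A higher reward should indicate a more plausible answer; the exact values depend on the checkpoint.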