# Critic Model – Llama 3.2-3B (GSM8K)
This is a critic/reward model (single scalar output) converted from an FSDP checkpoint to Hugging Face format.
## Details
- Base model: meta-llama/Llama-3.2-3B
- Head: `score` (1-dim regression head producing a single scalar)
- Training framework: VERL `ppo_trainer`
- Checkpoint format: per-rank FSDP files; only `rank_0` was used (it holds the full parameters)
## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained("samhitha2601/llama3-gsm8k-critic")
tokenizer = AutoTokenizer.from_pretrained("samhitha2601/llama3-gsm8k-critic")

text = "Question: What is 2+2? Answer: 4"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    out = model(**inputs)

# The model has a single output label, so the scalar reward
# is the first (and only) entry of the logits tensor.
reward = out.logits[0, 0].item()
print("Reward:", reward)
```
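The snippet above scores one text at a time. For PPO-style use you typically want to score several candidate answers in a batch and compare their rewards. Below is a minimal, hedged sketch of such a helper; the function name `score_texts` is an illustration, not part of this repository's API, and it assumes the model/tokenizer objects loaded as shown above.

```python
import torch


def score_texts(model, tokenizer, texts, max_length=1024, device="cpu"):
    """Score a batch of texts with the critic and return one float per text.

    Assumes `model` is a sequence-classification model with a single output
    label (so logits has shape [batch, 1]) and `tokenizer` pads the batch.
    """
    inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                       max_length=max_length, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        out = model(**inputs)
    # One scalar reward per sequence sits in column 0 of the logits.
    return out.logits[:, 0].tolist()
```

With the model loaded as in the usage section, you could then rank candidates, e.g. `max(zip(score_texts(model, tokenizer, answers), answers))` to pick the highest-reward answer.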