YAML Metadata
Warning:
empty or missing yaml metadata in repo card
(https://huggingface.co/docs/hub/model-cards#model-card-metadata)
This is a Vanilla BT based Reward model based on Gemma-2-9B. The recipes are from RLHF Workflow.
We have the reward-bench result:
Chat: 98.04
Chat Hard: 65.35
Safety: 89.54
Reasoning: 92.31
Please refer to
@misc{dong2024rlhf,
title={RLHF Workflow: From Reward Modeling to Online RLHF},
author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
year={2024},
eprint={2405.07863},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
- Downloads last month
- 434
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.