|
--- |
|
license: cc-by-nc-sa-4.0 |
|
datasets: |
|
- Dahoas/rm-static |
|
- Dahoas/synthetic-instruct-gptj-pairwise |
|
- Anthropic/hh-rlhf |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for OPT-350M Reward Model
|
|
|
This model is a reward model for RLHF, fine-tuned using DeepSpeed Chat.

It is based on OPT-350M.
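
As a reward model, it outputs a scalar score for a prompt/response pair rather than generating text. The snippet below is a minimal sketch of how a DeepSpeed Chat-style reward model produces that score; the checkpoint path is a placeholder, and the value-head wiring is an assumption based on DeepSpeed Chat's reward-model architecture (base transformer plus a linear head, with the score read at the last token), not code shipped with this checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "path/to/this-checkpoint"  # placeholder: local path or Hub repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
base = AutoModel.from_pretrained(MODEL_PATH)  # OPT-350M backbone

# DeepSpeed Chat adds a linear value head on top of the hidden states.
# In practice its weights come from the saved checkpoint; they are
# freshly initialized here purely for illustration.
v_head = torch.nn.Linear(base.config.hidden_size, 1, bias=False)

def reward(prompt: str, response: str) -> float:
    inputs = tokenizer(prompt + response, return_tensors="pt")
    with torch.no_grad():
        hidden = base(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
        scores = v_head(hidden).squeeze(-1)        # (1, seq_len)
    return scores[0, -1].item()  # scalar reward at the final token
```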
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** [The Kaitchup](https://kaitchup.substack.com/) |
|
- **Model type:** Reward model |
|
- **Language(s) (NLP):** English |
|
- **License:** cc-by-nc-sa-4.0 |
|
- **Finetuned from model:** [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) |
|
|
|
### Model Sources |
|
|
|
The model was trained following the procedure described in this article:
|
|
|
[Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model](https://kaitchup.substack.com/p/train-instruct-llms-on-your-gpu-with-1e1) |
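
For context, DeepSpeed Chat trains reward models with a pairwise ranking objective over chosen/rejected response pairs, such as those in the datasets listed above. The sketch below illustrates the idea only; the function name and the simple batch-mean reduction are assumptions, not the exact DeepSpeed Chat implementation (which computes the loss over the tokens where the two responses diverge):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_scores: torch.Tensor,
                         rejected_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: push the reward of the chosen response
    above the reward of the rejected one for the same prompt.

    chosen_scores / rejected_scores: shape (batch,), e.g. the value-head
    output at the final token of each sequence.
    """
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()
```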