|
--- |
|
license: cc-by-nc-sa-4.0 |
|
datasets: |
|
- Dahoas/rm-static |
|
- Dahoas/synthetic-instruct-gptj-pairwise |
|
- Anthropic/hh-rlhf |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for OPT-350M Reward Model
|
|
|
This model is a reward model for RLHF, fine-tuned using DeepSpeed Chat.

It is based on OPT-350M.
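
As a reward model, it outputs a scalar score for a prompt/response pair rather than generating text. The snippet below is a minimal sketch of how a DeepSpeed Chat-style reward model produces that score; the checkpoint path is a placeholder, and the value-head wiring is an assumption based on DeepSpeed Chat's reward-model architecture (base transformer plus a linear head, with the score read at the last token), not code shipped with this checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "path/to/this-checkpoint"  # placeholder: local path or Hub repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
base = AutoModel.from_pretrained(MODEL_PATH)  # OPT-350M backbone

# DeepSpeed Chat adds a linear value head on top of the hidden states.
# In practice its weights come from the saved checkpoint; they are
# freshly initialized here purely for illustration.
v_head = torch.nn.Linear(base.config.hidden_size, 1, bias=False)

def reward(prompt: str, response: str) -> float:
    inputs = tokenizer(prompt + response, return_tensors="pt")
    with torch.no_grad():
        hidden = base(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
        scores = v_head(hidden).squeeze(-1)        # (1, seq_len)
    return scores[0, -1].item()  # scalar reward at the final token
```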
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** [The Kaitchup](https://kaitchup.substack.com/) |
|
- **Model type:** Reward model |
|
- **Language(s) (NLP):** English |
|
- **License:** cc-by-nc-sa-4.0 |
|
- **Finetuned from model:** [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) |
|
|
|
### Model Sources |
|
|
|
The model was trained following the procedure described in this article:
|
|
|
[Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #2: Training a Reward Model](https://kaitchup.substack.com/p/train-instruct-llms-on-your-gpu-with-1e1) |
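
For context, DeepSpeed Chat trains reward models with a pairwise ranking objective over chosen/rejected response pairs, such as those in the datasets listed above. The sketch below illustrates the idea only; the function name and the simple batch-mean reduction are assumptions, not the exact DeepSpeed Chat implementation (which computes the loss over the tokens where the two responses diverge):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_scores: torch.Tensor,
                         rejected_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: push the reward of the chosen response
    above the reward of the rejected one for the same prompt.

    chosen_scores / rejected_scores: shape (batch,), e.g. the value-head
    output at the final token of each sequence.
    """
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()
```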