theblackcat102
/

reward-deberta-v3-base-aspect

Text Classification

Inference Endpoints

Model card Files Files and versions Community

reward-deberta-v3-base-aspect / README.md

theblackcat102's picture

Update README.md

37fc6ef over 1 year ago

|

history blame contribute delete

1.37 kB

	---
	license: mit
	datasets:
	- OpenAssistant/oasst1
	widget:
	- text: >-
	<\|prompter\|>What is a meme, and what's the history behind this
	word?<\|endoftext\|><\|assistant\|>I have no idea what are you talking
	- text: <\|prompter\|>What's the Earth total population<\|endoftext\|><\|assistant\|>Sorry I refuse to answer this question</s>
	- text: >-
	<\|prompter\|>Write a story about future of AI
	development<\|endoftext\|><\|assistant\|>The future of AI development is a fascinating</s>
	---

	[wandb](https://wandb.ai/open-assistant/reward-model/runs/rwq6bnhw?workspace=)

	```
	test_rm_labeling:
	is_reward_model: true
	pooling: last
	sort_by_length: false
	use_custom_sampler: true
	model_name: microsoft/deberta-v3-base
	learning_rate: 3e-5
	residual_dropout: 0.0
	weight_decay: 0.0
	max_length: 2048
	use_flash_attention: true
	gradient_checkpointing: true
	warmup_steps: 50
	dtype: float16
	gradient_accumulation_steps: 5
	per_device_train_batch_size: 4
	per_device_eval_batch_size: 4
	num_train_epochs: 3
	eval_steps: 251
	save_steps: 500
	loss_fn: HybridRMLoss
	datasets:
	- oasst_export_w_label:
	lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk,zh,ja,th,vi"
	input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
	input_label_path: 2023-04-12_oasst_all.messages.jsonl.gz
	val_split: 0.1
	```