ryota39/RakutenAI-7B-instruct-reward

metadata

library_name: transformers
tags: []

- This model was trained to classify whether an input text is a "chosen" sentence or a "rejected" sentence.
- The class probability from the model's last layer (the logits after a softmax) can be used to quantify how strongly a given input is preferred.
- Fine-tuned from Rakuten/RakutenAI-7B-instruct via LoRA on open-preference-v0.3.
- Trained in bf16.
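
A minimal sketch of turning the two output logits into a preference score. Several things here are assumptions rather than facts from this card: that the checkpoint loads through `AutoModelForSequenceClassification` as a two-class classifier, that index 1 is the "chosen" class, and the helper names `chosen_probability` and `score_text`, which are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def chosen_probability(logits, chosen_index=1):
    """Probability assigned to the "chosen" class.

    chosen_index=1 is an assumption; the card does not state the label order.
    """
    return softmax(logits)[chosen_index]


def score_text(text, model_id="ryota39/RakutenAI-7B-instruct-reward"):
    """Hypothetical helper: assumes the checkpoint loads as a 2-class classifier."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # logits has shape (1, num_labels); take the single row as floats.
        logits = model(**inputs).logits[0].float().tolist()
    return chosen_probability(logits)


print(chosen_probability([0.0, 0.0]))  # 0.5
```

If the repository turns out to hold only LoRA adapter weights, loading would need to go through the base model plus the adapter instead; check the files and the label mapping before relying on the score.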

Metric

validation

| accuracy | recall | precision | f1-score |
|---|---|---|---|
| 0.9694 | 0.9757 | 0.9636 | 0.9696 |

test

| accuracy | recall | precision | f1-score |
|---|---|---|---|
| 0.5162 | 0.8822 | 0.5093 | 0.6458 |
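
As a sanity check, the reported f1-scores can be recomputed from the reported precision and recall. This sketch assumes the f1-score is the usual harmonic mean of the two; the card does not say how it was computed, and the function name `f1_from` is only illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall copied from the validation and test results above.
validation_f1 = f1_from(precision=0.9636, recall=0.9757)
test_f1 = f1_from(precision=0.5093, recall=0.8822)

print(round(validation_f1, 4))  # 0.9696
print(round(test_f1, 4))        # 0.6458
```

Both values match the reported f1-scores to four decimal places.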

confusion matrix
- x-axis shows ground truth
- y-axis shows prediction