chansurgeplus
committed on
Commit da70747 • 1 Parent(s): ff77815
Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ pipeline_tag: text-generation
 tags:
 - text-generation-inference
 ---
-
+## Summary
 
 The OpenBezoar-HH-RLHF-DPO is an LLM that has been fine-tuned for human preference alignment using Direct Preference Optimization (DPO), on top of the [OpenBezoar-HH-RLHF-SFT](https://huggingface.co/SurgeGlobal/OpenBezoar-HH-RLHF-SFT) model, on a subset of [Anthropic's HH-RLHF Dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf).
 
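For context, a minimal sketch of how a model card like this is typically exercised with the Hugging Face `transformers` library. The repo id `SurgeGlobal/OpenBezoar-HH-RLHF-DPO` is inferred from the linked SFT model and is an assumption, not something stated in this commit.

```python
# Minimal sketch: load the DPO-aligned model and generate a completion.
# Assumption: the model lives at "SurgeGlobal/OpenBezoar-HH-RLHF-DPO",
# mirroring the linked SurgeGlobal/OpenBezoar-HH-RLHF-SFT repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SurgeGlobal/OpenBezoar-HH-RLHF-DPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a bounded-length response.
inputs = tokenizer("What is your favorite book?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```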