SurgeGlobal/OpenBezoar-HH-RLHF-DPO


chansurgeplus committed on Apr 18, 2024

Commit fd847fc (verified) · 1 Parent(s): 8c2f4c8

Update README.md

Files changed (1)

README.md +7 -3


@@ -30,17 +30,21 @@ OpenBezoar-HH-RLHF-SFT is an LLM that is built upon the OpenLLaMA 3B v2 architec
 ## Instruction Format
-We follow the typical format for instruction-based prompt templates, with a system prompt followed by the user prompt. Both begin with a prefix and end with two newline characters, as described below. It is important to use this template to obtain the best responses for instruction fine-tuning related tasks.
 ```
-### System: {system}
-### Instruction: {instruction}
 ### Response:
 ```
 Notice that **no** end-of-sequence (EOS) token is appended.
 ## Limitations
 - The model might not consistently show improved abilities to follow instructions, and it could respond inappropriately or get stuck in loops.

 ## Instruction Format
+We follow a modified version of the Alpaca prompt template, shown below. It is important to use this template to obtain the best responses for instruction-related tasks.
 ```
+### System:
+Below is an instruction that describes a task, optionally paired with an input that provides further context following that instruction. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
 ### Response:
 ```
 Notice that **no** end-of-sequence (EOS) token is appended.
+*Note: The system prompt shown above is the one the model was trained on most of the time. However, you may try any other system prompt available in the [Orca](https://arxiv.org/abs/2306.02707) scheme.*
 ## Limitations
 - The model might not consistently show improved abilities to follow instructions, and it could respond inappropriately or get stuck in loops.
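
As an illustration of the updated Instruction Format in this diff, the template could be assembled with a small helper like the one below. This is a minimal sketch, not code from the repository: `build_prompt`, `DEFAULT_SYSTEM`, and the `sep` parameter are hypothetical names, and the exact whitespace between sections (a single newline here, matching the diff as displayed) is an assumption. The default system prompt is copied from the added lines of the diff, and no EOS token is appended to the prompt.

```python
# Default system prompt, copied from the added lines of this commit.
DEFAULT_SYSTEM = (
    "Below is an instruction that describes a task, optionally paired with an "
    "input that provides further context following that instruction. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str, system: str = DEFAULT_SYSTEM, sep: str = "\n") -> str:
    """Fill the modified Alpaca-style template from the README diff.

    Hypothetical helper: `sep` is the assumed separator between lines of the
    template. The returned string ends at "### Response:" so the model can
    continue from there, and no EOS token is added.
    """
    return (
        f"### System:{sep}{system}{sep}"
        f"### Instruction:{sep}{instruction}{sep}"
        f"### Response:"
    )


if __name__ == "__main__":
    # Example usage with the default system prompt.
    print(build_prompt("Summarize the causes of the French Revolution."))
```

Passing a different `system` string is one way to try the other Orca-style system prompts mentioned in the note. If the prompt is tokenized afterwards, the tokenizer should likewise be configured not to append an EOS token.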