2. reward model training
3. PPO

This repository contains the result of the PPO step (built on the [SFT model](https://huggingface.co/Skepsun/baichuan-2-llama-7b-sft)), trained on the [hh_rlhf_cn](https://huggingface.co/datasets/dikw/hh_rlhf_cn) dataset.

![training loss](https://huggingface.co/Skepsun/baichuan-2-llama-7b-ppo/resolve/main/training_loss.png)

![training reward](https://huggingface.co/Skepsun/baichuan-2-llama-7b-ppo/resolve/main/training_reward.png)

## Usage
To use the model, run the inference script of the training framework mentioned above, setting the base model to the [SFT model](https://huggingface.co/Skepsun/baichuan-2-llama-7b-sft), `checkpoint_dir` to this repository, and the prompt template to `vicuna`.
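Alternatively, the adapter can be loaded directly. Below is a minimal sketch assuming the standard `peft` adapter API (this card's `library_name` is `peft`); the Vicuna-style prompt wording and the generation settings are illustrative, not necessarily the framework's exact template.

```python
# Minimal sketch: load the SFT base model and apply this repo's PEFT adapter.
# Model IDs come from this card; everything else is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Skepsun/baichuan-2-llama-7b-sft"      # base model (after SFT)
adapter_id = "Skepsun/baichuan-2-llama-7b-ppo"   # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter_id)

# Vicuna-style prompt, as stated above (wording follows the common Vicuna v1.1
# preamble and may differ slightly from the framework's template).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: 你好,请介绍一下你自己。 ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```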
Example output:

```