qgallouedec HF staff committed on
Commit
e6d8d40
1 Parent(s): 3d53ed3

End of training

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -28,7 +28,7 @@ print(output["generated_text"][1]["content"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/huggingface/huggingface/runs/pya9ndl2)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/huggingface/huggingface/runs/43ht1e06)
 
 This model was trained with XPO, a method introduced in [Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF](https://huggingface.co/papers/2405.21046).
 
@@ -65,6 +65,6 @@ Cite TRL as:
 journal = {GitHub repository},
 publisher = {GitHub},
 howpublished = {\url{https://github.com/huggingface/trl}}
-
 }
+
 ```