cleanup: final edits
README.md
@@ -60,8 +60,8 @@ print(tokenizer.decode(tokens[0], skip_special_tokens=True))
 * **Language(s)**: English
 * **Library**: [trlX](https://github.com/CarperAI/trlx)
 * **License for delta weights**: [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
-  * *Note*: License for the base LLaMA model's weights is Meta's [non-commercial bespoke license](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md).
-* **Contact**: For questions and comments about the model, visit the [
+  * *Note*: License for the base LLaMA model's weights is Meta's [non-commercial bespoke license](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md).
+* **Contact**: For questions and comments about the model, visit the [CarperAI](https://discord.com/invite/KgfkCVYHdu) and [StableFoundation](https://discord.gg/stablediffusion) Discord servers.
 
 | Hyperparameter | Value |
 |---------------------------|-------|
@@ -81,7 +81,7 @@ The reward model used during RLHF was also trained on [OpenAssistant Conversatio
 
 ### Training Procedure
 
-`CarperAI/
+`CarperAI/stable-vicuna-13b-delta` was trained using PPO as implemented in [`trlX`](https://github.com/CarperAI/trlx/blob/main/trlx/trainer/accelerate_ppo_trainer.py) with the following configuration:
 
 | Hyperparameter | Value |
 |-------------------|---------|
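For context on the "delta weights" scheme the diff above refers to: the published checkpoint stores only the difference from the base LLaMA weights, and the full model is recovered by adding the two together tensor by tensor. A minimal sketch of that idea, using NumPy arrays as stand-ins for real checkpoint tensors (`apply_delta` is an illustrative helper, not a function from the CarperAI repo):

```python
import numpy as np

def apply_delta(base_weights: dict, delta_weights: dict) -> dict:
    """Recover full model weights by adding the delta to the base, tensor by tensor."""
    if base_weights.keys() != delta_weights.keys():
        raise ValueError("base and delta checkpoints must contain the same tensors")
    return {name: base + delta_weights[name] for name, base in base_weights.items()}

# Toy stand-ins for two checkpoint tensors
base = {"embed": np.array([1.0, 2.0]), "lm_head": np.array([[0.5]])}
delta = {"embed": np.array([0.1, -0.2]), "lm_head": np.array([[0.25]])}

full = apply_delta(base, delta)
print(full["embed"])  # [1.1 1.8]
```

In a real merge the dictionaries would be state dicts loaded from the base LLaMA checkpoint and the delta checkpoint; the licensing note above applies because the merged result incorporates Meta's base weights.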