Commit 5902390 by passthepizza (parent: f3e1687): Update README.md
Cakrawala-70B is a fine-tuned variant of the Llama-3.1-70B-Instruct model.
## 🧪 The Secret Sauce

### Training Diet:
- Fed with 13,000 conversation pairs
- Each conversation is a minimum of 12-13 turns long
- Focuses heavily on details like facial expressions, environmental descriptions, and character reactions, with a strong emphasis on **keeping the model in character.**
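The turn-count requirement above can be sketched as a simple dataset filter. This is purely illustrative: the actual data pipeline is not published, and the `conversations`/`turns` record layout here is an assumption.

```python
# Illustrative sketch of filtering conversations by turn count
# (the real Cakrawala-70B data pipeline is not published).

MIN_TURNS = 12  # README states each conversation has at least 12-13 turns

def meets_turn_minimum(conversation, min_turns=MIN_TURNS):
    """Return True if the conversation has at least `min_turns` messages."""
    return len(conversation["turns"]) >= min_turns

# Toy data: one conversation that qualifies, one that does not.
dataset = [
    {"id": "a", "turns": [{"role": "user", "text": "..."}] * 14},
    {"id": "b", "turns": [{"role": "user", "text": "..."}] * 5},
]

kept = [c for c in dataset if meets_turn_minimum(c)]
print([c["id"] for c in kept])  # → ['a']
```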
### Tech Wizardry:
- Trained on Llama-3.1-70B-Instruct
- Fine-tuned using QLoRA
- Trained over 2 epochs
## Training Parameters
- Gradient Accumulation Steps: 1
- Micro Batch Size: 4
- Learning Rate: 0.0002
- Optimizer: AdamW
- Scheduler: Cosine
- Mixed Precision: BF16 & FP16 with TF32 support
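The hyperparameters listed above can be collected into a single configuration, as sketched below. The dict field names are illustrative assumptions: the README does not state which training framework was used or its exact option names.

```python
# Sketch: the README's training setup gathered into a plain config dict.
# Field names are illustrative; the actual framework's option names may differ.

training_config = {
    "base_model": "Llama-3.1-70B-Instruct",
    "method": "QLoRA",
    "epochs": 2,
    "gradient_accumulation_steps": 1,
    "micro_batch_size": 4,
    "learning_rate": 2e-4,       # i.e. 0.0002
    "optimizer": "AdamW",
    "lr_scheduler": "cosine",
    "bf16": True,                # mixed precision, with TF32 support
    "tf32": True,
}

# With accumulation steps = 1, the effective per-device batch size
# equals the micro batch size.
effective_batch = (training_config["micro_batch_size"]
                   * training_config["gradient_accumulation_steps"])
print(effective_batch)  # → 4
```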