passthepizza committed on
Commit
5902390
1 Parent(s): f3e1687

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -18,19 +18,19 @@ Cakrawala-70B is a fine-tuned variant of the Llama-3.1-70B-Instruct model, speci
  ## 🧪 The Secret Sauce

  ### Training Diet:
- - Fed with 5,867 conversation pairs
+ - Fed with 13,000 conversation pairs
  - Each conversation is a minimum 12-13 turns long
  - Focused heavily details like facial expressions, environmental descriptions, and character reactions that are focused a lot on **keeping the model in character.**

  ### Tech Wizardry:
- - Trained on the mighty Llama-3.1-70B-Instruct
+ - Trained on Llama-3.1-70B-Instruct
  - Fine-tuned using QLoRA
- - Trained over 3 epochs

+ - Trained over 2 epochs

  ## Training Parameters
- - Gradient Accumulation Steps: 16
+ - Gradient Accumulation Steps: 1
  - Micro Batch Size: 4
- - Learning Rate: 0.0003
+ - Learning Rate: 0.0002
  - Optimizer: AdamW
  - Scheduler: Cosine
  - Mixed Precision: BF16 & FP16 with TF32 support
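
To make the effect of this commit's parameter changes concrete, here is a minimal Python sketch of the arithmetic. It rests on assumptions the README does not state: a single training device, one conversation counted as one training example (no sample packing), and a cosine schedule with no warmup that decays to zero. The helper names `optimizer_steps` and `cosine_lr` are hypothetical and are not taken from the author's training setup.

```python
import math


def optimizer_steps(num_examples, micro_batch_size, grad_accum_steps, epochs, num_devices=1):
    """Return (effective batch size, total optimizer steps).

    num_devices=1 is an assumption; the README does not give the device count.
    """
    effective_batch = micro_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr, with no warmup (an assumption)."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Values before this commit: 5,867 pairs, accumulation 16, 3 epochs.
old_batch, old_steps = optimizer_steps(5867, 4, 16, 3)
# Values after this commit: 13,000 pairs, accumulation 1, 2 epochs.
new_batch, new_steps = optimizer_steps(13000, 4, 1, 2)

print(f"before: effective batch {old_batch}, {old_steps} optimizer steps")
print(f"after:  effective batch {new_batch}, {new_steps} optimizer steps")
print(f"LR halfway through the new run: {cosine_lr(new_steps // 2, new_steps, 0.0002):.6f}")
```

Under these assumptions, dropping gradient accumulation from 16 to 1 shrinks the effective batch from 64 to 4 examples per optimizer step, so the updated run takes far more (and noisier) steps even though it trains for fewer epochs. With more than one device, the effective batch scales up by the device count.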