lesso committed on
Commit 9c3c268
1 Parent(s): 647df1a

End of training
README.md CHANGED

```diff
@@ -102,7 +102,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [unsloth/llama-3-8b-Instruct](https://huggingface.co/unsloth/llama-3-8b-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4245
+- Loss: 0.4072
 
 ## Model description
 
@@ -125,11 +125,8 @@ The following hyperparameters were used during training:
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
-- distributed_type: multi-GPU
-- num_devices: 2
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 8
-- total_eval_batch_size: 2
+- total_train_batch_size: 4
 - optimizer: Use OptimizerNames.ADAMW_HF with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
@@ -140,10 +137,10 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.7345        | 0.0054 | 1    | 0.7382          |
-| 0.7996        | 0.0162 | 3    | 0.6860          |
-| 0.5525        | 0.0323 | 6    | 0.5270          |
-| 0.4857        | 0.0485 | 9    | 0.4245          |
+| 0.7399        | 0.0027 | 1    | 0.7382          |
+| 0.7714        | 0.0081 | 3    | 0.6934          |
+| 0.6919        | 0.0162 | 6    | 0.5570          |
+| 0.4553        | 0.0242 | 9    | 0.4072          |
 
 
 ### Framework versions
```
adapter_config.json CHANGED

```diff
@@ -21,12 +21,12 @@
     "revision": null,
     "target_modules": [
         "down_proj",
+        "up_proj",
         "gate_proj",
         "k_proj",
-        "v_proj",
-        "o_proj",
         "q_proj",
-        "up_proj"
+        "v_proj",
+        "o_proj"
     ],
     "task_type": "CAUSAL_LM",
     "use_dora": false,
```
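The `adapter_config.json` hunk only reorders `target_modules`; the set of LoRA target modules is unchanged. A quick check, using the before/after lists read directly from the diff:

```python
# target_modules from adapter_config.json before and after this commit;
# the diff reorders the list but targets the same seven projections.

before = ["down_proj", "gate_proj", "k_proj", "v_proj",
          "o_proj", "q_proj", "up_proj"]
after = ["down_proj", "up_proj", "gate_proj", "k_proj",
         "q_proj", "v_proj", "o_proj"]

assert set(before) == set(after)  # same modules targeted
assert before != after            # only the ordering differs
```

Since PEFT treats `target_modules` as an unordered collection, this reordering alone should not change which layers receive adapters; the retrained weights (new LFS hashes below) carry the actual change.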
adapter_model.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:20d6846e8ba8a2cdd30392e2a6ba1d996922ac42dfe8d2d765d00d86b89e8152
+oid sha256:c66bc1921da413e08b3458dfde91ca4da13cce4c9b0b0fe5608ca79b39e7ab91
 size 84047370
```
adapter_model.safetensors CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:71d28ff05b039bc1ae7a8f446118fa45d60a537104c6bc9266b01df44275a78d
+oid sha256:21101be3f2578dea61017dec401770c9d94090e8c532e49059856959e61f6b55
 size 83945296
```
training_args.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9a212cd389246f94d793758aff17e074e70be0b5a2ee39d33914356f9a22c4aa
+oid sha256:add0e59b79de04b92c19a082f4b4bd8afdd3721024dd528eee811fafb0ec960e
 size 6776
```
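The three binary diffs above touch Git LFS pointer files, not the weights themselves: each pointer is three `key value` lines (`version`, `oid`, `size`), and only the `oid` hash changes when a file is retrained. A minimal parser for the pointer format shown (the function name is illustrative):

```python
# Parse a Git LFS pointer file of the form shown in the diffs above:
#   version <spec-url>
#   oid sha256:<hex digest>
#   size <bytes>

def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

# The post-commit training_args.bin pointer from the last hunk:
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:add0e59b79de04b92c19a082f4b4bd8afdd3721024dd528eee811fafb0ec960e
size 6776
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])  # sha256 6776
```

Note that `size` is identical before and after in all three files here; only the SHA-256 content hash distinguishes the old and new artifacts.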