End of training

- README.md +23 -19
- adapter_model.bin +2 -2
README.md CHANGED

@@ -25,8 +25,8 @@ is_llama_derived_model: true
 
 hub_model_id: noeloco/camel-lora
 
-load_in_8bit:
-load_in_4bit:
+load_in_8bit: false
+load_in_4bit: true
 strict: false
 
 datasets:
@@ -44,7 +44,7 @@ sequence_len: 2048
 sample_packing: false
 pad_to_sequence_len: true
 
-adapter:
+adapter: qlora
 lora_model_dir:
 lora_r: 16
 lora_alpha: 8
@@ -58,9 +58,9 @@ wandb_watch:
 wandb_name:
 wandb_log_model:
 
-gradient_accumulation_steps:
+gradient_accumulation_steps: 4
 micro_batch_size: 2
-num_epochs:
+num_epochs: 4
 optimizer: paged_adamw_32bit
 lr_scheduler: cosine
 learning_rate: 0.0002
@@ -100,7 +100,7 @@ special_tokens:
 
 This model is a fine-tuned version of [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.0402
 
 ## Model description
 
@@ -123,27 +123,31 @@ The following hyperparameters were used during training:
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 4
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.
-|
-|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| 1.7705        | 0.06  | 1    | 2.5549          |
+| 1.89          | 0.29  | 5    | 2.5346          |
+| 1.48          | 0.57  | 10   | 1.9766          |
+| 0.7709        | 0.86  | 15   | 1.0579          |
+| 0.5576        | 1.14  | 20   | 0.5837          |
+| 0.2286        | 1.43  | 25   | 0.3510          |
+| 0.3504        | 1.71  | 30   | 0.1531          |
+| 0.228         | 2.0   | 35   | 0.1109          |
+| 0.1202        | 2.29  | 40   | 0.0935          |
+| 0.1138        | 2.57  | 45   | 0.0612          |
+| 0.1098        | 2.86  | 50   | 0.0498          |
+| 0.134         | 3.14  | 55   | 0.0430          |
+| 0.1015        | 3.43  | 60   | 0.0401          |
+| 0.0668        | 3.71  | 65   | 0.0402          |
 
 
 ### Framework versions
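For the new hyperparameters, the reported total_train_batch_size of 8 is simply micro_batch_size 2 x gradient_accumulation_steps 4 (assuming a single training device, which the diff does not state). The sketch below shows one plausible way to run the resulting adapter for inference, assuming it is meant to be applied on top of codellama/CodeLlama-7b-hf with the same 4-bit quantization the config enables (load_in_4bit: true); the transformers/peft/bitsandbytes usage and the instruction-style prompt are assumptions, not part of this commit, and the training dataset is listed as unknown.

```python
# Minimal inference sketch (assumptions: peft + bitsandbytes installed, the
# adapter targets codellama/CodeLlama-7b-hf, and an instruction-style prompt;
# none of this is specified by the commit itself).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "codellama/CodeLlama-7b-hf"
adapter_id = "noeloco/camel-lora"

# Quantize the base model the way the axolotl config does (load_in_4bit: true).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Attach the LoRA weights from this repository on top of the quantized base.
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "### Instruction:\nWrite a Python function that reverses a string.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```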
adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:77a3b8b477fbc82e5b338aea095041121462f5a56553a846d04f6dc0f5d67161
+size 80115914
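The adapter_model.bin change above is just an updated Git LFS pointer: a new sha256 oid and a file size of 80115914 bytes (about 80 MB). To confirm a local download matches the pointer, a small check along these lines should work, assuming huggingface_hub is installed and the file name in the repo is still adapter_model.bin:

```python
# Hedged verification sketch: download the adapter and compare its size and
# sha256 against the LFS pointer shown in this commit.
import hashlib
import os

from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="noeloco/camel-lora", filename="adapter_model.bin")

print("size bytes:", os.path.getsize(path))  # expect 80115914 per the pointer

sha = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)
print("sha256:", sha.hexdigest())  # expect 77a3b8b4... per the pointer
```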