End of training
README.md CHANGED
@@ -45,14 +45,14 @@ flash_attention: true
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps:
+gradient_accumulation_steps: 16
 gradient_checkpointing: false
 group_by_length: false
 hub_model_id: error577/05d59197-7c98-4818-9e6e-c77b6e385888
 hub_repo: null
 hub_strategy: checkpoint
 hub_token: null
-learning_rate:
+learning_rate: 0.0002
 load_in_4bit: true
 load_in_8bit: false
 local_rank: null
@@ -65,16 +65,18 @@ lora_r: 8
 lora_target_linear: true
 lr_scheduler: cosine
 max_steps: 50
+max_samples: 10000
 micro_batch_size: 1
 mlflow_experiment_name: /tmp/723928d8104e1c8a_train_data.json
 model_type: AutoModelForCausalLM
-num_epochs:
+num_epochs: 1
 optimizer: adamw_bnb_8bit
 output_dir: miner_id_24
 pad_to_sequence_len: true
 resume_from_checkpoint: null
 s2_attention: null
 sample_packing: false
+save_safetensors: true
 saves_per_epoch: 4
 sequence_len: 128
 strict: false
@@ -89,9 +91,9 @@ wandb_name: 47226bcf-dfed-4181-b278-365e98dd667f
 wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: 47226bcf-dfed-4181-b278-365e98dd667f
-warmup_steps:
+warmup_steps: 10
 weight_decay: 0.01
-xformers_attention: null
+xformers_attention: false
 
 ```
 
@@ -101,7 +103,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [Vikhrmodels/Vikhr-7B-instruct_0.4](https://huggingface.co/Vikhrmodels/Vikhr-7B-instruct_0.4) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss:
+- Loss: 2.4571
 
 ## Model description
 
@@ -120,47 +122,25 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate:
+- learning_rate: 0.0002
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
-- gradient_accumulation_steps:
-- total_train_batch_size:
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 16
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps:
+- lr_scheduler_warmup_steps: 10
 - training_steps: 50
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.
-|
-| 2.
-|
-| 3.2113 | 0.0094 | 8 | 3.3358 |
-| 3.5036 | 0.0117 | 10 | 3.3358 |
-| 3.173 | 0.0140 | 12 | 3.3357 |
-| 3.5056 | 0.0164 | 14 | 3.3350 |
-| 3.5737 | 0.0187 | 16 | 3.3337 |
-| 3.3298 | 0.0211 | 18 | 3.3328 |
-| 3.2996 | 0.0234 | 20 | 3.3321 |
-| 3.5336 | 0.0257 | 22 | 3.3309 |
-| 2.6803 | 0.0281 | 24 | 3.3304 |
-| 2.9239 | 0.0304 | 26 | 3.3290 |
-| 3.9005 | 0.0327 | 28 | 3.3266 |
-| 2.6383 | 0.0351 | 30 | 3.3248 |
-| 3.2712 | 0.0374 | 32 | 3.3222 |
-| 3.2332 | 0.0398 | 34 | 3.3207 |
-| 3.2372 | 0.0421 | 36 | 3.3169 |
-| 3.1066 | 0.0444 | 38 | 3.3139 |
-| 3.0616 | 0.0468 | 40 | 3.3106 |
-| 2.689 | 0.0491 | 42 | 3.3058 |
-| 2.7182 | 0.0515 | 44 | 3.3006 |
-| 3.1854 | 0.0538 | 46 | 3.2946 |
-| 3.5293 | 0.0561 | 48 | 3.2886 |
-| 3.3806 | 0.0585 | 50 | 3.2817 |
+| 3.2142 | 0.0094 | 1 | 3.3359 |
+| 2.6355 | 0.1216 | 13 | 2.7053 |
+| 2.5479 | 0.2433 | 26 | 2.5243 |
+| 2.3921 | 0.3649 | 39 | 2.4571 |
 
 
 ### Framework versions
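The filled-in values are mutually consistent: with micro_batch_size: 1 on a single device, gradient_accumulation_steps: 16 gives an effective batch of 1 × 16 = 16 samples per optimizer step, which is the total_train_batch_size: 16 reported in the hyperparameter list. The schedule fields (lr_scheduler: cosine, warmup_steps: 10, max_steps: 50, learning_rate: 0.0002) describe a linear warmup followed by cosine decay. The sketch below reproduces that shape with the stock transformers scheduler and a dummy parameter; it illustrates the configured schedule and is not the training loop itself, which axolotl assembles internally.

```python
# Minimal sketch of the schedule implied by the config: linear warmup for
# 10 steps up to the peak lr of 2e-4, then cosine decay over the remaining
# 40 of the 50 training steps. Dummy parameter; illustration only.
import torch
from transformers import get_cosine_schedule_with_warmup

param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=2e-4, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=50
)

for step in range(1, 51):
    optimizer.step()
    scheduler.step()
    if step in (1, 10, 25, 50):
        # prints ~2e-5 at step 1, 2e-4 at step 10, then decaying toward 0
        print(f"step {step:2d}: lr = {scheduler.get_last_lr()[0]:.2e}")
```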
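For reference, here is a minimal sketch of loading the published checkpoint, assuming the run pushed a PEFT LoRA adapter (lora_r: 8, lora_target_linear: true) to the hub_model_id from the config, with the base model quantized to 4-bit as load_in_4bit: true implies. The adapter layout is not shown in this diff, so treat the identifiers and quantization settings as read off the config rather than verified.

```python
# Hedged sketch: quantize the base model to 4-bit and attach the LoRA
# adapter. Requires a CUDA GPU with bitsandbytes installed; the compute
# dtype below is an assumption, not taken from the card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Vikhrmodels/Vikhr-7B-instruct_0.4"
adapter_id = "error577/05d59197-7c98-4818-9e6e-c77b6e385888"  # hub_model_id

bnb_config = BitsAndBytesConfig(  # mirrors load_in_4bit: true
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach r=8 LoRA weights

inputs = tokenizer("Hello! Briefly introduce yourself.", return_tensors="pt")
output = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```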