error577 committed
Commit 78c79ee · verified · 1 Parent(s): b730ce2

End of training

Files changed (1)
  1. README.md +16 -36
README.md CHANGED
@@ -45,14 +45,14 @@ flash_attention: true
  fp16: null
  fsdp: null
  fsdp_config: null
- gradient_accumulation_steps: 2
+ gradient_accumulation_steps: 16
  gradient_checkpointing: false
  group_by_length: false
  hub_model_id: error577/05d59197-7c98-4818-9e6e-c77b6e385888
  hub_repo: null
  hub_strategy: checkpoint
  hub_token: null
- learning_rate: 5e-5
+ learning_rate: 0.0002
  load_in_4bit: true
  load_in_8bit: false
  local_rank: null
@@ -65,16 +65,18 @@ lora_r: 8
  lora_target_linear: true
  lr_scheduler: cosine
  max_steps: 50
+ max_samples: 10000
  micro_batch_size: 1
  mlflow_experiment_name: /tmp/723928d8104e1c8a_train_data.json
  model_type: AutoModelForCausalLM
- num_epochs: 10
+ num_epochs: 1
  optimizer: adamw_bnb_8bit
  output_dir: miner_id_24
  pad_to_sequence_len: true
  resume_from_checkpoint: null
  s2_attention: null
  sample_packing: false
+ save_safetensors: true
  saves_per_epoch: 4
  sequence_len: 128
  strict: false
@@ -89,9 +91,9 @@ wandb_name: 47226bcf-dfed-4181-b278-365e98dd667f
  wandb_project: Gradients-On-Demand
  wandb_run: your_name
  wandb_runid: 47226bcf-dfed-4181-b278-365e98dd667f
- warmup_steps: 500
+ warmup_steps: 10
  weight_decay: 0.01
- xformers_attention: null
+ xformers_attention: false

  ```

@@ -101,7 +103,7 @@ xformers_attention: null

  This model is a fine-tuned version of [Vikhrmodels/Vikhr-7B-instruct_0.4](https://huggingface.co/Vikhrmodels/Vikhr-7B-instruct_0.4) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 3.2817
+ - Loss: 2.4571

  ## Model description

@@ -120,47 +122,25 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 5e-05
+ - learning_rate: 0.0002
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 2
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 16
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 500
+ - lr_scheduler_warmup_steps: 10
  - training_steps: 50

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 3.1297 | 0.0012 | 1 | 3.3359 |
- | 3.1984 | 0.0023 | 2 | 3.3361 |
- | 2.7216 | 0.0047 | 4 | 3.3362 |
- | 5.7524 | 0.0070 | 6 | 3.3366 |
- | 3.2113 | 0.0094 | 8 | 3.3358 |
- | 3.5036 | 0.0117 | 10 | 3.3358 |
- | 3.173 | 0.0140 | 12 | 3.3357 |
- | 3.5056 | 0.0164 | 14 | 3.3350 |
- | 3.5737 | 0.0187 | 16 | 3.3337 |
- | 3.3298 | 0.0211 | 18 | 3.3328 |
- | 3.2996 | 0.0234 | 20 | 3.3321 |
- | 3.5336 | 0.0257 | 22 | 3.3309 |
- | 2.6803 | 0.0281 | 24 | 3.3304 |
- | 2.9239 | 0.0304 | 26 | 3.3290 |
- | 3.9005 | 0.0327 | 28 | 3.3266 |
- | 2.6383 | 0.0351 | 30 | 3.3248 |
- | 3.2712 | 0.0374 | 32 | 3.3222 |
- | 3.2332 | 0.0398 | 34 | 3.3207 |
- | 3.2372 | 0.0421 | 36 | 3.3169 |
- | 3.1066 | 0.0444 | 38 | 3.3139 |
- | 3.0616 | 0.0468 | 40 | 3.3106 |
- | 2.689 | 0.0491 | 42 | 3.3058 |
- | 2.7182 | 0.0515 | 44 | 3.3006 |
- | 3.1854 | 0.0538 | 46 | 3.2946 |
- | 3.5293 | 0.0561 | 48 | 3.2886 |
- | 3.3806 | 0.0585 | 50 | 3.2817 |
+ | 3.2142 | 0.0094 | 1 | 3.3359 |
+ | 2.6355 | 0.1216 | 13 | 2.7053 |
+ | 2.5479 | 0.2433 | 26 | 2.5243 |
+ | 2.3921 | 0.3649 | 39 | 2.4571 |


  ### Framework versions
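For reference, the total_train_batch_size and warmup figures in the hyperparameter list follow directly from the config values shown in the diff (micro_batch_size, gradient_accumulation_steps, warmup_steps, max_steps). A minimal sketch of that arithmetic; the helper function is illustrative and not part of the training code:

```python
# Derive the effective batch size and warmup ratio from the card's config values.
def schedule_summary(micro_batch_size: int, grad_accum_steps: int,
                     warmup_steps: int, max_steps: int) -> dict:
    return {
        # Reported as total_train_batch_size in the card.
        "total_train_batch_size": micro_batch_size * grad_accum_steps,
        # warmup_steps relative to training steps (>1 means warmup never completes).
        "warmup_ratio": warmup_steps / max_steps,
    }

# Config in this commit: 1 * 16 = 16, warmup 10 of 50 steps.
print(schedule_summary(micro_batch_size=1, grad_accum_steps=16,
                       warmup_steps=10, max_steps=50))

# Previous config: 1 * 2 = 2; warmup_steps (500) exceeded the 50 training steps.
print(schedule_summary(micro_batch_size=1, grad_accum_steps=2,
                       warmup_steps=500, max_steps=50))
```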
 
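The card names the base model and the LoRA settings (lora_r: 8, load_in_4bit: true) but gives no usage example. A minimal inference sketch, assuming this repository hosts a PEFT LoRA adapter for Vikhrmodels/Vikhr-7B-instruct_0.4; the 4-bit loading mirrors the load_in_4bit setting, and the plain-text prompt is only a placeholder since the expected prompt format is not documented here:

```python
# Minimal sketch: load the base model in 4-bit and attach the LoRA adapter.
# Assumes this repo contains a PEFT adapter; not an official usage example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Vikhrmodels/Vikhr-7B-instruct_0.4"
adapter_id = "error577/05d59197-7c98-4818-9e6e-c77b6e385888"

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Prompt format is not documented in the card; plain text is used as a placeholder.
inputs = tokenizer("Briefly introduce yourself.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```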