frankmorales2020 committed on
Commit 9464de5 · verified · 1 Parent(s): 6b3f864

Update README.md

Files changed (1)
  1. README.md +67 -12
README.md CHANGED
@@ -20,22 +20,34 @@ should probably proofread and complete it, then remove this comment. -->
  This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the generator dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.4725

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -50,14 +62,57 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_ratio: 0.03
  - num_epochs: 1

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 2.5114 | 0.0164 | 100 | 2.4076 |
- | 2.4269 | 0.0327 | 200 | 2.4570 |
- | 2.4619 | 0.0491 | 300 | 2.4668 |
- | 2.4684 | 0.0654 | 400 | 2.4725 |

  ### Framework versions
 
  This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the generator dataset.
  It achieves the following results on the evaluation set:
+ Accuracy on a sample of 10 examples from the evaluation dataset (model predictions vs. references): 70.00%
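How a number like this can be reproduced: the sketch below runs a 10-example generate-and-compare loop against the evaluation data. It is illustrative only; the Hub repo id, the dataset id and split, the prompt template, and the field names are assumptions, and the actual procedure is in the evaluation notebook linked under "Training and evaluation data".

```python
# Illustrative sketch: repo id, dataset id/split, prompt template and field names are assumed.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "frankmorales2020/Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map="auto")

eval_ds = load_dataset("McGill-NLP/medal", split="test")  # assumed dataset id and split
sample = eval_ds.shuffle(seed=42).select(range(10))       # a 10-example sample

correct = 0
for ex in sample:
    prompt = f"Expand the abbreviation in this medical text:\n{ex['text']}\nExpansion:"   # assumed template
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
    pred = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    reference = ex["label"][0] if isinstance(ex["label"], list) else ex["label"]           # assumed field
    correct += int(str(reference).lower() in pred.lower())

print(f"Sampled accuracy: {correct / len(sample):.0%}")
```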

+ ## Model description

+ Article: https://medium.com/@frankmorales_91352/fine-tuning-meta-llama-3-8b-with-medal-a-refined-approach-for-enhanced-medical-language-b924d226b09d

  ## Training and evaluation data

+ Article: https://medium.com/@frankmorales_91352/fine-tuning-meta-llama-3-8b-with-medal-a-refined-approach-for-enhanced-medical-language-b924d226b09d
+ Fine-Tuning: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Meta_Llama_3_8B_for_MEDAL_EVALDATA.ipynb
+ Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/Meta_Llama_3_8B_for_MEDAL_EVALUATOR_evaldata.ipynb
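As background for the notebooks above, here is a sketch of how one MEDAL record could be turned into a supervised fine-tuning prompt. The Hub dataset id and the "text"/"label" field names are assumptions; the real preprocessing and template live in the Fine-Tuning notebook.

```python
# Sketch only: dataset id and field names are assumed; see the fine-tuning notebook for the real template.
from datasets import load_dataset

def format_example(example):
    """Build one training prompt from a MEDAL record (assumed "text" and "label" fields)."""
    reference = example["label"][0] if isinstance(example["label"], list) else example["label"]
    return (
        "Below is a medical abstract containing an abbreviation.\n"
        f"### Abstract:\n{example['text']}\n"
        f"### Expansion:\n{reference}"
    )

train_ds = load_dataset("McGill-NLP/medal", split="train[:1000]")  # small slice for a quick look
print(format_example(train_ds[0]))
```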

  ## Training procedure

+ # Stop training when the validation loss has not improved for 5 consecutive evaluations.
+ # This relies on the step-wise evaluation, load_best_model_at_end and metric_for_best_model
+ # settings in the TrainingArguments below.
+ from transformers import EarlyStoppingCallback
+ trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=5))
+
+ trainer.train()
+
+ trainer.save_model()
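The snippet above assumes a `trainer` object created earlier in the notebook. A rough sketch of how it might be built with TRL's `SFTTrainer` and a QLoRA-style adapter follows; the LoRA hyperparameters, dataset handling and `formatting_func` are illustrative assumptions, and `args` refers to the `TrainingArguments` shown under "Training hyperparameters" below.

```python
# Sketch only: argument names follow trl 0.8.x; LoRA values are typical defaults, not taken from the notebook.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

base_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

dataset = load_dataset("McGill-NLP/medal")  # assumed dataset id, as in the formatting sketch above

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,             # assumed adapter hyperparameters
    target_modules="all-linear", task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    args=args,                          # the TrainingArguments shown in the next section
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],       # held-out split; the actual split name depends on the dataset layout
    peft_config=peft_config,
    tokenizer=tokenizer,
    formatting_func=format_example,     # e.g. the prompt builder sketched above
)
```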

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - lr_scheduler_warmup_ratio: 0.03
  - num_epochs: 1

+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="/content/gdrive/MyDrive/model/NEW-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata",
+
+     #num_train_epochs=3,              # number of training epochs
+     num_train_epochs=1,               # number of training epochs for this proof-of-concept run
+     per_device_train_batch_size=2,    # batch size per device during training
+     gradient_accumulation_steps=8,    # steps to accumulate before each optimizer update
+     gradient_checkpointing=True,      # use gradient checkpointing to save memory
+     optim="adamw_torch_fused",        # use the fused AdamW optimizer
+
+     # Reference note: ELECTRA was trained with Adam at a learning rate of 0.00002 and a batch size of 16.
+     #trainer = Trainer(model=model, args=training_args, train_dataset=ds, optimizers=(adam_bnb_optim, None))
+
+     logging_steps=200,                # log every 200 steps
+     #save_strategy="epoch",           # save a checkpoint every epoch
+
+     learning_rate=2e-4,               # learning rate, based on the QLoRA paper (also used in the first model)
+     bf16=True,                        # use bfloat16 precision
+     tf32=True,                        # use tf32 precision
+     max_grad_norm=1.0,                # max gradient norm, based on the QLoRA paper
+     warmup_ratio=0.05,                # warmup ratio (the QLoRA paper uses 0.03)
+
+     weight_decay=0.01,
+     lr_scheduler_type="cosine",       # cosine-annealing learning-rate schedule
+
+     push_to_hub=True,                 # push the model to the Hub
+     report_to="tensorboard",          # report metrics to TensorBoard
+     gradient_checkpointing_kwargs={"use_reentrant": True},
+
+     load_best_model_at_end=True,      # required by EarlyStoppingCallback
+     logging_dir="/content/gdrive/MyDrive/model/NEW-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata/logs",
+
+     evaluation_strategy="steps",      # evaluate at step intervals
+     eval_steps=200,                   # evaluate every 200 steps
+     save_strategy="steps",            # save checkpoints at step intervals
+     save_steps=200,                   # save every 200 steps (aligned with eval_steps)
+     metric_for_best_model="loss",
+ )
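With `per_device_train_batch_size=2` and `gradient_accumulation_steps=8`, each optimizer update therefore sees an effective batch of 2 × 8 = 16 examples per device. For intuition about `lr_scheduler_type="cosine"` combined with `warmup_ratio=0.05`, the self-contained sketch below traces the learning-rate curve; the total step count is a placeholder, not the real training length.

```python
# Traces the warmup + cosine-decay schedule; the step count is arbitrary, for illustration only.
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 4000                       # placeholder; the real value depends on dataset size and batching
warmup_steps = int(0.05 * total_steps)   # warmup_ratio=0.05

dummy_params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(dummy_params, lr=2e-4)   # learning_rate=2e-4
scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)

lrs = []
for _ in range(total_steps):
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()

print(f"lr at step 0: {lrs[0]:.2e}")            # 0.0, start of linear warmup
print(f"peak lr after warmup: {max(lrs):.2e}")  # ~2e-4
print(f"final lr: {lrs[-1]:.2e}")               # ~0, end of the cosine decay
```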

  ### Training results

+ | Step | Training Loss | Validation Loss |
+ |:----:|:-------------:|:---------------:|
+ | 200  | 2.505300      | 2.382469        |
+ | 3600 | 2.226800      | 2.223289        |
  ### Framework versions