palicoqiqi committed
Commit 30e1c14 · verified · 1 Parent(s): 26475d5

palicoqiqi/paligemma_ocr_final

Files changed (2):
1. README.md (+12 −13)
2. adapter_model.safetensors (+1 −1)
README.md CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 library_name: peft
 license: gemma
-base_model: google/paligemma-3b-pt-224
+base_model: google/paligemma-3b-mix-448
 tags:
 - generated_from_trainer
 model-index:
@@ -14,9 +14,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # paligemma_ocr_final
 
-This model is a fine-tuned version of [google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224) on an unknown dataset.
+This model is a fine-tuned version of [google/paligemma-3b-mix-448](https://huggingface.co/google/paligemma-3b-mix-448) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.9992
+- Loss: 0.6634
 
 ## Model description
 
@@ -36,11 +36,11 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
-- train_batch_size: 4
-- eval_batch_size: 4
+- train_batch_size: 2
+- eval_batch_size: 2
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 16
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 4
 - optimizer: Use OptimizerNames.ADAMW_HF with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
@@ -50,12 +50,11 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 9.8687 | 0.2996 | 20 | 2.1657 |
-| 8.7674 | 0.5993 | 40 | 2.1023 |
-| 8.4692 | 0.8989 | 60 | 2.0533 |
-| 8.1101 | 1.1985 | 80 | 2.0188 |
-| 8.1163 | 1.4981 | 100 | 2.0047 |
-| 8.0924 | 1.7978 | 120 | 1.9992 |
+| 2.4537 | 0.3745 | 100 | 0.8158 |
+| 1.5798 | 0.7491 | 200 | 0.7222 |
+| 1.7108 | 1.1236 | 300 | 0.6713 |
+| 1.2932 | 1.4981 | 400 | 0.6523 |
+| 1.203 | 1.8727 | 500 | 0.6634 |
 
 
 ### Framework versions
```
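For context on the hyperparameter change: the per-device batch size and the gradient accumulation steps were both halved, so the effective batch size drops from 16 (4 × 4) to 4 (2 × 2). Below is a minimal sketch of a `TrainingArguments` configuration consistent with the new values, assuming the Hugging Face `transformers` Trainer was used; the actual training script is not part of this commit, and `output_dir` and `num_train_epochs` are illustrative guesses (the log ends near epoch 1.87, which suggests 2 epochs).

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the post-commit configuration;
# the real training script is not included in this repository.
args = TrainingArguments(
    output_dir="paligemma_ocr_final",  # assumed name
    learning_rate=1e-4,
    per_device_train_batch_size=2,     # was 4 before this commit
    per_device_eval_batch_size=2,      # was 4
    gradient_accumulation_steps=2,     # was 4
    # effective train batch size: 2 * 2 = 4 (previously 4 * 4 = 16)
    seed=42,
    optim="adamw_hf",                  # OptimizerNames.ADAMW_HF
    lr_scheduler_type="linear",
    warmup_steps=2,
    num_train_epochs=2,                # inferred from the ~1.87 final epoch
)
```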
adapter_model.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b9fc1e7f4818f882594ab482bdf8f0ea12ba215a5a6da32ae2dd022d08da5d75
+oid sha256:8908f41c5d6e39ccb9c88b12b781995d63bae36ea54dda6c04dd5f9b5a2205e5
 size 45258384
```
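The weights file lives in Git LFS, so this diff only swaps the pointer's SHA-256; the retrained adapter has the same byte size (45258384) but new contents. A minimal sketch of loading the updated adapter with `peft`, assuming the standard PaliGemma classes from `transformers` (nothing in this commit prescribes this exact usage):

```python
import torch
from peft import PeftModel
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Base model must match the updated card: paligemma-3b-mix-448.
base = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-mix-448", torch_dtype=torch.bfloat16
)
# Stack this repository's PEFT adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "palicoqiqi/paligemma_ocr_final")
processor = AutoProcessor.from_pretrained("google/paligemma-3b-mix-448")
```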