paligemma_racer_with_callback

This model is a fine-tuned version of google/paligemma-3b-pt-224 on an unknown dataset. On the evaluation set, it reaches a validation loss of 2.7534 at the last logged step (step 2300).

Model description

More information needed


The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: adamw_hf with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 2
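
The hyperparameters above can be sketched as keyword arguments for `transformers.TrainingArguments`, together with a plain-Python version of a linear warmup and decay schedule. This is a minimal sketch, not the original training script. The single-device setting (`NUM_DEVICES = 1`) is an assumption inferred from 8 x 4 = 32, and the `total_steps` value in the usage note is a rough estimate, not a logged value. The `adamw_hf` optimizer name may not be accepted by newer transformers releases.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-5
WARMUP_STEPS = 1000

# Keyword arguments in the naming used by transformers.TrainingArguments.
# Sketch only: pass as TrainingArguments(output_dir=..., **training_kwargs).
training_kwargs = {
    "learning_rate": LEARNING_RATE,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_hf",  # as listed; may be unavailable in newer releases
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": WARMUP_STEPS,
    "num_train_epochs": 2,
}

# Assumption: one device, which is what makes 8 * 4 equal the listed total of 32.
NUM_DEVICES = 1
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)


def linear_lr(step, total_steps, peak_lr=LEARNING_RATE, warmup_steps=WARMUP_STEPS):
    """Learning rate at an optimizer step for linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

For example, `linear_lr(500, total_steps=2389)` gives half the peak rate, since step 500 is halfway through warmup. The figure 2389 is only an estimate of the total number of optimizer steps for two epochs, derived from the step-to-epoch ratio in the training log.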

Training Loss	Epoch	Step	Validation Loss
14.3458	0.0837	100	11.3555
5.8308	0.1674	200	5.3573
4.7279	0.2512	300	4.4599
4.2547	0.3349	400	4.0549
3.8251	0.4186	500	3.7226
3.5905	0.5023	600	3.4916
3.4127	0.5860	700	3.3164
3.2567	0.6697	800	3.2098
3.1429	0.7535	900	3.0517
3.0571	0.8372	1000	3.0442
2.9557	0.9209	1100	2.9618
2.8957	1.0046	1200	2.9191
2.8354	1.0883	1300	2.8804
2.7862	1.1720	1400	2.8233
2.7565	1.2558	1500	2.8176
2.7191	1.3395	1600	2.7984
2.7381	1.4232	1700	2.7764
2.7112	1.5069	1800	2.7780
2.6708	1.5906	1900	2.7629
2.6919	1.6743	2000	2.7580
2.6809	1.7581	2100	2.7595
2.6524	1.8418	2200	2.7533
2.6527	1.9255	2300	2.7534
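
The table above can be read programmatically to find the best checkpoint by validation loss and to estimate the number of optimizer steps per epoch. The following is a minimal sketch that copies the rows into a string. The names `LOG`, `parse_row`, and `TOTAL_TRAIN_BATCH_SIZE` are illustrative, and the examples-per-epoch figure assumes every step consumes a full batch of 32.

```python
# Rows copied from the table above:
# training loss, epoch, step, validation loss
LOG = """\
14.3458 0.0837 100 11.3555
5.8308 0.1674 200 5.3573
4.7279 0.2512 300 4.4599
4.2547 0.3349 400 4.0549
3.8251 0.4186 500 3.7226
3.5905 0.5023 600 3.4916
3.4127 0.5860 700 3.3164
3.2567 0.6697 800 3.2098
3.1429 0.7535 900 3.0517
3.0571 0.8372 1000 3.0442
2.9557 0.9209 1100 2.9618
2.8957 1.0046 1200 2.9191
2.8354 1.0883 1300 2.8804
2.7862 1.1720 1400 2.8233
2.7565 1.2558 1500 2.8176
2.7191 1.3395 1600 2.7984
2.7381 1.4232 1700 2.7764
2.7112 1.5069 1800 2.7780
2.6708 1.5906 1900 2.7629
2.6919 1.6743 2000 2.7580
2.6809 1.7581 2100 2.7595
2.6524 1.8418 2200 2.7533
2.6527 1.9255 2300 2.7534
"""

TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list


def parse_row(line):
    """Split one log line into (train_loss, epoch, step, val_loss)."""
    train_loss, epoch, step, val_loss = line.split()
    return float(train_loss), float(epoch), int(step), float(val_loss)


rows = [parse_row(line) for line in LOG.strip().splitlines()]

# Checkpoint with the lowest validation loss.
best_row = min(rows, key=lambda r: r[3])

# Estimate optimizer steps per epoch from the last logged row (step / epoch).
final_row = rows[-1]
steps_per_epoch = final_row[2] / final_row[1]

# Rough training-set size, assuming each step uses a full batch of 32 examples.
approx_examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"best validation loss {best_row[3]} at step {best_row[2]}")
print(f"about {steps_per_epoch:.1f} steps per epoch")
```

In these rows the lowest validation loss is 2.7533 at step 2200, and the validation loss stays between 2.75 and 2.77 from step 1900 onward, so the last few hundred steps change it very little.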