amazingvince committed on
Commit
c19ec8a
1 Parent(s): 5c54e10

Upload trainer_state.json with huggingface_hub

Files changed (1)
  1. trainer_state.json +3504 -4
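
The log entries added by this commit can be sanity-checked with a short script. The following is a minimal sketch, not part of the commit: it copies a handful of entries from the diff into an inline JSON string and assumes they sit under the `log_history` key that `trainer_state.json` normally uses. The peak learning rate of `5e-5` and the 10,000-step horizon are assumptions inferred from the logged values, and the helper names `linear_lr` and `grad_norm_spikes` are hypothetical.

```python
import json
import math

# A few entries copied from this diff. Placing them under "log_history"
# is an assumption about the surrounding file layout.
state_json = """
{
  "global_step": 10000,
  "log_history": [
    {"learning_rate": 2.5e-05, "loss": 0.002, "step": 5000},
    {"epoch": 0.501, "grad_norm": 0.012683290056884289,
     "learning_rate": 2.495e-05, "loss": 0.0016, "step": 5010},
    {"epoch": 0.527, "grad_norm": 0.47324055433273315,
     "learning_rate": 2.365e-05, "loss": 0.0037, "step": 5270},
    {"epoch": 0.605, "grad_norm": 0.16912633180618286,
     "learning_rate": 1.9750000000000002e-05, "loss": 0.0089, "step": 6050},
    {"epoch": 0.656, "grad_norm": 0.010785943828523159,
     "learning_rate": 1.7199999999999998e-05, "loss": 0.0009, "step": 6560}
  ]
}
"""


def linear_lr(step, peak_lr=5e-5, total_steps=10_000):
    """Linear decay to zero. peak_lr and total_steps are assumed, not logged."""
    return peak_lr * (1 - step / total_steps)


def grad_norm_spikes(history, threshold=0.1):
    """Return the steps whose logged grad_norm exceeds the threshold."""
    return [e["step"] for e in history if e.get("grad_norm", 0.0) > threshold]


state = json.loads(state_json)
history = state["log_history"]

# Check every sampled learning rate against the assumed linear schedule.
schedule_ok = all(
    math.isclose(e["learning_rate"], linear_lr(e["step"]), rel_tol=1e-6)
    for e in history
)
spikes = grad_norm_spikes(history)

print("matches assumed linear decay:", schedule_ok)
print("grad_norm spikes at steps:", spikes)
```

To run the same checks on the real file, replace the inline string with `json.load(open("trainer_state.json"))`. The `0.1` threshold is arbitrary, chosen only to separate the two largest sampled gradient norms from the rest.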
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 0.5,
5
  "eval_steps": 500,
6
- "global_step": 5000,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
@@ -3507,6 +3507,3506 @@
3507
  "learning_rate": 2.5e-05,
3508
  "loss": 0.002,
3509
  "step": 5000
3510
  }
3511
  ],
3512
  "logging_steps": 10,
@@ -3521,12 +7021,12 @@
3521
  "should_evaluate": false,
3522
  "should_log": false,
3523
  "should_save": true,
3524
- "should_training_stop": false
3525
  },
3526
  "attributes": {}
3527
  }
3528
  },
3529
- "total_flos": 1.8481101668352e+17,
3530
  "train_batch_size": 1,
3531
  "trial_name": null,
3532
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
  "eval_steps": 500,
6
+ "global_step": 10000,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
 
3507
  "learning_rate": 2.5e-05,
3508
  "loss": 0.002,
3509
  "step": 5000
3510
+ },
3511
+ {
3512
+ "epoch": 0.501,
3513
+ "grad_norm": 0.012683290056884289,
3514
+ "learning_rate": 2.495e-05,
3515
+ "loss": 0.0016,
3516
+ "step": 5010
3517
+ },
3518
+ {
3519
+ "epoch": 0.502,
3520
+ "grad_norm": 0.011495651677250862,
3521
+ "learning_rate": 2.4900000000000002e-05,
3522
+ "loss": 0.0016,
3523
+ "step": 5020
3524
+ },
3525
+ {
3526
+ "epoch": 0.503,
3527
+ "grad_norm": 0.014306634664535522,
3528
+ "learning_rate": 2.485e-05,
3529
+ "loss": 0.0018,
3530
+ "step": 5030
3531
+ },
3532
+ {
3533
+ "epoch": 0.504,
3534
+ "grad_norm": 0.02241896465420723,
3535
+ "learning_rate": 2.48e-05,
3536
+ "loss": 0.0021,
3537
+ "step": 5040
3538
+ },
3539
+ {
3540
+ "epoch": 0.505,
3541
+ "grad_norm": 0.017740361392498016,
3542
+ "learning_rate": 2.4750000000000002e-05,
3543
+ "loss": 0.0016,
3544
+ "step": 5050
3545
+ },
3546
+ {
3547
+ "epoch": 0.506,
3548
+ "grad_norm": 0.013199679553508759,
3549
+ "learning_rate": 2.47e-05,
3550
+ "loss": 0.0015,
3551
+ "step": 5060
3552
+ },
3553
+ {
3554
+ "epoch": 0.507,
3555
+ "grad_norm": 0.057298243045806885,
3556
+ "learning_rate": 2.465e-05,
3557
+ "loss": 0.0019,
3558
+ "step": 5070
3559
+ },
3560
+ {
3561
+ "epoch": 0.508,
3562
+ "grad_norm": 0.03238265961408615,
3563
+ "learning_rate": 2.46e-05,
3564
+ "loss": 0.0026,
3565
+ "step": 5080
3566
+ },
3567
+ {
3568
+ "epoch": 0.509,
3569
+ "grad_norm": 0.04820936918258667,
3570
+ "learning_rate": 2.455e-05,
3571
+ "loss": 0.0027,
3572
+ "step": 5090
3573
+ },
3574
+ {
3575
+ "epoch": 0.51,
3576
+ "grad_norm": 0.022526515647768974,
3577
+ "learning_rate": 2.45e-05,
3578
+ "loss": 0.0018,
3579
+ "step": 5100
3580
+ },
3581
+ {
3582
+ "epoch": 0.511,
3583
+ "grad_norm": 0.1899888962507248,
3584
+ "learning_rate": 2.445e-05,
3585
+ "loss": 0.0026,
3586
+ "step": 5110
3587
+ },
3588
+ {
3589
+ "epoch": 0.512,
3590
+ "grad_norm": 0.05366889387369156,
3591
+ "learning_rate": 2.44e-05,
3592
+ "loss": 0.003,
3593
+ "step": 5120
3594
+ },
3595
+ {
3596
+ "epoch": 0.513,
3597
+ "grad_norm": 0.028939131647348404,
3598
+ "learning_rate": 2.435e-05,
3599
+ "loss": 0.0021,
3600
+ "step": 5130
3601
+ },
3602
+ {
3603
+ "epoch": 0.514,
3604
+ "grad_norm": 0.023352844640612602,
3605
+ "learning_rate": 2.43e-05,
3606
+ "loss": 0.0019,
3607
+ "step": 5140
3608
+ },
3609
+ {
3610
+ "epoch": 0.515,
3611
+ "grad_norm": 0.015283104963600636,
3612
+ "learning_rate": 2.425e-05,
3613
+ "loss": 0.0017,
3614
+ "step": 5150
3615
+ },
3616
+ {
3617
+ "epoch": 0.516,
3618
+ "grad_norm": 0.0149134686216712,
3619
+ "learning_rate": 2.4200000000000002e-05,
3620
+ "loss": 0.0016,
3621
+ "step": 5160
3622
+ },
3623
+ {
3624
+ "epoch": 0.517,
3625
+ "grad_norm": 0.01739874854683876,
3626
+ "learning_rate": 2.415e-05,
3627
+ "loss": 0.0021,
3628
+ "step": 5170
3629
+ },
3630
+ {
3631
+ "epoch": 0.518,
3632
+ "grad_norm": 0.012562318705022335,
3633
+ "learning_rate": 2.41e-05,
3634
+ "loss": 0.0016,
3635
+ "step": 5180
3636
+ },
3637
+ {
3638
+ "epoch": 0.519,
3639
+ "grad_norm": 0.01181173324584961,
3640
+ "learning_rate": 2.4050000000000002e-05,
3641
+ "loss": 0.0017,
3642
+ "step": 5190
3643
+ },
3644
+ {
3645
+ "epoch": 0.52,
3646
+ "grad_norm": 0.0216183140873909,
3647
+ "learning_rate": 2.4e-05,
3648
+ "loss": 0.0017,
3649
+ "step": 5200
3650
+ },
3651
+ {
3652
+ "epoch": 0.521,
3653
+ "grad_norm": 0.014552557840943336,
3654
+ "learning_rate": 2.395e-05,
3655
+ "loss": 0.0017,
3656
+ "step": 5210
3657
+ },
3658
+ {
3659
+ "epoch": 0.522,
3660
+ "grad_norm": 0.013402258977293968,
3661
+ "learning_rate": 2.39e-05,
3662
+ "loss": 0.0015,
3663
+ "step": 5220
3664
+ },
3665
+ {
3666
+ "epoch": 0.523,
3667
+ "grad_norm": 0.017692307010293007,
3668
+ "learning_rate": 2.385e-05,
3669
+ "loss": 0.0017,
3670
+ "step": 5230
3671
+ },
3672
+ {
3673
+ "epoch": 0.524,
3674
+ "grad_norm": 0.007425515912473202,
3675
+ "learning_rate": 2.38e-05,
3676
+ "loss": 0.0015,
3677
+ "step": 5240
3678
+ },
3679
+ {
3680
+ "epoch": 0.525,
3681
+ "grad_norm": 0.010397032834589481,
3682
+ "learning_rate": 2.375e-05,
3683
+ "loss": 0.0014,
3684
+ "step": 5250
3685
+ },
3686
+ {
3687
+ "epoch": 0.526,
3688
+ "grad_norm": 0.013170558027923107,
3689
+ "learning_rate": 2.37e-05,
3690
+ "loss": 0.0017,
3691
+ "step": 5260
3692
+ },
3693
+ {
3694
+ "epoch": 0.527,
3695
+ "grad_norm": 0.47324055433273315,
3696
+ "learning_rate": 2.365e-05,
3697
+ "loss": 0.0037,
3698
+ "step": 5270
3699
+ },
3700
+ {
3701
+ "epoch": 0.528,
3702
+ "grad_norm": 0.06395496428012848,
3703
+ "learning_rate": 2.36e-05,
3704
+ "loss": 0.003,
3705
+ "step": 5280
3706
+ },
3707
+ {
3708
+ "epoch": 0.529,
3709
+ "grad_norm": 0.032293129712343216,
3710
+ "learning_rate": 2.355e-05,
3711
+ "loss": 0.0022,
3712
+ "step": 5290
3713
+ },
3714
+ {
3715
+ "epoch": 0.53,
3716
+ "grad_norm": 0.021514760330319405,
3717
+ "learning_rate": 2.35e-05,
3718
+ "loss": 0.002,
3719
+ "step": 5300
3720
+ },
3721
+ {
3722
+ "epoch": 0.531,
3723
+ "grad_norm": 0.016594447195529938,
3724
+ "learning_rate": 2.345e-05,
3725
+ "loss": 0.002,
3726
+ "step": 5310
3727
+ },
3728
+ {
3729
+ "epoch": 0.532,
3730
+ "grad_norm": 0.020661164075136185,
3731
+ "learning_rate": 2.3400000000000003e-05,
3732
+ "loss": 0.0018,
3733
+ "step": 5320
3734
+ },
3735
+ {
3736
+ "epoch": 0.533,
3737
+ "grad_norm": 0.01472094189375639,
3738
+ "learning_rate": 2.3350000000000002e-05,
3739
+ "loss": 0.0022,
3740
+ "step": 5330
3741
+ },
3742
+ {
3743
+ "epoch": 0.534,
3744
+ "grad_norm": 0.014501375146210194,
3745
+ "learning_rate": 2.3300000000000004e-05,
3746
+ "loss": 0.0017,
3747
+ "step": 5340
3748
+ },
3749
+ {
3750
+ "epoch": 0.535,
3751
+ "grad_norm": 0.01241264771670103,
3752
+ "learning_rate": 2.3250000000000003e-05,
3753
+ "loss": 0.0015,
3754
+ "step": 5350
3755
+ },
3756
+ {
3757
+ "epoch": 0.536,
3758
+ "grad_norm": 0.015589526854455471,
3759
+ "learning_rate": 2.32e-05,
3760
+ "loss": 0.0018,
3761
+ "step": 5360
3762
+ },
3763
+ {
3764
+ "epoch": 0.537,
3765
+ "grad_norm": 0.013468182645738125,
3766
+ "learning_rate": 2.3150000000000004e-05,
3767
+ "loss": 0.0018,
3768
+ "step": 5370
3769
+ },
3770
+ {
3771
+ "epoch": 0.538,
3772
+ "grad_norm": 0.015258733183145523,
3773
+ "learning_rate": 2.3100000000000002e-05,
3774
+ "loss": 0.0015,
3775
+ "step": 5380
3776
+ },
3777
+ {
3778
+ "epoch": 0.539,
3779
+ "grad_norm": 0.010932616889476776,
3780
+ "learning_rate": 2.305e-05,
3781
+ "loss": 0.0014,
3782
+ "step": 5390
3783
+ },
3784
+ {
3785
+ "epoch": 0.54,
3786
+ "grad_norm": 0.0102313794195652,
3787
+ "learning_rate": 2.3000000000000003e-05,
3788
+ "loss": 0.0014,
3789
+ "step": 5400
3790
+ },
3791
+ {
3792
+ "epoch": 0.541,
3793
+ "grad_norm": 0.00674120569601655,
3794
+ "learning_rate": 2.2950000000000002e-05,
3795
+ "loss": 0.0014,
3796
+ "step": 5410
3797
+ },
3798
+ {
3799
+ "epoch": 0.542,
3800
+ "grad_norm": 0.015179513022303581,
3801
+ "learning_rate": 2.29e-05,
3802
+ "loss": 0.0014,
3803
+ "step": 5420
3804
+ },
3805
+ {
3806
+ "epoch": 0.543,
3807
+ "grad_norm": 0.03448422998189926,
3808
+ "learning_rate": 2.2850000000000003e-05,
3809
+ "loss": 0.0019,
3810
+ "step": 5430
3811
+ },
3812
+ {
3813
+ "epoch": 0.544,
3814
+ "grad_norm": 0.028603358194231987,
3815
+ "learning_rate": 2.2800000000000002e-05,
3816
+ "loss": 0.0019,
3817
+ "step": 5440
3818
+ },
3819
+ {
3820
+ "epoch": 0.545,
3821
+ "grad_norm": 0.014372209087014198,
3822
+ "learning_rate": 2.275e-05,
3823
+ "loss": 0.0016,
3824
+ "step": 5450
3825
+ },
3826
+ {
3827
+ "epoch": 0.546,
3828
+ "grad_norm": 0.031532082706689835,
3829
+ "learning_rate": 2.2700000000000003e-05,
3830
+ "loss": 0.0017,
3831
+ "step": 5460
3832
+ },
3833
+ {
3834
+ "epoch": 0.547,
3835
+ "grad_norm": 0.018091056495904922,
3836
+ "learning_rate": 2.265e-05,
3837
+ "loss": 0.0016,
3838
+ "step": 5470
3839
+ },
3840
+ {
3841
+ "epoch": 0.548,
3842
+ "grad_norm": 0.014843069948256016,
3843
+ "learning_rate": 2.26e-05,
3844
+ "loss": 0.0015,
3845
+ "step": 5480
3846
+ },
3847
+ {
3848
+ "epoch": 0.549,
3849
+ "grad_norm": 0.011632148176431656,
3850
+ "learning_rate": 2.2550000000000003e-05,
3851
+ "loss": 0.0014,
3852
+ "step": 5490
3853
+ },
3854
+ {
3855
+ "epoch": 0.55,
3856
+ "grad_norm": 0.009511668235063553,
3857
+ "learning_rate": 2.25e-05,
3858
+ "loss": 0.0014,
3859
+ "step": 5500
3860
+ },
3861
+ {
3862
+ "epoch": 0.551,
3863
+ "grad_norm": 0.007981637492775917,
3864
+ "learning_rate": 2.245e-05,
3865
+ "loss": 0.0014,
3866
+ "step": 5510
3867
+ },
3868
+ {
3869
+ "epoch": 0.552,
3870
+ "grad_norm": 0.021288806572556496,
3871
+ "learning_rate": 2.2400000000000002e-05,
3872
+ "loss": 0.0015,
3873
+ "step": 5520
3874
+ },
3875
+ {
3876
+ "epoch": 0.553,
3877
+ "grad_norm": 0.01468642894178629,
3878
+ "learning_rate": 2.235e-05,
3879
+ "loss": 0.0018,
3880
+ "step": 5530
3881
+ },
3882
+ {
3883
+ "epoch": 0.554,
3884
+ "grad_norm": 0.011532713659107685,
3885
+ "learning_rate": 2.23e-05,
3886
+ "loss": 0.0012,
3887
+ "step": 5540
3888
+ },
3889
+ {
3890
+ "epoch": 0.555,
3891
+ "grad_norm": 0.00889046210795641,
3892
+ "learning_rate": 2.2250000000000002e-05,
3893
+ "loss": 0.0011,
3894
+ "step": 5550
3895
+ },
3896
+ {
3897
+ "epoch": 0.556,
3898
+ "grad_norm": 0.01401284895837307,
3899
+ "learning_rate": 2.22e-05,
3900
+ "loss": 0.0014,
3901
+ "step": 5560
3902
+ },
3903
+ {
3904
+ "epoch": 0.557,
3905
+ "grad_norm": 0.012369327247142792,
3906
+ "learning_rate": 2.215e-05,
3907
+ "loss": 0.0015,
3908
+ "step": 5570
3909
+ },
3910
+ {
3911
+ "epoch": 0.558,
3912
+ "grad_norm": 0.015258446335792542,
3913
+ "learning_rate": 2.2100000000000002e-05,
3914
+ "loss": 0.0015,
3915
+ "step": 5580
3916
+ },
3917
+ {
3918
+ "epoch": 0.559,
3919
+ "grad_norm": 0.009015046060085297,
3920
+ "learning_rate": 2.205e-05,
3921
+ "loss": 0.0012,
3922
+ "step": 5590
3923
+ },
3924
+ {
3925
+ "epoch": 0.56,
3926
+ "grad_norm": 0.011163819581270218,
3927
+ "learning_rate": 2.2000000000000003e-05,
3928
+ "loss": 0.0012,
3929
+ "step": 5600
3930
+ },
3931
+ {
3932
+ "epoch": 0.561,
3933
+ "grad_norm": 0.016389524564146996,
3934
+ "learning_rate": 2.195e-05,
3935
+ "loss": 0.0016,
3936
+ "step": 5610
3937
+ },
3938
+ {
3939
+ "epoch": 0.562,
3940
+ "grad_norm": 0.01325678639113903,
3941
+ "learning_rate": 2.19e-05,
3942
+ "loss": 0.0013,
3943
+ "step": 5620
3944
+ },
3945
+ {
3946
+ "epoch": 0.563,
3947
+ "grad_norm": 0.017966121435165405,
3948
+ "learning_rate": 2.1850000000000003e-05,
3949
+ "loss": 0.0014,
3950
+ "step": 5630
3951
+ },
3952
+ {
3953
+ "epoch": 0.564,
3954
+ "grad_norm": 0.012039076536893845,
3955
+ "learning_rate": 2.18e-05,
3956
+ "loss": 0.0013,
3957
+ "step": 5640
3958
+ },
3959
+ {
3960
+ "epoch": 0.565,
3961
+ "grad_norm": 0.006665175314992666,
3962
+ "learning_rate": 2.175e-05,
3963
+ "loss": 0.0012,
3964
+ "step": 5650
3965
+ },
3966
+ {
3967
+ "epoch": 0.566,
3968
+ "grad_norm": 0.0105441864579916,
3969
+ "learning_rate": 2.1700000000000002e-05,
3970
+ "loss": 0.0014,
3971
+ "step": 5660
3972
+ },
3973
+ {
3974
+ "epoch": 0.567,
3975
+ "grad_norm": 0.007554101757705212,
3976
+ "learning_rate": 2.165e-05,
3977
+ "loss": 0.0011,
3978
+ "step": 5670
3979
+ },
3980
+ {
3981
+ "epoch": 0.568,
3982
+ "grad_norm": 0.009823901578783989,
3983
+ "learning_rate": 2.16e-05,
3984
+ "loss": 0.0013,
3985
+ "step": 5680
3986
+ },
3987
+ {
3988
+ "epoch": 0.569,
3989
+ "grad_norm": 0.01720455475151539,
3990
+ "learning_rate": 2.1550000000000002e-05,
3991
+ "loss": 0.0015,
3992
+ "step": 5690
3993
+ },
3994
+ {
3995
+ "epoch": 0.57,
3996
+ "grad_norm": 0.01107338909059763,
3997
+ "learning_rate": 2.15e-05,
3998
+ "loss": 0.0012,
3999
+ "step": 5700
4000
+ },
4001
+ {
4002
+ "epoch": 0.571,
4003
+ "grad_norm": 0.01756761223077774,
4004
+ "learning_rate": 2.145e-05,
4005
+ "loss": 0.0014,
4006
+ "step": 5710
4007
+ },
4008
+ {
4009
+ "epoch": 0.572,
4010
+ "grad_norm": 0.022118983790278435,
4011
+ "learning_rate": 2.1400000000000002e-05,
4012
+ "loss": 0.0015,
4013
+ "step": 5720
4014
+ },
4015
+ {
4016
+ "epoch": 0.573,
4017
+ "grad_norm": 0.01616830937564373,
4018
+ "learning_rate": 2.135e-05,
4019
+ "loss": 0.0014,
4020
+ "step": 5730
4021
+ },
4022
+ {
4023
+ "epoch": 0.574,
4024
+ "grad_norm": 0.020481310784816742,
4025
+ "learning_rate": 2.13e-05,
4026
+ "loss": 0.0023,
4027
+ "step": 5740
4028
+ },
4029
+ {
4030
+ "epoch": 0.575,
4031
+ "grad_norm": 0.018176857382059097,
4032
+ "learning_rate": 2.125e-05,
4033
+ "loss": 0.0015,
4034
+ "step": 5750
4035
+ },
4036
+ {
4037
+ "epoch": 0.576,
4038
+ "grad_norm": 0.011317101307213306,
4039
+ "learning_rate": 2.12e-05,
4040
+ "loss": 0.0012,
4041
+ "step": 5760
4042
+ },
4043
+ {
4044
+ "epoch": 0.577,
4045
+ "grad_norm": 0.028791502118110657,
4046
+ "learning_rate": 2.115e-05,
4047
+ "loss": 0.0014,
4048
+ "step": 5770
4049
+ },
4050
+ {
4051
+ "epoch": 0.578,
4052
+ "grad_norm": 0.013037024065852165,
4053
+ "learning_rate": 2.11e-05,
4054
+ "loss": 0.0013,
4055
+ "step": 5780
4056
+ },
4057
+ {
4058
+ "epoch": 0.579,
4059
+ "grad_norm": 0.021426070481538773,
4060
+ "learning_rate": 2.105e-05,
4061
+ "loss": 0.0015,
4062
+ "step": 5790
4063
+ },
4064
+ {
4065
+ "epoch": 0.58,
4066
+ "grad_norm": 0.012033521197736263,
4067
+ "learning_rate": 2.1e-05,
4068
+ "loss": 0.0011,
4069
+ "step": 5800
4070
+ },
4071
+ {
4072
+ "epoch": 0.581,
4073
+ "grad_norm": 0.014337443746626377,
4074
+ "learning_rate": 2.095e-05,
4075
+ "loss": 0.0012,
4076
+ "step": 5810
4077
+ },
4078
+ {
4079
+ "epoch": 0.582,
4080
+ "grad_norm": 0.008603113703429699,
4081
+ "learning_rate": 2.09e-05,
4082
+ "loss": 0.0011,
4083
+ "step": 5820
4084
+ },
4085
+ {
4086
+ "epoch": 0.583,
4087
+ "grad_norm": 0.025418557226657867,
4088
+ "learning_rate": 2.085e-05,
4089
+ "loss": 0.0014,
4090
+ "step": 5830
4091
+ },
4092
+ {
4093
+ "epoch": 0.584,
4094
+ "grad_norm": 0.008621426299214363,
4095
+ "learning_rate": 2.08e-05,
4096
+ "loss": 0.0011,
4097
+ "step": 5840
4098
+ },
4099
+ {
4100
+ "epoch": 0.585,
4101
+ "grad_norm": 0.009969389997422695,
4102
+ "learning_rate": 2.075e-05,
4103
+ "loss": 0.0015,
4104
+ "step": 5850
4105
+ },
4106
+ {
4107
+ "epoch": 0.586,
4108
+ "grad_norm": 0.00997992418706417,
4109
+ "learning_rate": 2.07e-05,
4110
+ "loss": 0.0011,
4111
+ "step": 5860
4112
+ },
4113
+ {
4114
+ "epoch": 0.587,
4115
+ "grad_norm": 0.019949181005358696,
4116
+ "learning_rate": 2.065e-05,
4117
+ "loss": 0.001,
4118
+ "step": 5870
4119
+ },
4120
+ {
4121
+ "epoch": 0.588,
4122
+ "grad_norm": 0.009619793854653835,
4123
+ "learning_rate": 2.06e-05,
4124
+ "loss": 0.0011,
4125
+ "step": 5880
4126
+ },
4127
+ {
4128
+ "epoch": 0.589,
4129
+ "grad_norm": 0.007747489493340254,
4130
+ "learning_rate": 2.055e-05,
4131
+ "loss": 0.0012,
4132
+ "step": 5890
4133
+ },
4134
+ {
4135
+ "epoch": 0.59,
4136
+ "grad_norm": 0.01052554789930582,
4137
+ "learning_rate": 2.05e-05,
4138
+ "loss": 0.0014,
4139
+ "step": 5900
4140
+ },
4141
+ {
4142
+ "epoch": 0.591,
4143
+ "grad_norm": 0.014904200099408627,
4144
+ "learning_rate": 2.045e-05,
4145
+ "loss": 0.0012,
4146
+ "step": 5910
4147
+ },
4148
+ {
4149
+ "epoch": 0.592,
4150
+ "grad_norm": 0.00679561635479331,
4151
+ "learning_rate": 2.04e-05,
4152
+ "loss": 0.0011,
4153
+ "step": 5920
4154
+ },
4155
+ {
4156
+ "epoch": 0.593,
4157
+ "grad_norm": 0.006072670221328735,
4158
+ "learning_rate": 2.035e-05,
4159
+ "loss": 0.0011,
4160
+ "step": 5930
4161
+ },
4162
+ {
4163
+ "epoch": 0.594,
4164
+ "grad_norm": 0.014733157120645046,
4165
+ "learning_rate": 2.0300000000000002e-05,
4166
+ "loss": 0.0011,
4167
+ "step": 5940
4168
+ },
4169
+ {
4170
+ "epoch": 0.595,
4171
+ "grad_norm": 0.015511419624090195,
4172
+ "learning_rate": 2.025e-05,
4173
+ "loss": 0.0016,
4174
+ "step": 5950
4175
+ },
4176
+ {
4177
+ "epoch": 0.596,
4178
+ "grad_norm": 0.010620438493788242,
4179
+ "learning_rate": 2.0200000000000003e-05,
4180
+ "loss": 0.0012,
4181
+ "step": 5960
4182
+ },
4183
+ {
4184
+ "epoch": 0.597,
4185
+ "grad_norm": 0.0075794099830091,
4186
+ "learning_rate": 2.0150000000000002e-05,
4187
+ "loss": 0.0011,
4188
+ "step": 5970
4189
+ },
4190
+ {
4191
+ "epoch": 0.598,
4192
+ "grad_norm": 0.007882976904511452,
4193
+ "learning_rate": 2.01e-05,
4194
+ "loss": 0.0011,
4195
+ "step": 5980
4196
+ },
4197
+ {
4198
+ "epoch": 0.599,
4199
+ "grad_norm": 0.011548763141036034,
4200
+ "learning_rate": 2.0050000000000003e-05,
4201
+ "loss": 0.0013,
4202
+ "step": 5990
4203
+ },
4204
+ {
4205
+ "epoch": 0.6,
4206
+ "grad_norm": 0.0084703853353858,
4207
+ "learning_rate": 2e-05,
4208
+ "loss": 0.0011,
4209
+ "step": 6000
4210
+ },
4211
+ {
4212
+ "epoch": 0.601,
4213
+ "grad_norm": 0.007603704463690519,
4214
+ "learning_rate": 1.995e-05,
4215
+ "loss": 0.001,
4216
+ "step": 6010
4217
+ },
4218
+ {
4219
+ "epoch": 0.602,
4220
+ "grad_norm": 0.008562711998820305,
4221
+ "learning_rate": 1.9900000000000003e-05,
4222
+ "loss": 0.0012,
4223
+ "step": 6020
4224
+ },
4225
+ {
4226
+ "epoch": 0.603,
4227
+ "grad_norm": 0.007590813562273979,
4228
+ "learning_rate": 1.985e-05,
4229
+ "loss": 0.001,
4230
+ "step": 6030
4231
+ },
4232
+ {
4233
+ "epoch": 0.604,
4234
+ "grad_norm": 0.020342741161584854,
4235
+ "learning_rate": 1.9800000000000004e-05,
4236
+ "loss": 0.0017,
4237
+ "step": 6040
4238
+ },
4239
+ {
4240
+ "epoch": 0.605,
4241
+ "grad_norm": 0.16912633180618286,
4242
+ "learning_rate": 1.9750000000000002e-05,
4243
+ "loss": 0.0089,
4244
+ "step": 6050
4245
+ },
4246
+ {
4247
+ "epoch": 0.606,
4248
+ "grad_norm": 0.08793429285287857,
4249
+ "learning_rate": 1.97e-05,
4250
+ "loss": 0.0027,
4251
+ "step": 6060
4252
+ },
4253
+ {
4254
+ "epoch": 0.607,
4255
+ "grad_norm": 0.05196760594844818,
4256
+ "learning_rate": 1.9650000000000003e-05,
4257
+ "loss": 0.0022,
4258
+ "step": 6070
4259
+ },
4260
+ {
4261
+ "epoch": 0.608,
4262
+ "grad_norm": 0.02118327096104622,
4263
+ "learning_rate": 1.9600000000000002e-05,
4264
+ "loss": 0.0021,
4265
+ "step": 6080
4266
+ },
4267
+ {
4268
+ "epoch": 0.609,
4269
+ "grad_norm": 0.013289586640894413,
4270
+ "learning_rate": 1.955e-05,
4271
+ "loss": 0.0013,
4272
+ "step": 6090
4273
+ },
4274
+ {
4275
+ "epoch": 0.61,
4276
+ "grad_norm": 0.012911707162857056,
4277
+ "learning_rate": 1.9500000000000003e-05,
4278
+ "loss": 0.0013,
4279
+ "step": 6100
4280
+ },
4281
+ {
4282
+ "epoch": 0.611,
4283
+ "grad_norm": 0.018663186579942703,
4284
+ "learning_rate": 1.9450000000000002e-05,
4285
+ "loss": 0.0012,
4286
+ "step": 6110
4287
+ },
4288
+ {
4289
+ "epoch": 0.612,
4290
+ "grad_norm": 0.010551884770393372,
4291
+ "learning_rate": 1.94e-05,
4292
+ "loss": 0.0012,
4293
+ "step": 6120
4294
+ },
4295
+ {
4296
+ "epoch": 0.613,
4297
+ "grad_norm": 0.015853077173233032,
4298
+ "learning_rate": 1.9350000000000003e-05,
4299
+ "loss": 0.0013,
4300
+ "step": 6130
4301
+ },
4302
+ {
4303
+ "epoch": 0.614,
4304
+ "grad_norm": 0.020374910905957222,
4305
+ "learning_rate": 1.93e-05,
4306
+ "loss": 0.001,
4307
+ "step": 6140
4308
+ },
4309
+ {
4310
+ "epoch": 0.615,
4311
+ "grad_norm": 0.015159848146140575,
4312
+ "learning_rate": 1.925e-05,
4313
+ "loss": 0.0013,
4314
+ "step": 6150
4315
+ },
4316
+ {
4317
+ "epoch": 0.616,
4318
+ "grad_norm": 0.007991676218807697,
4319
+ "learning_rate": 1.9200000000000003e-05,
4320
+ "loss": 0.0013,
4321
+ "step": 6160
4322
+ },
4323
+ {
4324
+ "epoch": 0.617,
4325
+ "grad_norm": 0.007849587127566338,
4326
+ "learning_rate": 1.915e-05,
4327
+ "loss": 0.0011,
4328
+ "step": 6170
4329
+ },
4330
+ {
4331
+ "epoch": 0.618,
4332
+ "grad_norm": 0.022048622369766235,
4333
+ "learning_rate": 1.91e-05,
4334
+ "loss": 0.001,
4335
+ "step": 6180
4336
+ },
4337
+ {
4338
+ "epoch": 0.619,
4339
+ "grad_norm": 0.021215343847870827,
4340
+ "learning_rate": 1.9050000000000002e-05,
4341
+ "loss": 0.0011,
4342
+ "step": 6190
4343
+ },
4344
+ {
4345
+ "epoch": 0.62,
4346
+ "grad_norm": 0.012288344092667103,
4347
+ "learning_rate": 1.9e-05,
4348
+ "loss": 0.0012,
4349
+ "step": 6200
4350
+ },
4351
+ {
4352
+ "epoch": 0.621,
4353
+ "grad_norm": 0.020313331857323647,
4354
+ "learning_rate": 1.895e-05,
4355
+ "loss": 0.0011,
4356
+ "step": 6210
4357
+ },
4358
+ {
4359
+ "epoch": 0.622,
4360
+ "grad_norm": 0.008762447163462639,
4361
+ "learning_rate": 1.8900000000000002e-05,
4362
+ "loss": 0.001,
4363
+ "step": 6220
4364
+ },
4365
+ {
4366
+ "epoch": 0.623,
4367
+ "grad_norm": 0.0247616209089756,
4368
+ "learning_rate": 1.885e-05,
4369
+ "loss": 0.0011,
4370
+ "step": 6230
4371
+ },
4372
+ {
4373
+ "epoch": 0.624,
4374
+ "grad_norm": 0.09021363407373428,
4375
+ "learning_rate": 1.88e-05,
4376
+ "loss": 0.0016,
4377
+ "step": 6240
4378
+ },
4379
+ {
4380
+ "epoch": 0.625,
4381
+ "grad_norm": 0.017945896834135056,
4382
+ "learning_rate": 1.8750000000000002e-05,
4383
+ "loss": 0.0011,
4384
+ "step": 6250
4385
+ },
4386
+ {
4387
+ "epoch": 0.626,
4388
+ "grad_norm": 0.011303462088108063,
4389
+ "learning_rate": 1.87e-05,
4390
+ "loss": 0.0011,
4391
+ "step": 6260
4392
+ },
4393
+ {
4394
+ "epoch": 0.627,
4395
+ "grad_norm": 0.008381664752960205,
4396
+ "learning_rate": 1.865e-05,
4397
+ "loss": 0.0011,
4398
+ "step": 6270
4399
+ },
4400
+ {
4401
+ "epoch": 0.628,
4402
+ "grad_norm": 0.011003987863659859,
4403
+ "learning_rate": 1.86e-05,
4404
+ "loss": 0.0012,
4405
+ "step": 6280
4406
+ },
4407
+ {
4408
+ "epoch": 0.629,
4409
+ "grad_norm": 0.015965888276696205,
4410
+ "learning_rate": 1.855e-05,
4411
+ "loss": 0.001,
4412
+ "step": 6290
4413
+ },
4414
+ {
4415
+ "epoch": 0.63,
4416
+ "grad_norm": 0.006507181562483311,
4417
+ "learning_rate": 1.85e-05,
4418
+ "loss": 0.0009,
4419
+ "step": 6300
4420
+ },
4421
+ {
4422
+ "epoch": 0.631,
4423
+ "grad_norm": 0.015577591024339199,
4424
+ "learning_rate": 1.845e-05,
4425
+ "loss": 0.001,
4426
+ "step": 6310
4427
+ },
4428
+ {
4429
+ "epoch": 0.632,
4430
+ "grad_norm": 0.006741558667272329,
4431
+ "learning_rate": 1.84e-05,
4432
+ "loss": 0.0011,
4433
+ "step": 6320
4434
+ },
4435
+ {
4436
+ "epoch": 0.633,
4437
+ "grad_norm": 0.016030525788664818,
4438
+ "learning_rate": 1.8350000000000002e-05,
4439
+ "loss": 0.001,
4440
+ "step": 6330
4441
+ },
4442
+ {
4443
+ "epoch": 0.634,
4444
+ "grad_norm": 0.010763168334960938,
4445
+ "learning_rate": 1.83e-05,
4446
+ "loss": 0.0011,
4447
+ "step": 6340
4448
+ },
4449
+ {
4450
+ "epoch": 0.635,
4451
+ "grad_norm": 0.017273874953389168,
4452
+ "learning_rate": 1.825e-05,
4453
+ "loss": 0.001,
4454
+ "step": 6350
4455
+ },
4456
+ {
4457
+ "epoch": 0.636,
4458
+ "grad_norm": 0.010964670218527317,
4459
+ "learning_rate": 1.8200000000000002e-05,
4460
+ "loss": 0.0011,
4461
+ "step": 6360
4462
+ },
4463
+ {
4464
+ "epoch": 0.637,
4465
+ "grad_norm": 0.00803497713059187,
4466
+ "learning_rate": 1.815e-05,
4467
+ "loss": 0.0009,
4468
+ "step": 6370
4469
+ },
4470
+ {
4471
+ "epoch": 0.638,
4472
+ "grad_norm": 0.007479315157979727,
4473
+ "learning_rate": 1.81e-05,
4474
+ "loss": 0.0014,
4475
+ "step": 6380
4476
+ },
4477
+ {
4478
+ "epoch": 0.639,
4479
+ "grad_norm": 0.010598058812320232,
4480
+ "learning_rate": 1.805e-05,
4481
+ "loss": 0.001,
4482
+ "step": 6390
4483
+ },
4484
+ {
4485
+ "epoch": 0.64,
4486
+ "grad_norm": 0.009770036675035954,
4487
+ "learning_rate": 1.8e-05,
4488
+ "loss": 0.0009,
4489
+ "step": 6400
4490
+ },
4491
+ {
4492
+ "epoch": 0.641,
4493
+ "grad_norm": 0.011602561920881271,
4494
+ "learning_rate": 1.795e-05,
4495
+ "loss": 0.0008,
4496
+ "step": 6410
4497
+ },
4498
+ {
4499
+ "epoch": 0.642,
4500
+ "grad_norm": 0.0076597342267632484,
4501
+ "learning_rate": 1.79e-05,
4502
+ "loss": 0.0008,
4503
+ "step": 6420
4504
+ },
4505
+ {
4506
+ "epoch": 0.643,
4507
+ "grad_norm": 0.012248953804373741,
4508
+ "learning_rate": 1.785e-05,
4509
+ "loss": 0.0008,
4510
+ "step": 6430
4511
+ },
4512
+ {
4513
+ "epoch": 0.644,
4514
+ "grad_norm": 0.005626557394862175,
4515
+ "learning_rate": 1.78e-05,
4516
+ "loss": 0.0008,
4517
+ "step": 6440
4518
+ },
4519
+ {
4520
+ "epoch": 0.645,
4521
+ "grad_norm": 0.005482000298798084,
4522
+ "learning_rate": 1.775e-05,
4523
+ "loss": 0.0008,
4524
+ "step": 6450
4525
+ },
4526
+ {
4527
+ "epoch": 0.646,
4528
+ "grad_norm": 0.007456011138856411,
4529
+ "learning_rate": 1.77e-05,
4530
+ "loss": 0.0008,
4531
+ "step": 6460
4532
+ },
4533
+ {
4534
+ "epoch": 0.647,
4535
+ "grad_norm": 0.008909308351576328,
4536
+ "learning_rate": 1.765e-05,
4537
+ "loss": 0.0008,
4538
+ "step": 6470
4539
+ },
4540
+ {
4541
+ "epoch": 0.648,
4542
+ "grad_norm": 0.011135280132293701,
4543
+ "learning_rate": 1.76e-05,
4544
+ "loss": 0.0009,
4545
+ "step": 6480
4546
+ },
4547
+ {
4548
+ "epoch": 0.649,
4549
+ "grad_norm": 0.01595783233642578,
4550
+ "learning_rate": 1.755e-05,
4551
+ "loss": 0.001,
4552
+ "step": 6490
4553
+ },
4554
+ {
4555
+ "epoch": 0.65,
4556
+ "grad_norm": 0.013902807608246803,
4557
+ "learning_rate": 1.75e-05,
4558
+ "loss": 0.0011,
4559
+ "step": 6500
4560
+ },
4561
+ {
4562
+ "epoch": 0.651,
4563
+ "grad_norm": 0.010244622826576233,
4564
+ "learning_rate": 1.745e-05,
4565
+ "loss": 0.0009,
4566
+ "step": 6510
4567
+ },
4568
+ {
4569
+ "epoch": 0.652,
4570
+ "grad_norm": 0.007476091384887695,
4571
+ "learning_rate": 1.74e-05,
4572
+ "loss": 0.0009,
4573
+ "step": 6520
4574
+ },
4575
+ {
4576
+ "epoch": 0.653,
4577
+ "grad_norm": 0.013044660910964012,
4578
+ "learning_rate": 1.7349999999999998e-05,
4579
+ "loss": 0.0009,
4580
+ "step": 6530
4581
+ },
4582
+ {
4583
+ "epoch": 0.654,
4584
+ "grad_norm": 0.004804369527846575,
4585
+ "learning_rate": 1.73e-05,
4586
+ "loss": 0.0009,
4587
+ "step": 6540
4588
+ },
4589
+ {
4590
+ "epoch": 0.655,
4591
+ "grad_norm": 0.006042002234607935,
4592
+ "learning_rate": 1.725e-05,
4593
+ "loss": 0.0008,
4594
+ "step": 6550
4595
+ },
4596
+ {
4597
+ "epoch": 0.656,
4598
+ "grad_norm": 0.010785943828523159,
4599
+ "learning_rate": 1.7199999999999998e-05,
4600
+ "loss": 0.0009,
4601
+ "step": 6560
4602
+ },
4603
+ {
4604
+ "epoch": 0.657,
4605
+ "grad_norm": 0.011350172571837902,
4606
+ "learning_rate": 1.7150000000000004e-05,
4607
+ "loss": 0.0008,
4608
+ "step": 6570
4609
+ },
4610
+ {
4611
+ "epoch": 0.658,
4612
+ "grad_norm": 0.007638021372258663,
4613
+ "learning_rate": 1.7100000000000002e-05,
4614
+ "loss": 0.0009,
4615
+ "step": 6580
4616
+ },
4617
+ {
4618
+ "epoch": 0.659,
4619
+ "grad_norm": 0.005735939834266901,
4620
+ "learning_rate": 1.705e-05,
4621
+ "loss": 0.0009,
4622
+ "step": 6590
4623
+ },
4624
+ {
4625
+ "epoch": 0.66,
4626
+ "grad_norm": 0.02717960625886917,
4627
+ "learning_rate": 1.7000000000000003e-05,
4628
+ "loss": 0.0011,
4629
+ "step": 6600
4630
+ },
4631
+ {
4632
+ "epoch": 0.661,
4633
+ "grad_norm": 0.006012643221765757,
4634
+ "learning_rate": 1.6950000000000002e-05,
4635
+ "loss": 0.0008,
4636
+ "step": 6610
4637
+ },
4638
+ {
4639
+ "epoch": 0.662,
4640
+ "grad_norm": 0.00599683728069067,
4641
+ "learning_rate": 1.69e-05,
4642
+ "loss": 0.0008,
4643
+ "step": 6620
4644
+ },
4645
+ {
4646
+ "epoch": 0.663,
4647
+ "grad_norm": 0.026952974498271942,
4648
+ "learning_rate": 1.6850000000000003e-05,
4649
+ "loss": 0.0008,
4650
+ "step": 6630
4651
+ },
4652
+ {
4653
+ "epoch": 0.664,
4654
+ "grad_norm": 0.008171536959707737,
4655
+ "learning_rate": 1.6800000000000002e-05,
4656
+ "loss": 0.0008,
4657
+ "step": 6640
4658
+ },
4659
+ {
4660
+ "epoch": 0.665,
4661
+ "grad_norm": 0.007446442265063524,
4662
+ "learning_rate": 1.675e-05,
4663
+ "loss": 0.0009,
4664
+ "step": 6650
4665
+ },
4666
+ {
4667
+ "epoch": 0.666,
4668
+ "grad_norm": 0.006456063129007816,
4669
+ "learning_rate": 1.6700000000000003e-05,
4670
+ "loss": 0.0008,
4671
+ "step": 6660
4672
+ },
4673
+ {
4674
+ "epoch": 0.667,
4675
+ "grad_norm": 0.008162173442542553,
4676
+ "learning_rate": 1.665e-05,
4677
+ "loss": 0.0007,
4678
+ "step": 6670
4679
+ },
4680
+ {
4681
+ "epoch": 0.668,
4682
+ "grad_norm": 0.004432919900864363,
4683
+ "learning_rate": 1.66e-05,
4684
+ "loss": 0.0008,
4685
+ "step": 6680
4686
+ },
4687
+ {
4688
+ "epoch": 0.669,
4689
+ "grad_norm": 0.007158307824283838,
4690
+ "learning_rate": 1.6550000000000002e-05,
4691
+ "loss": 0.0008,
4692
+ "step": 6690
4693
+ },
4694
+ {
4695
+ "epoch": 0.67,
4696
+ "grad_norm": 0.003983801696449518,
4697
+ "learning_rate": 1.65e-05,
4698
+ "loss": 0.0007,
4699
+ "step": 6700
4700
+ },
4701
+ {
4702
+ "epoch": 0.671,
4703
+ "grad_norm": 0.005170087795704603,
4704
+ "learning_rate": 1.645e-05,
4705
+ "loss": 0.0008,
4706
+ "step": 6710
4707
+ },
4708
+ {
4709
+ "epoch": 0.672,
4710
+ "grad_norm": 0.004729804117232561,
4711
+ "learning_rate": 1.6400000000000002e-05,
4712
+ "loss": 0.0008,
4713
+ "step": 6720
4714
+ },
4715
+ {
4716
+ "epoch": 0.673,
4717
+ "grad_norm": 0.010037174448370934,
4718
+ "learning_rate": 1.635e-05,
4719
+ "loss": 0.001,
4720
+ "step": 6730
4721
+ },
4722
+ {
4723
+ "epoch": 0.674,
4724
+ "grad_norm": 0.050949569791555405,
4725
+ "learning_rate": 1.63e-05,
4726
+ "loss": 0.0023,
4727
+ "step": 6740
4728
+ },
4729
+ {
4730
+ "epoch": 0.675,
4731
+ "grad_norm": 0.0323474146425724,
4732
+ "learning_rate": 1.6250000000000002e-05,
4733
+ "loss": 0.0017,
4734
+ "step": 6750
4735
+ },
4736
+ {
4737
+ "epoch": 0.676,
4738
+ "grad_norm": 0.027231359854340553,
4739
+ "learning_rate": 1.62e-05,
4740
+ "loss": 0.0021,
4741
+ "step": 6760
4742
+ },
4743
+ {
4744
+ "epoch": 0.677,
4745
+ "grad_norm": 0.01555855292826891,
4746
+ "learning_rate": 1.6150000000000003e-05,
4747
+ "loss": 0.0013,
4748
+ "step": 6770
4749
+ },
4750
+ {
4751
+ "epoch": 0.678,
4752
+ "grad_norm": 0.01804298162460327,
4753
+ "learning_rate": 1.6100000000000002e-05,
4754
+ "loss": 0.0011,
4755
+ "step": 6780
4756
+ },
4757
+ {
4758
+ "epoch": 0.679,
4759
+ "grad_norm": 0.011248771101236343,
4760
+ "learning_rate": 1.605e-05,
4761
+ "loss": 0.0011,
4762
+ "step": 6790
4763
+ },
4764
+ {
4765
+ "epoch": 0.68,
4766
+ "grad_norm": 0.007389044389128685,
4767
+ "learning_rate": 1.6000000000000003e-05,
4768
+ "loss": 0.0009,
4769
+ "step": 6800
4770
+ },
4771
+ {
4772
+ "epoch": 0.681,
4773
+ "grad_norm": 0.014606145210564137,
4774
+ "learning_rate": 1.595e-05,
4775
+ "loss": 0.0012,
4776
+ "step": 6810
4777
+ },
4778
+ {
4779
+ "epoch": 0.682,
4780
+ "grad_norm": 0.012476052157580853,
4781
+ "learning_rate": 1.59e-05,
4782
+ "loss": 0.0009,
4783
+ "step": 6820
4784
+ },
4785
+ {
4786
+ "epoch": 0.683,
4787
+ "grad_norm": 0.009272475726902485,
4788
+ "learning_rate": 1.5850000000000002e-05,
4789
+ "loss": 0.0009,
4790
+ "step": 6830
4791
+ },
4792
+ {
4793
+ "epoch": 0.684,
4794
+ "grad_norm": 0.011705187149345875,
4795
+ "learning_rate": 1.58e-05,
4796
+ "loss": 0.0009,
4797
+ "step": 6840
4798
+ },
4799
+ {
4800
+ "epoch": 0.685,
4801
+ "grad_norm": 0.01874556578695774,
4802
+ "learning_rate": 1.575e-05,
4803
+ "loss": 0.0011,
4804
+ "step": 6850
4805
+ },
4806
+ {
4807
+ "epoch": 0.686,
4808
+ "grad_norm": 0.01463324110955,
4809
+ "learning_rate": 1.5700000000000002e-05,
4810
+ "loss": 0.0009,
4811
+ "step": 6860
4812
+ },
4813
+ {
4814
+ "epoch": 0.687,
4815
+ "grad_norm": 0.012001392431557178,
4816
+ "learning_rate": 1.565e-05,
4817
+ "loss": 0.001,
4818
+ "step": 6870
4819
+ },
4820
+ {
4821
+ "epoch": 0.688,
4822
+ "grad_norm": 0.009366356767714024,
4823
+ "learning_rate": 1.56e-05,
4824
+ "loss": 0.0008,
4825
+ "step": 6880
4826
+ },
4827
+ {
4828
+ "epoch": 0.689,
4829
+ "grad_norm": 0.010064000263810158,
4830
+ "learning_rate": 1.5550000000000002e-05,
4831
+ "loss": 0.0009,
4832
+ "step": 6890
4833
+ },
4834
+ {
4835
+ "epoch": 0.69,
4836
+ "grad_norm": 0.016703909263014793,
4837
+ "learning_rate": 1.55e-05,
4838
+ "loss": 0.0009,
4839
+ "step": 6900
4840
+ },
4841
+ {
4842
+ "epoch": 0.691,
4843
+ "grad_norm": 0.0146669652312994,
4844
+ "learning_rate": 1.545e-05,
4845
+ "loss": 0.001,
4846
+ "step": 6910
4847
+ },
4848
+ {
4849
+ "epoch": 0.692,
4850
+ "grad_norm": 0.006643705535680056,
4851
+ "learning_rate": 1.54e-05,
4852
+ "loss": 0.0009,
4853
+ "step": 6920
4854
+ },
4855
+ {
4856
+ "epoch": 0.693,
4857
+ "grad_norm": 0.011501871049404144,
4858
+ "learning_rate": 1.535e-05,
4859
+ "loss": 0.0008,
4860
+ "step": 6930
4861
+ },
4862
+ {
4863
+ "epoch": 0.694,
4864
+ "grad_norm": 0.008170065470039845,
4865
+ "learning_rate": 1.53e-05,
4866
+ "loss": 0.0008,
4867
+ "step": 6940
4868
+ },
4869
+ {
4870
+ "epoch": 0.695,
4871
+ "grad_norm": 0.00737554719671607,
4872
+ "learning_rate": 1.525e-05,
4873
+ "loss": 0.0007,
4874
+ "step": 6950
4875
+ },
4876
+ {
4877
+ "epoch": 0.696,
4878
+ "grad_norm": 0.006846282631158829,
4879
+ "learning_rate": 1.52e-05,
4880
+ "loss": 0.0009,
4881
+ "step": 6960
4882
+ },
4883
+ {
4884
+ "epoch": 0.697,
4885
+ "grad_norm": 0.007784941233694553,
4886
+ "learning_rate": 1.515e-05,
4887
+ "loss": 0.0008,
4888
+ "step": 6970
4889
+ },
4890
+ {
4891
+ "epoch": 0.698,
4892
+ "grad_norm": 0.009864069521427155,
4893
+ "learning_rate": 1.51e-05,
4894
+ "loss": 0.0008,
4895
+ "step": 6980
4896
+ },
4897
+ {
4898
+ "epoch": 0.699,
4899
+ "grad_norm": 0.007372863125056028,
4900
+ "learning_rate": 1.505e-05,
4901
+ "loss": 0.0009,
4902
+ "step": 6990
4903
+ },
4904
+ {
4905
+ "epoch": 0.7,
4906
+ "grad_norm": 0.006507135462015867,
4907
+ "learning_rate": 1.5e-05,
4908
+ "loss": 0.0008,
4909
+ "step": 7000
4910
+ },
4911
+ {
4912
+ "epoch": 0.701,
4913
+ "grad_norm": 0.03093353845179081,
4914
+ "learning_rate": 1.4950000000000001e-05,
4915
+ "loss": 0.0014,
4916
+ "step": 7010
4917
+ },
4918
+ {
4919
+ "epoch": 0.702,
4920
+ "grad_norm": 0.01417300570756197,
4921
+ "learning_rate": 1.49e-05,
4922
+ "loss": 0.001,
4923
+ "step": 7020
4924
+ },
4925
+ {
4926
+ "epoch": 0.703,
4927
+ "grad_norm": 0.010836401022970676,
4928
+ "learning_rate": 1.485e-05,
4929
+ "loss": 0.0012,
4930
+ "step": 7030
4931
+ },
4932
+ {
4933
+ "epoch": 0.704,
4934
+ "grad_norm": 0.01000068336725235,
4935
+ "learning_rate": 1.48e-05,
4936
+ "loss": 0.001,
4937
+ "step": 7040
4938
+ },
4939
+ {
4940
+ "epoch": 0.705,
4941
+ "grad_norm": 0.008654952049255371,
4942
+ "learning_rate": 1.475e-05,
4943
+ "loss": 0.0009,
4944
+ "step": 7050
4945
+ },
4946
+ {
4947
+ "epoch": 0.706,
4948
+ "grad_norm": 0.010761331766843796,
4949
+ "learning_rate": 1.47e-05,
4950
+ "loss": 0.001,
4951
+ "step": 7060
4952
+ },
4953
+ {
4954
+ "epoch": 0.707,
4955
+ "grad_norm": 0.006188638508319855,
4956
+ "learning_rate": 1.465e-05,
4957
+ "loss": 0.0008,
4958
+ "step": 7070
4959
+ },
4960
+ {
4961
+ "epoch": 0.708,
4962
+ "grad_norm": 0.007858789525926113,
4963
+ "learning_rate": 1.4599999999999999e-05,
4964
+ "loss": 0.0008,
4965
+ "step": 7080
4966
+ },
4967
+ {
4968
+ "epoch": 0.709,
4969
+ "grad_norm": 0.02773350477218628,
4970
+ "learning_rate": 1.455e-05,
4971
+ "loss": 0.0014,
4972
+ "step": 7090
4973
+ },
4974
+ {
4975
+ "epoch": 0.71,
4976
+ "grad_norm": 0.012381108477711678,
4977
+ "learning_rate": 1.45e-05,
4978
+ "loss": 0.0009,
4979
+ "step": 7100
4980
+ },
4981
+ {
4982
+ "epoch": 0.711,
4983
+ "grad_norm": 0.009256324730813503,
4984
+ "learning_rate": 1.4449999999999999e-05,
4985
+ "loss": 0.0008,
4986
+ "step": 7110
4987
+ },
4988
+ {
4989
+ "epoch": 0.712,
4990
+ "grad_norm": 0.007005748804658651,
4991
+ "learning_rate": 1.44e-05,
4992
+ "loss": 0.0009,
4993
+ "step": 7120
4994
+ },
4995
+ {
4996
+ "epoch": 0.713,
4997
+ "grad_norm": 0.0055755749344825745,
4998
+ "learning_rate": 1.435e-05,
4999
+ "loss": 0.0007,
5000
+ "step": 7130
5001
+ },
5002
+ {
5003
+ "epoch": 0.714,
5004
+ "grad_norm": 0.003967254888266325,
5005
+ "learning_rate": 1.43e-05,
5006
+ "loss": 0.0008,
5007
+ "step": 7140
5008
+ },
5009
+ {
5010
+ "epoch": 0.715,
5011
+ "grad_norm": 0.0079165268689394,
5012
+ "learning_rate": 1.4249999999999999e-05,
5013
+ "loss": 0.0011,
5014
+ "step": 7150
5015
+ },
5016
+ {
5017
+ "epoch": 0.716,
5018
+ "grad_norm": 0.004682580940425396,
5019
+ "learning_rate": 1.42e-05,
5020
+ "loss": 0.0007,
5021
+ "step": 7160
5022
+ },
5023
+ {
5024
+ "epoch": 0.717,
5025
+ "grad_norm": 0.008578700013458729,
5026
+ "learning_rate": 1.415e-05,
5027
+ "loss": 0.0011,
5028
+ "step": 7170
5029
+ },
5030
+ {
5031
+ "epoch": 0.718,
5032
+ "grad_norm": 0.006943961605429649,
5033
+ "learning_rate": 1.4099999999999999e-05,
5034
+ "loss": 0.0009,
5035
+ "step": 7180
5036
+ },
5037
+ {
5038
+ "epoch": 0.719,
5039
+ "grad_norm": 0.0072656250558793545,
5040
+ "learning_rate": 1.4050000000000003e-05,
5041
+ "loss": 0.0007,
5042
+ "step": 7190
5043
+ },
5044
+ {
5045
+ "epoch": 0.72,
5046
+ "grad_norm": 0.005639955401420593,
5047
+ "learning_rate": 1.4000000000000001e-05,
5048
+ "loss": 0.0007,
5049
+ "step": 7200
5050
+ },
5051
+ {
5052
+ "epoch": 0.721,
5053
+ "grad_norm": 0.005733838304877281,
5054
+ "learning_rate": 1.3950000000000002e-05,
5055
+ "loss": 0.0008,
5056
+ "step": 7210
5057
+ },
5058
+ {
5059
+ "epoch": 0.722,
5060
+ "grad_norm": 0.02654002234339714,
5061
+ "learning_rate": 1.3900000000000002e-05,
5062
+ "loss": 0.0008,
5063
+ "step": 7220
5064
+ },
5065
+ {
5066
+ "epoch": 0.723,
5067
+ "grad_norm": 0.007308628410100937,
5068
+ "learning_rate": 1.3850000000000001e-05,
5069
+ "loss": 0.0008,
5070
+ "step": 7230
5071
+ },
5072
+ {
5073
+ "epoch": 0.724,
5074
+ "grad_norm": 0.006939894054085016,
5075
+ "learning_rate": 1.3800000000000002e-05,
5076
+ "loss": 0.0007,
5077
+ "step": 7240
5078
+ },
5079
+ {
5080
+ "epoch": 0.725,
5081
+ "grad_norm": 0.03964811936020851,
5082
+ "learning_rate": 1.3750000000000002e-05,
5083
+ "loss": 0.0013,
5084
+ "step": 7250
5085
+ },
5086
+ {
5087
+ "epoch": 0.726,
5088
+ "grad_norm": 0.014138396829366684,
5089
+ "learning_rate": 1.3700000000000001e-05,
5090
+ "loss": 0.001,
5091
+ "step": 7260
5092
+ },
5093
+ {
5094
+ "epoch": 0.727,
5095
+ "grad_norm": 0.008445181883871555,
5096
+ "learning_rate": 1.3650000000000001e-05,
5097
+ "loss": 0.0008,
5098
+ "step": 7270
5099
+ },
5100
+ {
5101
+ "epoch": 0.728,
5102
+ "grad_norm": 0.01134855579584837,
5103
+ "learning_rate": 1.3600000000000002e-05,
5104
+ "loss": 0.0009,
5105
+ "step": 7280
5106
+ },
5107
+ {
5108
+ "epoch": 0.729,
5109
+ "grad_norm": 0.010982022620737553,
5110
+ "learning_rate": 1.3550000000000002e-05,
5111
+ "loss": 0.0015,
5112
+ "step": 7290
5113
+ },
5114
+ {
5115
+ "epoch": 0.73,
5116
+ "grad_norm": 0.011698734015226364,
5117
+ "learning_rate": 1.3500000000000001e-05,
5118
+ "loss": 0.0008,
5119
+ "step": 7300
5120
+ },
5121
+ {
5122
+ "epoch": 0.731,
5123
+ "grad_norm": 0.006420729216188192,
5124
+ "learning_rate": 1.3450000000000002e-05,
5125
+ "loss": 0.0008,
5126
+ "step": 7310
5127
+ },
5128
+ {
5129
+ "epoch": 0.732,
5130
+ "grad_norm": 0.006088167428970337,
5131
+ "learning_rate": 1.3400000000000002e-05,
5132
+ "loss": 0.0008,
5133
+ "step": 7320
5134
+ },
5135
+ {
5136
+ "epoch": 0.733,
5137
+ "grad_norm": 0.0071141645312309265,
5138
+ "learning_rate": 1.3350000000000001e-05,
5139
+ "loss": 0.0012,
5140
+ "step": 7330
5141
+ },
5142
+ {
5143
+ "epoch": 0.734,
5144
+ "grad_norm": 0.004975921008735895,
5145
+ "learning_rate": 1.3300000000000001e-05,
5146
+ "loss": 0.0006,
5147
+ "step": 7340
5148
+ },
5149
+ {
5150
+ "epoch": 0.735,
5151
+ "grad_norm": 0.004499469883739948,
5152
+ "learning_rate": 1.3250000000000002e-05,
5153
+ "loss": 0.0007,
5154
+ "step": 7350
5155
+ },
5156
+ {
5157
+ "epoch": 0.736,
5158
+ "grad_norm": 0.009738982655107975,
5159
+ "learning_rate": 1.32e-05,
5160
+ "loss": 0.001,
5161
+ "step": 7360
5162
+ },
5163
+ {
5164
+ "epoch": 0.737,
5165
+ "grad_norm": 0.006863337475806475,
5166
+ "learning_rate": 1.3150000000000001e-05,
5167
+ "loss": 0.001,
5168
+ "step": 7370
5169
+ },
5170
+ {
5171
+ "epoch": 0.738,
5172
+ "grad_norm": 0.008216536603868008,
5173
+ "learning_rate": 1.3100000000000002e-05,
5174
+ "loss": 0.0007,
5175
+ "step": 7380
5176
+ },
5177
+ {
5178
+ "epoch": 0.739,
5179
+ "grad_norm": 0.006803369149565697,
5180
+ "learning_rate": 1.305e-05,
5181
+ "loss": 0.0008,
5182
+ "step": 7390
5183
+ },
5184
+ {
5185
+ "epoch": 0.74,
5186
+ "grad_norm": 0.00551017839461565,
5187
+ "learning_rate": 1.3000000000000001e-05,
5188
+ "loss": 0.0008,
5189
+ "step": 7400
5190
+ },
5191
+ {
5192
+ "epoch": 0.741,
5193
+ "grad_norm": 0.009463651105761528,
5194
+ "learning_rate": 1.2950000000000001e-05,
5195
+ "loss": 0.0008,
5196
+ "step": 7410
5197
+ },
5198
+ {
5199
+ "epoch": 0.742,
5200
+ "grad_norm": 0.01233983039855957,
5201
+ "learning_rate": 1.29e-05,
5202
+ "loss": 0.0019,
5203
+ "step": 7420
5204
+ },
5205
+ {
5206
+ "epoch": 0.743,
5207
+ "grad_norm": 0.008470877073705196,
5208
+ "learning_rate": 1.285e-05,
5209
+ "loss": 0.0009,
5210
+ "step": 7430
5211
+ },
5212
+ {
5213
+ "epoch": 0.744,
5214
+ "grad_norm": 0.007592742796987295,
5215
+ "learning_rate": 1.2800000000000001e-05,
5216
+ "loss": 0.0008,
5217
+ "step": 7440
5218
+ },
5219
+ {
5220
+ "epoch": 0.745,
5221
+ "grad_norm": 0.03596987947821617,
5222
+ "learning_rate": 1.2750000000000002e-05,
5223
+ "loss": 0.001,
5224
+ "step": 7450
5225
+ },
5226
+ {
5227
+ "epoch": 0.746,
5228
+ "grad_norm": 0.005849502049386501,
5229
+ "learning_rate": 1.27e-05,
5230
+ "loss": 0.0008,
5231
+ "step": 7460
5232
+ },
5233
+ {
5234
+ "epoch": 0.747,
5235
+ "grad_norm": 0.009035659022629261,
5236
+ "learning_rate": 1.2650000000000001e-05,
5237
+ "loss": 0.0007,
5238
+ "step": 7470
5239
+ },
5240
+ {
5241
+ "epoch": 0.748,
5242
+ "grad_norm": 0.010397679172456264,
5243
+ "learning_rate": 1.2600000000000001e-05,
5244
+ "loss": 0.0014,
5245
+ "step": 7480
5246
+ },
5247
+ {
5248
+ "epoch": 0.749,
5249
+ "grad_norm": 0.014514378271996975,
5250
+ "learning_rate": 1.255e-05,
5251
+ "loss": 0.0008,
5252
+ "step": 7490
5253
+ },
5254
+ {
5255
+ "epoch": 0.75,
5256
+ "grad_norm": 0.004837281536310911,
5257
+ "learning_rate": 1.25e-05,
5258
+ "loss": 0.0006,
5259
+ "step": 7500
5260
+ },
5261
+ {
5262
+ "epoch": 0.751,
5263
+ "grad_norm": 0.007720770314335823,
5264
+ "learning_rate": 1.2450000000000001e-05,
5265
+ "loss": 0.0006,
5266
+ "step": 7510
5267
+ },
5268
+ {
5269
+ "epoch": 0.752,
5270
+ "grad_norm": 0.012046804651618004,
5271
+ "learning_rate": 1.24e-05,
5272
+ "loss": 0.0011,
5273
+ "step": 7520
5274
+ },
5275
+ {
5276
+ "epoch": 0.753,
5277
+ "grad_norm": 0.01343387458473444,
5278
+ "learning_rate": 1.235e-05,
5279
+ "loss": 0.0007,
5280
+ "step": 7530
5281
+ },
5282
+ {
5283
+ "epoch": 0.754,
5284
+ "grad_norm": 0.00810600072145462,
5285
+ "learning_rate": 1.23e-05,
5286
+ "loss": 0.0007,
5287
+ "step": 7540
5288
+ },
5289
+ {
5290
+ "epoch": 0.755,
5291
+ "grad_norm": 0.00925883837044239,
5292
+ "learning_rate": 1.225e-05,
5293
+ "loss": 0.0006,
5294
+ "step": 7550
5295
+ },
5296
+ {
5297
+ "epoch": 0.756,
5298
+ "grad_norm": 0.01927885413169861,
5299
+ "learning_rate": 1.22e-05,
5300
+ "loss": 0.0014,
5301
+ "step": 7560
5302
+ },
5303
+ {
5304
+ "epoch": 0.757,
5305
+ "grad_norm": 0.010129665955901146,
5306
+ "learning_rate": 1.215e-05,
5307
+ "loss": 0.0006,
5308
+ "step": 7570
5309
+ },
5310
+ {
5311
+ "epoch": 0.758,
5312
+ "grad_norm": 0.007863885723054409,
5313
+ "learning_rate": 1.2100000000000001e-05,
5314
+ "loss": 0.0006,
5315
+ "step": 7580
5316
+ },
5317
+ {
5318
+ "epoch": 0.759,
5319
+ "grad_norm": 0.005500464700162411,
5320
+ "learning_rate": 1.205e-05,
5321
+ "loss": 0.0007,
5322
+ "step": 7590
5323
+ },
5324
+ {
5325
+ "epoch": 0.76,
5326
+ "grad_norm": 0.0040563903748989105,
5327
+ "learning_rate": 1.2e-05,
5328
+ "loss": 0.0006,
5329
+ "step": 7600
5330
+ },
5331
+ {
5332
+ "epoch": 0.761,
5333
+ "grad_norm": 0.006361998151987791,
5334
+ "learning_rate": 1.195e-05,
5335
+ "loss": 0.0007,
5336
+ "step": 7610
5337
+ },
5338
+ {
5339
+ "epoch": 0.762,
5340
+ "grad_norm": 0.0136310625821352,
5341
+ "learning_rate": 1.19e-05,
5342
+ "loss": 0.0008,
5343
+ "step": 7620
5344
+ },
5345
+ {
5346
+ "epoch": 0.763,
5347
+ "grad_norm": 0.005384715739637613,
5348
+ "learning_rate": 1.185e-05,
5349
+ "loss": 0.0007,
5350
+ "step": 7630
5351
+ },
5352
+ {
5353
+ "epoch": 0.764,
5354
+ "grad_norm": 0.014707676135003567,
5355
+ "learning_rate": 1.18e-05,
5356
+ "loss": 0.0007,
5357
+ "step": 7640
5358
+ },
5359
+ {
5360
+ "epoch": 0.765,
5361
+ "grad_norm": 0.008092684671282768,
5362
+ "learning_rate": 1.175e-05,
5363
+ "loss": 0.0006,
5364
+ "step": 7650
5365
+ },
5366
+ {
5367
+ "epoch": 0.766,
5368
+ "grad_norm": 0.007185132242739201,
5369
+ "learning_rate": 1.1700000000000001e-05,
5370
+ "loss": 0.0006,
5371
+ "step": 7660
5372
+ },
5373
+ {
5374
+ "epoch": 0.767,
5375
+ "grad_norm": 0.005672789178788662,
5376
+ "learning_rate": 1.1650000000000002e-05,
5377
+ "loss": 0.0006,
5378
+ "step": 7670
5379
+ },
5380
+ {
5381
+ "epoch": 0.768,
5382
+ "grad_norm": 0.05434956029057503,
5383
+ "learning_rate": 1.16e-05,
5384
+ "loss": 0.001,
5385
+ "step": 7680
5386
+ },
5387
+ {
5388
+ "epoch": 0.769,
5389
+ "grad_norm": 0.00933472067117691,
5390
+ "learning_rate": 1.1550000000000001e-05,
5391
+ "loss": 0.0007,
5392
+ "step": 7690
5393
+ },
5394
+ {
5395
+ "epoch": 0.77,
5396
+ "grad_norm": 0.008684621192514896,
5397
+ "learning_rate": 1.1500000000000002e-05,
5398
+ "loss": 0.0006,
5399
+ "step": 7700
5400
+ },
5401
+ {
5402
+ "epoch": 0.771,
5403
+ "grad_norm": 0.03054739721119404,
5404
+ "learning_rate": 1.145e-05,
5405
+ "loss": 0.0006,
5406
+ "step": 7710
5407
+ },
5408
+ {
5409
+ "epoch": 0.772,
5410
+ "grad_norm": 0.005998207256197929,
5411
+ "learning_rate": 1.1400000000000001e-05,
5412
+ "loss": 0.0006,
5413
+ "step": 7720
5414
+ },
5415
+ {
5416
+ "epoch": 0.773,
5417
+ "grad_norm": 0.006153833121061325,
5418
+ "learning_rate": 1.1350000000000001e-05,
5419
+ "loss": 0.0006,
5420
+ "step": 7730
5421
+ },
5422
+ {
5423
+ "epoch": 0.774,
5424
+ "grad_norm": 0.007491481024771929,
5425
+ "learning_rate": 1.13e-05,
5426
+ "loss": 0.0007,
5427
+ "step": 7740
5428
+ },
5429
+ {
5430
+ "epoch": 0.775,
5431
+ "grad_norm": 0.01078925933688879,
5432
+ "learning_rate": 1.125e-05,
5433
+ "loss": 0.0006,
5434
+ "step": 7750
5435
+ },
5436
+ {
5437
+ "epoch": 0.776,
5438
+ "grad_norm": 0.005885554943233728,
5439
+ "learning_rate": 1.1200000000000001e-05,
5440
+ "loss": 0.0006,
5441
+ "step": 7760
5442
+ },
5443
+ {
5444
+ "epoch": 0.777,
5445
+ "grad_norm": 0.005423078313469887,
5446
+ "learning_rate": 1.115e-05,
5447
+ "loss": 0.0007,
5448
+ "step": 7770
5449
+ },
5450
+ {
5451
+ "epoch": 0.778,
5452
+ "grad_norm": 0.008044522255659103,
5453
+ "learning_rate": 1.11e-05,
5454
+ "loss": 0.0006,
5455
+ "step": 7780
5456
+ },
5457
+ {
5458
+ "epoch": 0.779,
5459
+ "grad_norm": 0.00733207818120718,
5460
+ "learning_rate": 1.1050000000000001e-05,
5461
+ "loss": 0.0007,
5462
+ "step": 7790
5463
+ },
5464
+ {
5465
+ "epoch": 0.78,
5466
+ "grad_norm": 0.0066906120628118515,
5467
+ "learning_rate": 1.1000000000000001e-05,
5468
+ "loss": 0.0009,
5469
+ "step": 7800
5470
+ },
5471
+ {
5472
+ "epoch": 0.781,
5473
+ "grad_norm": 0.004443836398422718,
5474
+ "learning_rate": 1.095e-05,
5475
+ "loss": 0.0006,
5476
+ "step": 7810
5477
+ },
5478
+ {
5479
+ "epoch": 0.782,
5480
+ "grad_norm": 0.0058379145339131355,
5481
+ "learning_rate": 1.09e-05,
5482
+ "loss": 0.0007,
5483
+ "step": 7820
5484
+ },
5485
+ {
5486
+ "epoch": 0.783,
5487
+ "grad_norm": 0.006808693055063486,
5488
+ "learning_rate": 1.0850000000000001e-05,
5489
+ "loss": 0.0006,
5490
+ "step": 7830
5491
+ },
5492
+ {
5493
+ "epoch": 0.784,
5494
+ "grad_norm": 0.008773542940616608,
5495
+ "learning_rate": 1.08e-05,
5496
+ "loss": 0.0006,
5497
+ "step": 7840
5498
+ },
5499
+ {
5500
+ "epoch": 0.785,
5501
+ "grad_norm": 0.006700740661472082,
5502
+ "learning_rate": 1.075e-05,
5503
+ "loss": 0.0006,
5504
+ "step": 7850
5505
+ },
5506
+ {
5507
+ "epoch": 0.786,
5508
+ "grad_norm": 0.00906393863260746,
5509
+ "learning_rate": 1.0700000000000001e-05,
5510
+ "loss": 0.0006,
5511
+ "step": 7860
5512
+ },
5513
+ {
5514
+ "epoch": 0.787,
5515
+ "grad_norm": 0.0030822190456092358,
5516
+ "learning_rate": 1.065e-05,
5517
+ "loss": 0.0005,
5518
+ "step": 7870
5519
+ },
5520
+ {
5521
+ "epoch": 0.788,
5522
+ "grad_norm": 0.0029632148798555136,
5523
+ "learning_rate": 1.06e-05,
5524
+ "loss": 0.0005,
5525
+ "step": 7880
5526
+ },
5527
+ {
5528
+ "epoch": 0.789,
5529
+ "grad_norm": 0.004798842128366232,
5530
+ "learning_rate": 1.055e-05,
5531
+ "loss": 0.0006,
5532
+ "step": 7890
5533
+ },
5534
+ {
5535
+ "epoch": 0.79,
5536
+ "grad_norm": 0.007376812864094973,
5537
+ "learning_rate": 1.05e-05,
5538
+ "loss": 0.0005,
5539
+ "step": 7900
5540
+ },
5541
+ {
5542
+ "epoch": 0.791,
5543
+ "grad_norm": 0.009337624534964561,
5544
+ "learning_rate": 1.045e-05,
5545
+ "loss": 0.0009,
5546
+ "step": 7910
5547
+ },
5548
+ {
5549
+ "epoch": 0.792,
5550
+ "grad_norm": 0.012847904115915298,
5551
+ "learning_rate": 1.04e-05,
5552
+ "loss": 0.0008,
5553
+ "step": 7920
5554
+ },
5555
+ {
5556
+ "epoch": 0.793,
5557
+ "grad_norm": 0.005587203428149223,
5558
+ "learning_rate": 1.035e-05,
5559
+ "loss": 0.0006,
5560
+ "step": 7930
5561
+ },
5562
+ {
5563
+ "epoch": 0.794,
5564
+ "grad_norm": 0.008464600890874863,
5565
+ "learning_rate": 1.03e-05,
5566
+ "loss": 0.0006,
5567
+ "step": 7940
5568
+ },
5569
+ {
5570
+ "epoch": 0.795,
5571
+ "grad_norm": 0.2516852617263794,
5572
+ "learning_rate": 1.025e-05,
5573
+ "loss": 0.002,
5574
+ "step": 7950
5575
+ },
5576
+ {
5577
+ "epoch": 0.796,
5578
+ "grad_norm": 0.04664693772792816,
5579
+ "learning_rate": 1.02e-05,
5580
+ "loss": 0.002,
5581
+ "step": 7960
5582
+ },
5583
+ {
5584
+ "epoch": 0.797,
5585
+ "grad_norm": 0.02456306852400303,
5586
+ "learning_rate": 1.0150000000000001e-05,
5587
+ "loss": 0.0013,
5588
+ "step": 7970
5589
+ },
5590
+ {
5591
+ "epoch": 0.798,
5592
+ "grad_norm": 0.011320951394736767,
5593
+ "learning_rate": 1.0100000000000002e-05,
5594
+ "loss": 0.0009,
5595
+ "step": 7980
5596
+ },
5597
+ {
5598
+ "epoch": 0.799,
5599
+ "grad_norm": 0.01860683411359787,
5600
+ "learning_rate": 1.005e-05,
5601
+ "loss": 0.0012,
5602
+ "step": 7990
5603
+ },
5604
+ {
5605
+ "epoch": 0.8,
5606
+ "grad_norm": 0.03227970749139786,
5607
+ "learning_rate": 1e-05,
5608
+ "loss": 0.0009,
5609
+ "step": 8000
5610
+ },
5611
+ {
5612
+ "epoch": 0.801,
5613
+ "grad_norm": 0.015873363241553307,
5614
+ "learning_rate": 9.950000000000001e-06,
5615
+ "loss": 0.0008,
5616
+ "step": 8010
5617
+ },
5618
+ {
5619
+ "epoch": 0.802,
5620
+ "grad_norm": 0.005454899277538061,
5621
+ "learning_rate": 9.900000000000002e-06,
5622
+ "loss": 0.0008,
5623
+ "step": 8020
5624
+ },
5625
+ {
5626
+ "epoch": 0.803,
5627
+ "grad_norm": 0.007948348298668861,
5628
+ "learning_rate": 9.85e-06,
5629
+ "loss": 0.0007,
5630
+ "step": 8030
5631
+ },
5632
+ {
5633
+ "epoch": 0.804,
5634
+ "grad_norm": 0.013328757137060165,
5635
+ "learning_rate": 9.800000000000001e-06,
5636
+ "loss": 0.0006,
5637
+ "step": 8040
5638
+ },
5639
+ {
5640
+ "epoch": 0.805,
5641
+ "grad_norm": 0.01018743496388197,
5642
+ "learning_rate": 9.750000000000002e-06,
5643
+ "loss": 0.0012,
5644
+ "step": 8050
5645
+ },
5646
+ {
5647
+ "epoch": 0.806,
5648
+ "grad_norm": 0.009421809576451778,
5649
+ "learning_rate": 9.7e-06,
5650
+ "loss": 0.0008,
5651
+ "step": 8060
5652
+ },
5653
+ {
5654
+ "epoch": 0.807,
5655
+ "grad_norm": 0.005202045664191246,
5656
+ "learning_rate": 9.65e-06,
5657
+ "loss": 0.0007,
5658
+ "step": 8070
5659
+ },
5660
+ {
5661
+ "epoch": 0.808,
5662
+ "grad_norm": 0.012956002727150917,
5663
+ "learning_rate": 9.600000000000001e-06,
5664
+ "loss": 0.0007,
5665
+ "step": 8080
5666
+ },
5667
+ {
5668
+ "epoch": 0.809,
5669
+ "grad_norm": 0.006403383333235979,
5670
+ "learning_rate": 9.55e-06,
5671
+ "loss": 0.0007,
5672
+ "step": 8090
5673
+ },
5674
+ {
5675
+ "epoch": 0.81,
5676
+ "grad_norm": 0.027560915797948837,
5677
+ "learning_rate": 9.5e-06,
5678
+ "loss": 0.0008,
5679
+ "step": 8100
5680
+ },
5681
+ {
5682
+ "epoch": 0.811,
5683
+ "grad_norm": 0.005196988116949797,
5684
+ "learning_rate": 9.450000000000001e-06,
5685
+ "loss": 0.0006,
5686
+ "step": 8110
5687
+ },
5688
+ {
5689
+ "epoch": 0.812,
5690
+ "grad_norm": 0.009510821662843227,
5691
+ "learning_rate": 9.4e-06,
5692
+ "loss": 0.0006,
5693
+ "step": 8120
5694
+ },
5695
+ {
5696
+ "epoch": 0.813,
5697
+ "grad_norm": 0.006430651992559433,
5698
+ "learning_rate": 9.35e-06,
5699
+ "loss": 0.0006,
5700
+ "step": 8130
5701
+ },
5702
+ {
5703
+ "epoch": 0.814,
5704
+ "grad_norm": 0.019426727667450905,
5705
+ "learning_rate": 9.3e-06,
5706
+ "loss": 0.0009,
5707
+ "step": 8140
5708
+ },
5709
+ {
5710
+ "epoch": 0.815,
5711
+ "grad_norm": 0.011564865708351135,
5712
+ "learning_rate": 9.25e-06,
5713
+ "loss": 0.0006,
5714
+ "step": 8150
5715
+ },
5716
+ {
5717
+ "epoch": 0.816,
5718
+ "grad_norm": 0.009036659263074398,
5719
+ "learning_rate": 9.2e-06,
5720
+ "loss": 0.0008,
5721
+ "step": 8160
5722
+ },
5723
+ {
5724
+ "epoch": 0.817,
5725
+ "grad_norm": 0.006685588974505663,
5726
+ "learning_rate": 9.15e-06,
5727
+ "loss": 0.0007,
5728
+ "step": 8170
5729
+ },
5730
+ {
5731
+ "epoch": 0.818,
5732
+ "grad_norm": 0.005980687215924263,
5733
+ "learning_rate": 9.100000000000001e-06,
5734
+ "loss": 0.0005,
5735
+ "step": 8180
5736
+ },
5737
+ {
5738
+ "epoch": 0.819,
5739
+ "grad_norm": 0.0029402158688753843,
5740
+ "learning_rate": 9.05e-06,
5741
+ "loss": 0.0005,
5742
+ "step": 8190
5743
+ },
5744
+ {
5745
+ "epoch": 0.82,
5746
+ "grad_norm": 0.0034720194526016712,
5747
+ "learning_rate": 9e-06,
5748
+ "loss": 0.0006,
5749
+ "step": 8200
5750
+ },
5751
+ {
5752
+ "epoch": 0.821,
5753
+ "grad_norm": 0.008967465721070766,
5754
+ "learning_rate": 8.95e-06,
5755
+ "loss": 0.0009,
5756
+ "step": 8210
5757
+ },
5758
+ {
5759
+ "epoch": 0.822,
5760
+ "grad_norm": 0.007418784312903881,
5761
+ "learning_rate": 8.9e-06,
5762
+ "loss": 0.0007,
5763
+ "step": 8220
5764
+ },
5765
+ {
5766
+ "epoch": 0.823,
5767
+ "grad_norm": 0.0077253603376448154,
5768
+ "learning_rate": 8.85e-06,
5769
+ "loss": 0.0006,
5770
+ "step": 8230
5771
+ },
5772
+ {
5773
+ "epoch": 0.824,
5774
+ "grad_norm": 0.011202674359083176,
5775
+ "learning_rate": 8.8e-06,
5776
+ "loss": 0.0013,
5777
+ "step": 8240
5778
+ },
5779
+ {
5780
+ "epoch": 0.825,
5781
+ "grad_norm": 0.022354573011398315,
5782
+ "learning_rate": 8.75e-06,
5783
+ "loss": 0.0008,
5784
+ "step": 8250
5785
+ },
5786
+ {
5787
+ "epoch": 0.826,
5788
+ "grad_norm": 0.01750505343079567,
5789
+ "learning_rate": 8.7e-06,
5790
+ "loss": 0.0013,
5791
+ "step": 8260
5792
+ },
5793
+ {
5794
+ "epoch": 0.827,
5795
+ "grad_norm": 0.01153852604329586,
5796
+ "learning_rate": 8.65e-06,
5797
+ "loss": 0.0009,
5798
+ "step": 8270
5799
+ },
5800
+ {
5801
+ "epoch": 0.828,
5802
+ "grad_norm": 0.008752427063882351,
5803
+ "learning_rate": 8.599999999999999e-06,
5804
+ "loss": 0.0006,
5805
+ "step": 8280
5806
+ },
5807
+ {
5808
+ "epoch": 0.829,
5809
+ "grad_norm": 0.007307702675461769,
5810
+ "learning_rate": 8.550000000000001e-06,
5811
+ "loss": 0.0007,
5812
+ "step": 8290
5813
+ },
5814
+ {
5815
+ "epoch": 0.83,
5816
+ "grad_norm": 0.0077101094648242,
5817
+ "learning_rate": 8.500000000000002e-06,
5818
+ "loss": 0.0006,
5819
+ "step": 8300
5820
+ },
5821
+ {
5822
+ "epoch": 0.831,
5823
+ "grad_norm": 0.006358897779136896,
5824
+ "learning_rate": 8.45e-06,
5825
+ "loss": 0.0005,
5826
+ "step": 8310
5827
+ },
5828
+ {
5829
+ "epoch": 0.832,
5830
+ "grad_norm": 0.003663134528324008,
5831
+ "learning_rate": 8.400000000000001e-06,
5832
+ "loss": 0.0006,
5833
+ "step": 8320
5834
+ },
5835
+ {
5836
+ "epoch": 0.833,
5837
+ "grad_norm": 0.005117372144013643,
5838
+ "learning_rate": 8.350000000000001e-06,
5839
+ "loss": 0.0006,
5840
+ "step": 8330
5841
+ },
5842
+ {
5843
+ "epoch": 0.834,
5844
+ "grad_norm": 0.004245636984705925,
5845
+ "learning_rate": 8.3e-06,
5846
+ "loss": 0.0005,
5847
+ "step": 8340
5848
+ },
5849
+ {
5850
+ "epoch": 0.835,
5851
+ "grad_norm": 0.005357146263122559,
5852
+ "learning_rate": 8.25e-06,
5853
+ "loss": 0.0006,
5854
+ "step": 8350
5855
+ },
5856
+ {
5857
+ "epoch": 0.836,
5858
+ "grad_norm": 0.01055213250219822,
5859
+ "learning_rate": 8.200000000000001e-06,
5860
+ "loss": 0.0008,
5861
+ "step": 8360
5862
+ },
5863
+ {
5864
+ "epoch": 0.837,
5865
+ "grad_norm": 0.01871907152235508,
5866
+ "learning_rate": 8.15e-06,
5867
+ "loss": 0.0007,
5868
+ "step": 8370
5869
+ },
5870
+ {
5871
+ "epoch": 0.838,
5872
+ "grad_norm": 0.013110162690281868,
5873
+ "learning_rate": 8.1e-06,
5874
+ "loss": 0.0005,
5875
+ "step": 8380
5876
+ },
5877
+ {
5878
+ "epoch": 0.839,
5879
+ "grad_norm": 0.005271353758871555,
5880
+ "learning_rate": 8.050000000000001e-06,
5881
+ "loss": 0.0007,
5882
+ "step": 8390
5883
+ },
5884
+ {
5885
+ "epoch": 0.84,
5886
+ "grad_norm": 0.004324494861066341,
5887
+ "learning_rate": 8.000000000000001e-06,
5888
+ "loss": 0.0005,
5889
+ "step": 8400
5890
+ },
5891
+ {
5892
+ "epoch": 0.841,
5893
+ "grad_norm": 0.0031851409003138542,
5894
+ "learning_rate": 7.95e-06,
5895
+ "loss": 0.0006,
5896
+ "step": 8410
5897
+ },
5898
+ {
5899
+ "epoch": 0.842,
5900
+ "grad_norm": 0.009736557491123676,
5901
+ "learning_rate": 7.9e-06,
5902
+ "loss": 0.0006,
5903
+ "step": 8420
5904
+ },
5905
+ {
5906
+ "epoch": 0.843,
5907
+ "grad_norm": 0.005168536212295294,
5908
+ "learning_rate": 7.850000000000001e-06,
5909
+ "loss": 0.0005,
5910
+ "step": 8430
5911
+ },
5912
+ {
5913
+ "epoch": 0.844,
5914
+ "grad_norm": 0.002579685300588608,
5915
+ "learning_rate": 7.8e-06,
5916
+ "loss": 0.0005,
5917
+ "step": 8440
5918
+ },
5919
+ {
5920
+ "epoch": 0.845,
5921
+ "grad_norm": 0.008710252121090889,
5922
+ "learning_rate": 7.75e-06,
5923
+ "loss": 0.0005,
5924
+ "step": 8450
5925
+ },
5926
+ {
5927
+ "epoch": 0.846,
5928
+ "grad_norm": 0.004952189512550831,
5929
+ "learning_rate": 7.7e-06,
5930
+ "loss": 0.0008,
5931
+ "step": 8460
5932
+ },
5933
+ {
5934
+ "epoch": 0.847,
5935
+ "grad_norm": 0.003375423140823841,
5936
+ "learning_rate": 7.65e-06,
5937
+ "loss": 0.0005,
5938
+ "step": 8470
5939
+ },
5940
+ {
5941
+ "epoch": 0.848,
5942
+ "grad_norm": 0.13184253871440887,
5943
+ "learning_rate": 7.6e-06,
5944
+ "loss": 0.0012,
5945
+ "step": 8480
5946
+ },
5947
+ {
5948
+ "epoch": 0.849,
5949
+ "grad_norm": 0.017549166455864906,
5950
+ "learning_rate": 7.55e-06,
5951
+ "loss": 0.0007,
5952
+ "step": 8490
5953
+ },
5954
+ {
5955
+ "epoch": 0.85,
5956
+ "grad_norm": 0.00852286908775568,
5957
+ "learning_rate": 7.5e-06,
5958
+ "loss": 0.0006,
5959
+ "step": 8500
5960
+ },
5961
+ {
5962
+ "epoch": 0.851,
5963
+ "grad_norm": 0.005547389388084412,
5964
+ "learning_rate": 7.45e-06,
5965
+ "loss": 0.0005,
5966
+ "step": 8510
5967
+ },
5968
+ {
5969
+ "epoch": 0.852,
5970
+ "grad_norm": 0.0061622606590390205,
5971
+ "learning_rate": 7.4e-06,
5972
+ "loss": 0.0005,
5973
+ "step": 8520
5974
+ },
5975
+ {
5976
+ "epoch": 0.853,
5977
+ "grad_norm": 0.005182339809834957,
5978
+ "learning_rate": 7.35e-06,
5979
+ "loss": 0.0008,
5980
+ "step": 8530
5981
+ },
5982
+ {
5983
+ "epoch": 0.854,
5984
+ "grad_norm": 0.005366960074752569,
5985
+ "learning_rate": 7.2999999999999996e-06,
5986
+ "loss": 0.0006,
5987
+ "step": 8540
5988
+ },
5989
+ {
5990
+ "epoch": 0.855,
5991
+ "grad_norm": 0.005542315077036619,
5992
+ "learning_rate": 7.25e-06,
5993
+ "loss": 0.0006,
5994
+ "step": 8550
5995
+ },
5996
+ {
5997
+ "epoch": 0.856,
5998
+ "grad_norm": 0.003940809518098831,
5999
+ "learning_rate": 7.2e-06,
6000
+ "loss": 0.0005,
6001
+ "step": 8560
6002
+ },
6003
+ {
6004
+ "epoch": 0.857,
6005
+ "grad_norm": 0.003730529686436057,
6006
+ "learning_rate": 7.15e-06,
6007
+ "loss": 0.0006,
6008
+ "step": 8570
6009
+ },
6010
+ {
6011
+ "epoch": 0.858,
6012
+ "grad_norm": 0.0033961348235607147,
6013
+ "learning_rate": 7.1e-06,
6014
+ "loss": 0.0005,
6015
+ "step": 8580
6016
+ },
6017
+ {
6018
+ "epoch": 0.859,
6019
+ "grad_norm": 0.004546662792563438,
6020
+ "learning_rate": 7.049999999999999e-06,
6021
+ "loss": 0.0006,
6022
+ "step": 8590
6023
+ },
6024
+ {
6025
+ "epoch": 0.86,
6026
+ "grad_norm": 0.009168008342385292,
6027
+ "learning_rate": 7.000000000000001e-06,
6028
+ "loss": 0.0005,
6029
+ "step": 8600
6030
+ },
6031
+ {
6032
+ "epoch": 0.861,
6033
+ "grad_norm": 0.008373426273465157,
6034
+ "learning_rate": 6.950000000000001e-06,
6035
+ "loss": 0.0008,
6036
+ "step": 8610
6037
+ },
6038
+ {
6039
+ "epoch": 0.862,
6040
+ "grad_norm": 0.004947313107550144,
6041
+ "learning_rate": 6.900000000000001e-06,
6042
+ "loss": 0.0006,
6043
+ "step": 8620
6044
+ },
6045
+ {
6046
+ "epoch": 0.863,
6047
+ "grad_norm": 0.015127859078347683,
6048
+ "learning_rate": 6.8500000000000005e-06,
6049
+ "loss": 0.0006,
6050
+ "step": 8630
6051
+ },
6052
+ {
6053
+ "epoch": 0.864,
6054
+ "grad_norm": 0.0056435600854456425,
6055
+ "learning_rate": 6.800000000000001e-06,
6056
+ "loss": 0.0006,
6057
+ "step": 8640
6058
+ },
6059
+ {
6060
+ "epoch": 0.865,
6061
+ "grad_norm": 0.004109732341021299,
6062
+ "learning_rate": 6.750000000000001e-06,
6063
+ "loss": 0.0005,
6064
+ "step": 8650
6065
+ },
6066
+ {
6067
+ "epoch": 0.866,
6068
+ "grad_norm": 0.006170314736664295,
6069
+ "learning_rate": 6.700000000000001e-06,
6070
+ "loss": 0.0005,
6071
+ "step": 8660
6072
+ },
6073
+ {
6074
+ "epoch": 0.867,
6075
+ "grad_norm": 0.002802550094202161,
6076
+ "learning_rate": 6.650000000000001e-06,
6077
+ "loss": 0.0005,
6078
+ "step": 8670
6079
+ },
6080
+ {
6081
+ "epoch": 0.868,
6082
+ "grad_norm": 0.0029788350220769644,
6083
+ "learning_rate": 6.6e-06,
6084
+ "loss": 0.0004,
6085
+ "step": 8680
6086
+ },
6087
+ {
6088
+ "epoch": 0.869,
6089
+ "grad_norm": 0.013022363185882568,
6090
+ "learning_rate": 6.550000000000001e-06,
6091
+ "loss": 0.0006,
6092
+ "step": 8690
6093
+ },
6094
+ {
6095
+ "epoch": 0.87,
6096
+ "grad_norm": 0.0036853367928415537,
6097
+ "learning_rate": 6.5000000000000004e-06,
6098
+ "loss": 0.0006,
6099
+ "step": 8700
6100
+ },
6101
+ {
6102
+ "epoch": 0.871,
6103
+ "grad_norm": 0.002578242914751172,
6104
+ "learning_rate": 6.45e-06,
6105
+ "loss": 0.0005,
6106
+ "step": 8710
6107
+ },
6108
+ {
6109
+ "epoch": 0.872,
6110
+ "grad_norm": 0.0036895396187901497,
6111
+ "learning_rate": 6.4000000000000006e-06,
6112
+ "loss": 0.0005,
6113
+ "step": 8720
6114
+ },
6115
+ {
6116
+ "epoch": 0.873,
6117
+ "grad_norm": 0.006020987406373024,
6118
+ "learning_rate": 6.35e-06,
6119
+ "loss": 0.0005,
6120
+ "step": 8730
6121
+ },
6122
+ {
6123
+ "epoch": 0.874,
6124
+ "grad_norm": 0.006671608425676823,
6125
+ "learning_rate": 6.300000000000001e-06,
6126
+ "loss": 0.0006,
6127
+ "step": 8740
6128
+ },
6129
+ {
6130
+ "epoch": 0.875,
6131
+ "grad_norm": 0.0038102639373391867,
6132
+ "learning_rate": 6.25e-06,
6133
+ "loss": 0.0006,
6134
+ "step": 8750
6135
+ },
6136
+ {
6137
+ "epoch": 0.876,
6138
+ "grad_norm": 0.006786294747143984,
6139
+ "learning_rate": 6.2e-06,
6140
+ "loss": 0.0004,
6141
+ "step": 8760
6142
+ },
6143
+ {
6144
+ "epoch": 0.877,
6145
+ "grad_norm": 0.00381205091252923,
6146
+ "learning_rate": 6.15e-06,
6147
+ "loss": 0.0004,
6148
+ "step": 8770
6149
+ },
6150
+ {
6151
+ "epoch": 0.878,
6152
+ "grad_norm": 0.007368630729615688,
6153
+ "learning_rate": 6.1e-06,
6154
+ "loss": 0.0005,
6155
+ "step": 8780
6156
+ },
6157
+ {
6158
+ "epoch": 0.879,
6159
+ "grad_norm": 0.0035172586794942617,
6160
+ "learning_rate": 6.0500000000000005e-06,
6161
+ "loss": 0.0006,
6162
+ "step": 8790
6163
+ },
6164
+ {
6165
+ "epoch": 0.88,
6166
+ "grad_norm": 0.005555720068514347,
6167
+ "learning_rate": 6e-06,
6168
+ "loss": 0.0007,
6169
+ "step": 8800
6170
+ },
6171
+ {
6172
+ "epoch": 0.881,
6173
+ "grad_norm": 0.0076825893484056,
6174
+ "learning_rate": 5.95e-06,
6175
+ "loss": 0.0005,
6176
+ "step": 8810
6177
+ },
6178
+ {
6179
+ "epoch": 0.882,
6180
+ "grad_norm": 0.0055446140468120575,
6181
+ "learning_rate": 5.9e-06,
6182
+ "loss": 0.0005,
6183
+ "step": 8820
6184
+ },
6185
+ {
6186
+ "epoch": 0.883,
6187
+ "grad_norm": 0.002265618182718754,
6188
+ "learning_rate": 5.850000000000001e-06,
6189
+ "loss": 0.0005,
6190
+ "step": 8830
6191
+ },
6192
+ {
6193
+ "epoch": 0.884,
6194
+ "grad_norm": 0.003428585361689329,
6195
+ "learning_rate": 5.8e-06,
6196
+ "loss": 0.0004,
6197
+ "step": 8840
6198
+ },
6199
+ {
6200
+ "epoch": 0.885,
6201
+ "grad_norm": 0.0044764927588403225,
6202
+ "learning_rate": 5.750000000000001e-06,
6203
+ "loss": 0.0005,
6204
+ "step": 8850
6205
+ },
6206
+ {
6207
+ "epoch": 0.886,
6208
+ "grad_norm": 0.003201392712071538,
6209
+ "learning_rate": 5.7000000000000005e-06,
6210
+ "loss": 0.0005,
6211
+ "step": 8860
6212
+ },
6213
+ {
6214
+ "epoch": 0.887,
6215
+ "grad_norm": 0.0029762780759483576,
6216
+ "learning_rate": 5.65e-06,
6217
+ "loss": 0.0006,
6218
+ "step": 8870
6219
+ },
6220
+ {
6221
+ "epoch": 0.888,
6222
+ "grad_norm": 0.07450267672538757,
6223
+ "learning_rate": 5.600000000000001e-06,
6224
+ "loss": 0.0009,
6225
+ "step": 8880
6226
+ },
6227
+ {
6228
+ "epoch": 0.889,
6229
+ "grad_norm": 0.006392148323357105,
6230
+ "learning_rate": 5.55e-06,
6231
+ "loss": 0.0006,
6232
+ "step": 8890
6233
+ },
6234
+ {
6235
+ "epoch": 0.89,
6236
+ "grad_norm": 0.0038995451759546995,
6237
+ "learning_rate": 5.500000000000001e-06,
6238
+ "loss": 0.0005,
6239
+ "step": 8900
6240
+ },
6241
+ {
6242
+ "epoch": 0.891,
6243
+ "grad_norm": 0.0028438065201044083,
6244
+ "learning_rate": 5.45e-06,
6245
+ "loss": 0.0004,
6246
+ "step": 8910
6247
+ },
6248
+ {
6249
+ "epoch": 0.892,
6250
+ "grad_norm": 0.003168331226333976,
6251
+ "learning_rate": 5.4e-06,
6252
+ "loss": 0.0004,
6253
+ "step": 8920
6254
+ },
6255
+ {
6256
+ "epoch": 0.893,
6257
+ "grad_norm": 0.0026163198053836823,
6258
+ "learning_rate": 5.3500000000000004e-06,
6259
+ "loss": 0.0004,
6260
+ "step": 8930
6261
+ },
6262
+ {
6263
+ "epoch": 0.894,
6264
+ "grad_norm": 0.0029086521826684475,
6265
+ "learning_rate": 5.3e-06,
6266
+ "loss": 0.0005,
6267
+ "step": 8940
6268
+ },
6269
+ {
6270
+ "epoch": 0.895,
6271
+ "grad_norm": 0.011433840729296207,
6272
+ "learning_rate": 5.25e-06,
6273
+ "loss": 0.0007,
6274
+ "step": 8950
6275
+ },
6276
+ {
6277
+ "epoch": 0.896,
6278
+ "grad_norm": 0.01782575435936451,
6279
+ "learning_rate": 5.2e-06,
6280
+ "loss": 0.0011,
6281
+ "step": 8960
6282
+ },
6283
+ {
6284
+ "epoch": 0.897,
6285
+ "grad_norm": 0.00613692682236433,
6286
+ "learning_rate": 5.15e-06,
6287
+ "loss": 0.0004,
6288
+ "step": 8970
6289
+ },
6290
+ {
6291
+ "epoch": 0.898,
6292
+ "grad_norm": 0.02408697083592415,
6293
+ "learning_rate": 5.1e-06,
6294
+ "loss": 0.0007,
6295
+ "step": 8980
6296
+ },
6297
+ {
6298
+ "epoch": 0.899,
6299
+ "grad_norm": 0.004028539173305035,
6300
+ "learning_rate": 5.050000000000001e-06,
6301
+ "loss": 0.0005,
6302
+ "step": 8990
6303
+ },
6304
+ {
6305
+ "epoch": 0.9,
6306
+ "grad_norm": 0.0032080088276416063,
6307
+ "learning_rate": 5e-06,
6308
+ "loss": 0.0005,
6309
+ "step": 9000
6310
+ },
6311
+ {
6312
+ "epoch": 0.901,
6313
+ "grad_norm": 0.0035681568551808596,
6314
+ "learning_rate": 4.950000000000001e-06,
6315
+ "loss": 0.0004,
6316
+ "step": 9010
6317
+ },
6318
+ {
6319
+ "epoch": 0.902,
6320
+ "grad_norm": 0.007591512985527515,
6321
+ "learning_rate": 4.9000000000000005e-06,
6322
+ "loss": 0.0005,
6323
+ "step": 9020
6324
+ },
6325
+ {
6326
+ "epoch": 0.903,
6327
+ "grad_norm": 0.004855870269238949,
6328
+ "learning_rate": 4.85e-06,
6329
+ "loss": 0.0004,
6330
+ "step": 9030
6331
+ },
6332
+ {
6333
+ "epoch": 0.904,
6334
+ "grad_norm": 0.004854188766330481,
6335
+ "learning_rate": 4.800000000000001e-06,
6336
+ "loss": 0.0004,
6337
+ "step": 9040
6338
+ },
6339
+ {
6340
+ "epoch": 0.905,
6341
+ "grad_norm": 0.004117886070162058,
6342
+ "learning_rate": 4.75e-06,
6343
+ "loss": 0.0005,
6344
+ "step": 9050
6345
+ },
6346
+ {
6347
+ "epoch": 0.906,
6348
+ "grad_norm": 0.0045243133790791035,
6349
+ "learning_rate": 4.7e-06,
6350
+ "loss": 0.0005,
6351
+ "step": 9060
6352
+ },
6353
+ {
6354
+ "epoch": 0.907,
6355
+ "grad_norm": 0.001863984507508576,
6356
+ "learning_rate": 4.65e-06,
6357
+ "loss": 0.0004,
6358
+ "step": 9070
6359
+ },
6360
+ {
6361
+ "epoch": 0.908,
6362
+ "grad_norm": 0.002472365740686655,
6363
+ "learning_rate": 4.6e-06,
6364
+ "loss": 0.0005,
6365
+ "step": 9080
6366
+ },
6367
+ {
6368
+ "epoch": 0.909,
6369
+ "grad_norm": 0.0020466954447329044,
6370
+ "learning_rate": 4.5500000000000005e-06,
6371
+ "loss": 0.0004,
6372
+ "step": 9090
6373
+ },
6374
+ {
6375
+ "epoch": 0.91,
6376
+ "grad_norm": 0.004180034622550011,
6377
+ "learning_rate": 4.5e-06,
6378
+ "loss": 0.0004,
6379
+ "step": 9100
6380
+ },
6381
+ {
6382
+ "epoch": 0.911,
6383
+ "grad_norm": 0.00341266137547791,
6384
+ "learning_rate": 4.45e-06,
6385
+ "loss": 0.0006,
6386
+ "step": 9110
6387
+ },
6388
+ {
6389
+ "epoch": 0.912,
6390
+ "grad_norm": 0.006567875389009714,
6391
+ "learning_rate": 4.4e-06,
6392
+ "loss": 0.0004,
6393
+ "step": 9120
6394
+ },
6395
+ {
6396
+ "epoch": 0.913,
6397
+ "grad_norm": 0.003975498490035534,
6398
+ "learning_rate": 4.35e-06,
6399
+ "loss": 0.0006,
6400
+ "step": 9130
6401
+ },
6402
+ {
6403
+ "epoch": 0.914,
6404
+ "grad_norm": 0.003391894046217203,
6405
+ "learning_rate": 4.2999999999999995e-06,
6406
+ "loss": 0.0006,
6407
+ "step": 9140
6408
+ },
6409
+ {
6410
+ "epoch": 0.915,
6411
+ "grad_norm": 0.005821021273732185,
6412
+ "learning_rate": 4.250000000000001e-06,
6413
+ "loss": 0.0004,
6414
+ "step": 9150
6415
+ },
6416
+ {
6417
+ "epoch": 0.916,
6418
+ "grad_norm": 0.0022448371164500713,
6419
+ "learning_rate": 4.2000000000000004e-06,
6420
+ "loss": 0.0004,
6421
+ "step": 9160
6422
+ },
6423
+ {
6424
+ "epoch": 0.917,
6425
+ "grad_norm": 0.003718709573149681,
6426
+ "learning_rate": 4.15e-06,
6427
+ "loss": 0.0004,
6428
+ "step": 9170
6429
+ },
6430
+ {
6431
+ "epoch": 0.918,
6432
+ "grad_norm": 0.008243223652243614,
6433
+ "learning_rate": 4.1000000000000006e-06,
6434
+ "loss": 0.0007,
6435
+ "step": 9180
6436
+ },
6437
+ {
6438
+ "epoch": 0.919,
6439
+ "grad_norm": 0.010773789137601852,
6440
+ "learning_rate": 4.05e-06,
6441
+ "loss": 0.0007,
6442
+ "step": 9190
6443
+ },
6444
+ {
6445
+ "epoch": 0.92,
6446
+ "grad_norm": 0.006589268799871206,
6447
+ "learning_rate": 4.000000000000001e-06,
6448
+ "loss": 0.0005,
6449
+ "step": 9200
6450
+ },
6451
+ {
6452
+ "epoch": 0.921,
6453
+ "grad_norm": 0.0026856744661927223,
6454
+ "learning_rate": 3.95e-06,
6455
+ "loss": 0.0004,
6456
+ "step": 9210
6457
+ },
6458
+ {
6459
+ "epoch": 0.922,
6460
+ "grad_norm": 0.012134186923503876,
6461
+ "learning_rate": 3.9e-06,
6462
+ "loss": 0.0005,
6463
+ "step": 9220
6464
+ },
6465
+ {
6466
+ "epoch": 0.923,
6467
+ "grad_norm": 0.004260225687175989,
6468
+ "learning_rate": 3.85e-06,
6469
+ "loss": 0.0005,
6470
+ "step": 9230
6471
+ },
6472
+ {
6473
+ "epoch": 0.924,
6474
+ "grad_norm": 0.0023803950753062963,
6475
+ "learning_rate": 3.8e-06,
6476
+ "loss": 0.0004,
6477
+ "step": 9240
6478
+ },
6479
+ {
6480
+ "epoch": 0.925,
6481
+ "grad_norm": 0.0037502460181713104,
6482
+ "learning_rate": 3.75e-06,
6483
+ "loss": 0.0005,
6484
+ "step": 9250
6485
+ },
6486
+ {
6487
+ "epoch": 0.926,
6488
+ "grad_norm": 0.0017525887815281749,
6489
+ "learning_rate": 3.7e-06,
6490
+ "loss": 0.0003,
6491
+ "step": 9260
6492
+ },
6493
+ {
6494
+ "epoch": 0.927,
6495
+ "grad_norm": 0.003996537532657385,
6496
+ "learning_rate": 3.6499999999999998e-06,
6497
+ "loss": 0.0005,
6498
+ "step": 9270
6499
+ },
6500
+ {
6501
+ "epoch": 0.928,
6502
+ "grad_norm": 0.009158821776509285,
6503
+ "learning_rate": 3.6e-06,
6504
+ "loss": 0.0007,
6505
+ "step": 9280
6506
+ },
6507
+ {
6508
+ "epoch": 0.929,
6509
+ "grad_norm": 0.003372638253495097,
6510
+ "learning_rate": 3.55e-06,
6511
+ "loss": 0.0004,
6512
+ "step": 9290
6513
+ },
6514
+ {
6515
+ "epoch": 0.93,
6516
+ "grad_norm": 0.0026602360885590315,
6517
+ "learning_rate": 3.5000000000000004e-06,
6518
+ "loss": 0.0004,
6519
+ "step": 9300
6520
+ },
6521
+ {
6522
+ "epoch": 0.931,
6523
+ "grad_norm": 0.014532738365232944,
6524
+ "learning_rate": 3.4500000000000004e-06,
6525
+ "loss": 0.0007,
6526
+ "step": 9310
6527
+ },
6528
+ {
6529
+ "epoch": 0.932,
6530
+ "grad_norm": 0.002912462456151843,
6531
+ "learning_rate": 3.4000000000000005e-06,
6532
+ "loss": 0.0004,
6533
+ "step": 9320
6534
+ },
6535
+ {
6536
+ "epoch": 0.933,
6537
+ "grad_norm": 0.0052029709331691265,
6538
+ "learning_rate": 3.3500000000000005e-06,
6539
+ "loss": 0.0006,
6540
+ "step": 9330
6541
+ },
6542
+ {
6543
+ "epoch": 0.934,
6544
+ "grad_norm": 0.016220854595303535,
6545
+ "learning_rate": 3.3e-06,
6546
+ "loss": 0.0004,
6547
+ "step": 9340
6548
+ },
6549
+ {
6550
+ "epoch": 0.935,
6551
+ "grad_norm": 0.0030162036418914795,
6552
+ "learning_rate": 3.2500000000000002e-06,
6553
+ "loss": 0.0004,
6554
+ "step": 9350
6555
+ },
6556
+ {
6557
+ "epoch": 0.936,
6558
+ "grad_norm": 0.002491691382601857,
6559
+ "learning_rate": 3.2000000000000003e-06,
6560
+ "loss": 0.0004,
6561
+ "step": 9360
6562
+ },
6563
+ {
6564
+ "epoch": 0.937,
6565
+ "grad_norm": 0.022630969062447548,
6566
+ "learning_rate": 3.1500000000000003e-06,
6567
+ "loss": 0.0005,
6568
+ "step": 9370
6569
+ },
6570
+ {
6571
+ "epoch": 0.938,
6572
+ "grad_norm": 0.005951160565018654,
6573
+ "learning_rate": 3.1e-06,
6574
+ "loss": 0.0005,
6575
+ "step": 9380
6576
+ },
6577
+ {
6578
+ "epoch": 0.939,
6579
+ "grad_norm": 0.0024763622786849737,
6580
+ "learning_rate": 3.05e-06,
6581
+ "loss": 0.0005,
6582
+ "step": 9390
6583
+ },
6584
+ {
6585
+ "epoch": 0.94,
6586
+ "grad_norm": 0.0049979290924966335,
6587
+ "learning_rate": 3e-06,
6588
+ "loss": 0.0005,
6589
+ "step": 9400
6590
+ },
6591
+ {
6592
+ "epoch": 0.941,
6593
+ "grad_norm": 0.0025999436620622873,
6594
+ "learning_rate": 2.95e-06,
6595
+ "loss": 0.0004,
6596
+ "step": 9410
6597
+ },
6598
+ {
6599
+ "epoch": 0.942,
6600
+ "grad_norm": 0.004584169946610928,
6601
+ "learning_rate": 2.9e-06,
6602
+ "loss": 0.0006,
6603
+ "step": 9420
6604
+ },
6605
+ {
6606
+ "epoch": 0.943,
6607
+ "grad_norm": 0.005211680196225643,
6608
+ "learning_rate": 2.8500000000000002e-06,
6609
+ "loss": 0.0005,
6610
+ "step": 9430
6611
+ },
6612
+ {
6613
+ "epoch": 0.944,
6614
+ "grad_norm": 0.0022507943212985992,
6615
+ "learning_rate": 2.8000000000000003e-06,
6616
+ "loss": 0.0004,
6617
+ "step": 9440
6618
+ },
6619
+ {
6620
+ "epoch": 0.945,
6621
+ "grad_norm": 0.0029024691320955753,
6622
+ "learning_rate": 2.7500000000000004e-06,
6623
+ "loss": 0.0004,
6624
+ "step": 9450
6625
+ },
6626
+ {
6627
+ "epoch": 0.946,
6628
+ "grad_norm": 0.003968573175370693,
6629
+ "learning_rate": 2.7e-06,
6630
+ "loss": 0.0004,
6631
+ "step": 9460
6632
+ },
6633
+ {
6634
+ "epoch": 0.947,
6635
+ "grad_norm": 0.003264777595177293,
6636
+ "learning_rate": 2.65e-06,
6637
+ "loss": 0.0005,
6638
+ "step": 9470
6639
+ },
6640
+ {
6641
+ "epoch": 0.948,
6642
+ "grad_norm": 0.0048127188347280025,
6643
+ "learning_rate": 2.6e-06,
6644
+ "loss": 0.0004,
6645
+ "step": 9480
6646
+ },
6647
+ {
6648
+ "epoch": 0.949,
6649
+ "grad_norm": 0.004405410494655371,
6650
+ "learning_rate": 2.55e-06,
6651
+ "loss": 0.0006,
6652
+ "step": 9490
6653
+ },
6654
+ {
6655
+ "epoch": 0.95,
6656
+ "grad_norm": 0.00462340796366334,
6657
+ "learning_rate": 2.5e-06,
6658
+ "loss": 0.0004,
6659
+ "step": 9500
6660
+ },
6661
+ {
6662
+ "epoch": 0.951,
6663
+ "grad_norm": 0.0021721452940255404,
6664
+ "learning_rate": 2.4500000000000003e-06,
6665
+ "loss": 0.0004,
6666
+ "step": 9510
6667
+ },
6668
+ {
6669
+ "epoch": 0.952,
6670
+ "grad_norm": 0.002355078933760524,
6671
+ "learning_rate": 2.4000000000000003e-06,
6672
+ "loss": 0.0006,
6673
+ "step": 9520
6674
+ },
6675
+ {
6676
+ "epoch": 0.953,
6677
+ "grad_norm": 0.0022414589766412973,
6678
+ "learning_rate": 2.35e-06,
6679
+ "loss": 0.0004,
6680
+ "step": 9530
6681
+ },
6682
+ {
6683
+ "epoch": 0.954,
6684
+ "grad_norm": 0.012005253694951534,
6685
+ "learning_rate": 2.3e-06,
6686
+ "loss": 0.0006,
6687
+ "step": 9540
6688
+ },
6689
+ {
6690
+ "epoch": 0.955,
6691
+ "grad_norm": 0.00513832364231348,
6692
+ "learning_rate": 2.25e-06,
6693
+ "loss": 0.0005,
6694
+ "step": 9550
6695
+ },
6696
+ {
6697
+ "epoch": 0.956,
6698
+ "grad_norm": 0.0027625642251223326,
6699
+ "learning_rate": 2.2e-06,
6700
+ "loss": 0.0005,
6701
+ "step": 9560
6702
+ },
6703
+ {
6704
+ "epoch": 0.957,
6705
+ "grad_norm": 0.008645957335829735,
6706
+ "learning_rate": 2.1499999999999997e-06,
6707
+ "loss": 0.0005,
6708
+ "step": 9570
6709
+ },
6710
+ {
6711
+ "epoch": 0.958,
6712
+ "grad_norm": 0.00188863230869174,
6713
+ "learning_rate": 2.1000000000000002e-06,
6714
+ "loss": 0.0004,
6715
+ "step": 9580
6716
+ },
6717
+ {
6718
+ "epoch": 0.959,
6719
+ "grad_norm": 0.0025561931543052197,
6720
+ "learning_rate": 2.0500000000000003e-06,
6721
+ "loss": 0.0004,
6722
+ "step": 9590
6723
+ },
6724
+ {
6725
+ "epoch": 0.96,
6726
+ "grad_norm": 0.0033618698362261057,
6727
+ "learning_rate": 2.0000000000000003e-06,
6728
+ "loss": 0.0004,
6729
+ "step": 9600
6730
+ },
6731
+ {
6732
+ "epoch": 0.961,
6733
+ "grad_norm": 0.0018735548947006464,
6734
+ "learning_rate": 1.95e-06,
6735
+ "loss": 0.0004,
6736
+ "step": 9610
6737
+ },
6738
+ {
6739
+ "epoch": 0.962,
6740
+ "grad_norm": 0.0019359014695510268,
6741
+ "learning_rate": 1.9e-06,
6742
+ "loss": 0.0005,
6743
+ "step": 9620
6744
+ },
6745
+ {
6746
+ "epoch": 0.963,
6747
+ "grad_norm": 0.005369434133172035,
6748
+ "learning_rate": 1.85e-06,
6749
+ "loss": 0.0004,
6750
+ "step": 9630
6751
+ },
6752
+ {
6753
+ "epoch": 0.964,
6754
+ "grad_norm": 0.0017576682148501277,
6755
+ "learning_rate": 1.8e-06,
6756
+ "loss": 0.0004,
6757
+ "step": 9640
6758
+ },
6759
+ {
6760
+ "epoch": 0.965,
6761
+ "grad_norm": 0.002633103635162115,
6762
+ "learning_rate": 1.7500000000000002e-06,
6763
+ "loss": 0.0004,
6764
+ "step": 9650
6765
+ },
6766
+ {
6767
+ "epoch": 0.966,
6768
+ "grad_norm": 0.007023205049335957,
6769
+ "learning_rate": 1.7000000000000002e-06,
6770
+ "loss": 0.0004,
6771
+ "step": 9660
6772
+ },
6773
+ {
6774
+ "epoch": 0.967,
6775
+ "grad_norm": 0.0026062438264489174,
6776
+ "learning_rate": 1.65e-06,
6777
+ "loss": 0.0005,
6778
+ "step": 9670
6779
+ },
6780
+ {
6781
+ "epoch": 0.968,
6782
+ "grad_norm": 0.0025111304130405188,
6783
+ "learning_rate": 1.6000000000000001e-06,
6784
+ "loss": 0.0005,
6785
+ "step": 9680
6786
+ },
6787
+ {
6788
+ "epoch": 0.969,
6789
+ "grad_norm": 0.0028218806255608797,
6790
+ "learning_rate": 1.55e-06,
6791
+ "loss": 0.0004,
6792
+ "step": 9690
6793
+ },
6794
+ {
6795
+ "epoch": 0.97,
6796
+ "grad_norm": 0.0024802633561193943,
6797
+ "learning_rate": 1.5e-06,
6798
+ "loss": 0.0005,
6799
+ "step": 9700
6800
+ },
6801
+ {
6802
+ "epoch": 0.971,
6803
+ "grad_norm": 0.002883223118260503,
6804
+ "learning_rate": 1.45e-06,
6805
+ "loss": 0.0005,
6806
+ "step": 9710
6807
+ },
6808
+ {
6809
+ "epoch": 0.972,
6810
+ "grad_norm": 0.002503247233107686,
6811
+ "learning_rate": 1.4000000000000001e-06,
6812
+ "loss": 0.0005,
6813
+ "step": 9720
6814
+ },
6815
+ {
6816
+ "epoch": 0.973,
6817
+ "grad_norm": 0.002495008986443281,
6818
+ "learning_rate": 1.35e-06,
6819
+ "loss": 0.0006,
6820
+ "step": 9730
6821
+ },
6822
+ {
6823
+ "epoch": 0.974,
6824
+ "grad_norm": 0.03429775312542915,
6825
+ "learning_rate": 1.3e-06,
6826
+ "loss": 0.0006,
6827
+ "step": 9740
6828
+ },
6829
+ {
6830
+ "epoch": 0.975,
6831
+ "grad_norm": 0.003482217201963067,
6832
+ "learning_rate": 1.25e-06,
6833
+ "loss": 0.0004,
6834
+ "step": 9750
6835
+ },
6836
+ {
6837
+ "epoch": 0.976,
6838
+ "grad_norm": 0.001837963704019785,
6839
+ "learning_rate": 1.2000000000000002e-06,
6840
+ "loss": 0.0004,
6841
+ "step": 9760
6842
+ },
6843
+ {
6844
+ "epoch": 0.977,
6845
+ "grad_norm": 0.0020507893059402704,
6846
+ "learning_rate": 1.15e-06,
6847
+ "loss": 0.0004,
6848
+ "step": 9770
6849
+ },
6850
+ {
6851
+ "epoch": 0.978,
6852
+ "grad_norm": 0.0022647210862487555,
6853
+ "learning_rate": 1.1e-06,
6854
+ "loss": 0.0005,
6855
+ "step": 9780
6856
+ },
6857
+ {
6858
+ "epoch": 0.979,
6859
+ "grad_norm": 0.0017425378318876028,
6860
+ "learning_rate": 1.0500000000000001e-06,
6861
+ "loss": 0.0004,
6862
+ "step": 9790
6863
+ },
6864
+ {
6865
+ "epoch": 0.98,
6866
+ "grad_norm": 0.2319187968969345,
6867
+ "learning_rate": 1.0000000000000002e-06,
6868
+ "loss": 0.0021,
6869
+ "step": 9800
6870
+ },
6871
+ {
6872
+ "epoch": 0.981,
6873
+ "grad_norm": 0.01799739897251129,
6874
+ "learning_rate": 9.5e-07,
6875
+ "loss": 0.0006,
6876
+ "step": 9810
6877
+ },
6878
+ {
6879
+ "epoch": 0.982,
6880
+ "grad_norm": 0.007147952448576689,
6881
+ "learning_rate": 9e-07,
6882
+ "loss": 0.0005,
6883
+ "step": 9820
6884
+ },
6885
+ {
6886
+ "epoch": 0.983,
6887
+ "grad_norm": 0.004181794356554747,
6888
+ "learning_rate": 8.500000000000001e-07,
6889
+ "loss": 0.0005,
6890
+ "step": 9830
6891
+ },
6892
+ {
6893
+ "epoch": 0.984,
6894
+ "grad_norm": 0.00277232495136559,
6895
+ "learning_rate": 8.000000000000001e-07,
6896
+ "loss": 0.0004,
6897
+ "step": 9840
6898
+ },
6899
+ {
6900
+ "epoch": 0.985,
6901
+ "grad_norm": 0.0024797001387923956,
6902
+ "learning_rate": 7.5e-07,
6903
+ "loss": 0.0006,
6904
+ "step": 9850
6905
+ },
6906
+ {
6907
+ "epoch": 0.986,
6908
+ "grad_norm": 0.002748242113739252,
6909
+ "learning_rate": 7.000000000000001e-07,
6910
+ "loss": 0.0005,
6911
+ "step": 9860
6912
+ },
6913
+ {
6914
+ "epoch": 0.987,
6915
+ "grad_norm": 0.002988820429891348,
6916
+ "learning_rate": 6.5e-07,
6917
+ "loss": 0.0004,
6918
+ "step": 9870
6919
+ },
6920
+ {
6921
+ "epoch": 0.988,
6922
+ "grad_norm": 0.002272873418405652,
6923
+ "learning_rate": 6.000000000000001e-07,
6924
+ "loss": 0.0006,
6925
+ "step": 9880
6926
+ },
6927
+ {
6928
+ "epoch": 0.989,
6929
+ "grad_norm": 0.0028824047185480595,
6930
+ "learning_rate": 5.5e-07,
6931
+ "loss": 0.0005,
6932
+ "step": 9890
6933
+ },
6934
+ {
6935
+ "epoch": 0.99,
6936
+ "grad_norm": 0.013895529322326183,
6937
+ "learning_rate": 5.000000000000001e-07,
6938
+ "loss": 0.0005,
6939
+ "step": 9900
6940
+ },
6941
+ {
6942
+ "epoch": 0.991,
6943
+ "grad_norm": 0.004210934974253178,
6944
+ "learning_rate": 4.5e-07,
6945
+ "loss": 0.0004,
6946
+ "step": 9910
6947
+ },
6948
+ {
6949
+ "epoch": 0.992,
6950
+ "grad_norm": 0.0017349456902593374,
6951
+ "learning_rate": 4.0000000000000003e-07,
6952
+ "loss": 0.0005,
6953
+ "step": 9920
6954
+ },
6955
+ {
6956
+ "epoch": 0.993,
6957
+ "grad_norm": 0.0036622195038944483,
6958
+ "learning_rate": 3.5000000000000004e-07,
6959
+ "loss": 0.0003,
6960
+ "step": 9930
6961
+ },
6962
+ {
6963
+ "epoch": 0.994,
6964
+ "grad_norm": 0.02928483486175537,
6965
+ "learning_rate": 3.0000000000000004e-07,
6966
+ "loss": 0.0006,
6967
+ "step": 9940
6968
+ },
6969
+ {
6970
+ "epoch": 0.995,
6971
+ "grad_norm": 0.004271595273166895,
6972
+ "learning_rate": 2.5000000000000004e-07,
6973
+ "loss": 0.0004,
6974
+ "step": 9950
6975
+ },
6976
+ {
6977
+ "epoch": 0.996,
6978
+ "grad_norm": 0.004935207776725292,
6979
+ "learning_rate": 2.0000000000000002e-07,
6980
+ "loss": 0.0004,
6981
+ "step": 9960
6982
+ },
6983
+ {
6984
+ "epoch": 0.997,
6985
+ "grad_norm": 0.005258087068796158,
6986
+ "learning_rate": 1.5000000000000002e-07,
6987
+ "loss": 0.0004,
6988
+ "step": 9970
6989
+ },
6990
+ {
6991
+ "epoch": 0.998,
6992
+ "grad_norm": 0.0014150363858789206,
6993
+ "learning_rate": 1.0000000000000001e-07,
6994
+ "loss": 0.0004,
6995
+ "step": 9980
6996
+ },
6997
+ {
6998
+ "epoch": 0.999,
6999
+ "grad_norm": 0.003183445893228054,
7000
+ "learning_rate": 5.0000000000000004e-08,
7001
+ "loss": 0.0004,
7002
+ "step": 9990
7003
+ },
7004
+ {
7005
+ "epoch": 1.0,
7006
+ "grad_norm": 0.0063432566821575165,
7007
+ "learning_rate": 0.0,
7008
+ "loss": 0.0004,
7009
+ "step": 10000
7010
  }
7011
  ],
7012
  "logging_steps": 10,
 
7021
  "should_evaluate": false,
7022
  "should_log": false,
7023
  "should_save": true,
7024
+ "should_training_stop": true
7025
  },
7026
  "attributes": {}
7027
  }
7028
  },
7029
+ "total_flos": 3.6962203336704e+17,
7030
  "train_batch_size": 1,
7031
  "trial_name": null,
7032
  "trial_params": null