tomerkeren42 committed
Commit
082b1b2
1 Parent(s): 2388414

Update README.md

Files changed (1)
  1. README.md +11 -5
README.md CHANGED
@@ -63,11 +63,17 @@ The following techniques were used to shorten training time:
 - **Using EMA only in the last phase of training**
 
 ### Additional Details
-
-- **Hardware:** 8xA100 (80gb), 8xH100 (80gb)
-- **Optimizer:** AdamW/LAMB (phase1/phase2-4)
-- **Batch:** 8192/6144
-- **Learning rate:** 1e-4/5e-3 (phase1/phase2-4)
+#### Phase 1
+- **Hardware:** 8 x 8 x A100 (80gb)
+- **Optimizer:** AdamW
+- **Batch:** 8192
+- **Learning rate:** 1e-4
+
+#### Phase 2-4
+- **Hardware:** 8 x 8 x H100 (80gb)
+- **Optimizer:** LAMB
+- **Batch:** 6144
+- **Learning rate:** 5e-3
 
 ## Evaluation
 
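
The per-phase values introduced by this commit can also be written as structured data. The following is a minimal Python sketch, not code from the repository: the `PhaseConfig` class and its field names are hypothetical. It assumes that "8 x 8" means 8 nodes with 8 GPUs each and that **Batch** is the global batch size; the README does not state either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """Hyperparameters for one training phase, copied from the README diff."""
    name: str
    gpu_type: str
    nodes: int          # assumption: first "8" in "8 x 8 x ..." is the node count
    gpus_per_node: int  # assumption: second "8" is the number of GPUs per node
    optimizer: str
    batch_size: int     # assumption: global batch size across all GPUs
    learning_rate: float

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    @property
    def per_gpu_batch(self) -> float:
        # Only meaningful if batch_size really is the global batch size.
        return self.batch_size / self.total_gpus


PHASES = [
    PhaseConfig("Phase 1", "A100 (80gb)", 8, 8, "AdamW", 8192, 1e-4),
    PhaseConfig("Phase 2-4", "H100 (80gb)", 8, 8, "LAMB", 6144, 5e-3),
]

for phase in PHASES:
    print(f"{phase.name}: {phase.optimizer}, lr={phase.learning_rate:g}, "
          f"{phase.total_gpus} GPUs, {phase.per_gpu_batch:g} samples per GPU")
```

Under those assumptions, each phase runs on 64 GPUs, with 128 samples per GPU per step in Phase 1 and 96 in Phases 2-4.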