naxalpha
/

gated-state-space

@@ -60,8 +60,9 @@ Here are the details of the training:
 - Gradient Norm Clipping: `1.0`
 - Hardware: `RTX 3090` on [vast.ai](vast.ai)
 - Training Cost: `~20$`
-- Training Time: `~2 days`
-- Number of steps: `434,000`
-- Tokens seen: `444 million`
 Training code is available in this repo. [Link to the training script](https://huggingface.co/naxalpha/gated-state-space/blob/main/app.py).

 - Gradient Norm Clipping: `1.0`
 - Hardware: `RTX 3090` on [vast.ai](vast.ai)
 - Training Cost: `~20$`
+- Training Time: `~3 days`
+- Number of steps: `557,000`
+- Tokens seen: `570 million`
+- Final loss: `~3.9`
 Training code is available in this repo. [Link to the training script](https://huggingface.co/naxalpha/gated-state-space/blob/main/app.py).

README.md CHANGED Viewed

@@ -60,8 +60,9 @@ Here are the details of the training:
 - Gradient Norm Clipping: `1.0`
 - Hardware: `RTX 3090` on [vast.ai](vast.ai)
 - Training Cost: `~20$`
-- Training Time: `~2 days`
-- Number of steps: `434,000`
-- Tokens seen: `444 million`
 Training code is available in this repo. [Link to the training script](https://huggingface.co/naxalpha/gated-state-space/blob/main/app.py).

 - Gradient Norm Clipping: `1.0`
 - Hardware: `RTX 3090` on [vast.ai](vast.ai)
 - Training Cost: `~20$`
+- Training Time: `~3 days`
+- Number of steps: `557,000`
+- Tokens seen: `570 million`
+- Final loss: `~3.9`
 Training code is available in this repo. [Link to the training script](https://huggingface.co/naxalpha/gated-state-space/blob/main/app.py).