naxalpha committed on
Commit
185a6dd
1 Parent(s): fa10105

update info

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -46,6 +46,7 @@ Since it is not based on [transformers](https://github.com/huggingface/transformers),
 model.net.to_logits[1].weight.copy_(emb)
 ```
 
+Training code is available in this repo. [Link to the training script](https://huggingface.co/naxalpha/gated-state-space/blob/main/app.py).
 
 ## Training Information
 
@@ -65,4 +66,16 @@ Here are the details of the training:
 - Tokens seen: `570 million`
 - Final loss: `~3.9`
 
-Training code is available in this repo. [Link to the training script](https://huggingface.co/naxalpha/gated-state-space/blob/main/app.py).
+## Fine-Tuning Info
+
+[model2.pt](https://huggingface.co/naxalpha/gated-state-space/blob/main/) is available as a fine-tuned version with a longer context length.
+
+- Objective: `Simple Cross Entropy`
+- Gradient Accumulation: `4`
+- Batch Size: `1`
+- Sequence Length: `2048`
+- Learning Rate: `5e-6`
+- Embeddings: `unfrozen for fine-tuning`
+- Gradient Norm Clipping: `1.0`
+- Hardware: `2x3090` on vast.ai
+- Extra Tricks: `Used HuggingFace Accelerate with Full Sharding without CPU offload`
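
The fine-tuning hyperparameters added in this commit can be summarized in a small config sketch. This is illustrative only (the dict keys and `num_gpus` derivation are assumptions, not code from the repo); it just makes explicit how the listed settings combine into the effective tokens processed per optimizer step.

```python
# Hypothetical summary of the fine-tuning settings listed above.
# Key names are illustrative; values are taken from the commit.
config = {
    "objective": "cross_entropy",
    "gradient_accumulation": 4,
    "batch_size": 1,          # per-GPU batch size
    "sequence_length": 2048,
    "learning_rate": 5e-6,
    "grad_norm_clip": 1.0,
    "num_gpus": 2,            # 2x RTX 3090 on vast.ai
}

# Tokens seen per optimizer step = batch * accumulation * GPUs * seq len
tokens_per_step = (config["batch_size"]
                   * config["gradient_accumulation"]
                   * config["num_gpus"]
                   * config["sequence_length"])
print(tokens_per_step)  # 16384
```

With full sharding (FSDP via Accelerate) and no CPU offload, each of the two GPUs holds only a shard of the parameters and optimizer state, which is what makes the 2048-token context fit on 24 GB cards at batch size 1.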