update info
README.md CHANGED
````diff
@@ -46,6 +46,7 @@ Since it is not based on [transformers](https://github.com/huggingface/transformers)
 model.net.to_logits[1].weight.copy_(emb)
 ```
 
+Training code is available in this repo; see the [training script](https://huggingface.co/naxalpha/gated-state-space/blob/main/app.py).
 
 ## Training Information
 
@@ -65,4 +66,16 @@ Here are the details of the training:
 - Tokens seen: `570 million`
 - Final loss: `~3.9`
 
-
+## Fine-Tuning Information
+
+[model2.pt](https://huggingface.co/naxalpha/gated-state-space/blob/main/) is available as a fine-tuned version of the model with a longer context length.
+
+- Objective: `Simple Cross Entropy`
+- Gradient Accumulation: `4`
+- Batch Size: `1`
+- Sequence Length: `2048`
+- Learning Rate: `5e-6`
+- Embeddings: `unfrozen for fine-tuning`
+- Gradient Norm Clipping: `1.0`
+- Hardware: `2x3090` on vast.ai
+- Extra Tricks: `Used HuggingFace Accelerate with Full Sharding without CPU offload`
````