Update README.md
README.md
language:
- en
library_name: transformers
---

# Lovelace Medium Alpha1

A 550M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research.
All code used to train and run these models is available here: https://github.com/Avelina9X/memory-transformer-pt4

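The sketch below shows one way to load the pre-trained checkpoint. It assumes the Hugging Face repo can be loaded through the standard `transformers` auto classes; because this is a custom Transformer-XL style architecture, `trust_remote_code=True` may be required, and the GitHub repository linked above remains the reference implementation for training and inference.

```python
# Minimal usage sketch (assumption: the checkpoint is loadable via the
# transformers auto classes; the custom Transformer-XL style memory may
# require trust_remote_code=True).
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "Avelina/lovelace-medium-alpha1"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The Pile is a large, diverse", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
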
## Model Architecture

| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| \\(d_\text{vocab}\\) | 768 |
| \\(d_\text{model}\\) | 1536 |
| \\(n_\text{layers}\\) | 18 |
| FFN Activation | SwiGLU |
| \\(d_\text{ffn}\\) | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| \\(n_\text{heads}\\) | 24 |
| \\(d_\text{key}\\) | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |
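
The gap between the total and non-embedding counts (roughly 39M) is consistent with a factorized embedding: a \\(50272 \times 768\\) token table plus a \\(768 \times 1536\\) projection into the model dimension. The sketch below is a rough back-of-the-envelope check of the table's figures; it assumes a SwiGLU FFN with three weight matrices and ignores biases, normalization, and any extra output-projection parameters, which is why it lands slightly under the reported values.

```python
# Rough parameter-count check from the architecture table (assumptions:
# factorized embeddings at d_vocab=768 projected to d_model, SwiGLU FFN with
# three weight matrices, no biases or normalization parameters counted).
vocab, d_vocab, d_model, n_layers, d_ffn = 50272, 768, 1536, 18, 4096

attn = 4 * d_model * d_model             # Q, K, V, O projections
ffn = 3 * d_model * d_ffn                # SwiGLU: gate, up, down matrices
non_embedding = n_layers * (attn + ffn)  # ~510M; table reports 512M

embedding = vocab * d_vocab + d_vocab * d_model  # token table + projection
total = non_embedding + embedding                # ~549M; table reports 551M

print(f"non-embedding = {non_embedding / 1e6:.0f}M, total = {total / 1e6:.0f}M")
```
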

## Model Collection

| Model | Link |
| --- | --- |
| Pre-Trained Model | [lovelace-medium-alpha1](https://huggingface.co/Avelina/lovelace-medium-alpha1) |
| Fine-Tuned Model | lovelace-medium-alpha1-instruct |
| DPH Aligned Model | lovelace-medium-alpha1-instruct-hf |
| DPH Aligned Model (Multiple Heads) | lovelace-medium-alpha1-instruct-hf-multihead |