Update README.md
README.md
language:
- en
library_name: transformers
---

# Lovelace Medium Alpha1

A 550M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research.
All code used to train and run these models is available here: https://github.com/Avelina9X/memory-transformer-pt4

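The sketch below shows one way to load the pre-trained checkpoint. It assumes the Hugging Face repo can be loaded through the standard `transformers` auto classes; because this is a custom Transformer-XL style architecture, `trust_remote_code=True` may be required, and the GitHub repository linked above remains the reference implementation for training and inference.

```python
# Minimal usage sketch (assumption: the checkpoint is loadable via the
# transformers auto classes; the custom Transformer-XL style memory may
# require trust_remote_code=True).
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "Avelina/lovelace-medium-alpha1"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The Pile is a large, diverse", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
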
## Model Architecture

| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| \\(d_\text{vocab}\\) | 768 |
| \\(d_\text{model}\\) | 1536 |
| \\(n_\text{layers}\\) | 18 |
| FFN Activation | SwiGLU |
| \\(d_\text{ffn}\\) | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| \\(n_\text{heads}\\) | 24 |
| \\(d_\text{key}\\) | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |
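
The gap between the total and non-embedding counts (roughly 39M) is consistent with a factorized embedding: a \\(50272 \times 768\\) token table plus a \\(768 \times 1536\\) projection into the model dimension. The sketch below is a rough back-of-the-envelope check of the table's figures; it assumes a SwiGLU FFN with three weight matrices and ignores biases, normalization, and any extra output-projection parameters, which is why it lands slightly under the reported values.

```python
# Rough parameter-count check from the architecture table (assumptions:
# factorized embeddings at d_vocab=768 projected to d_model, SwiGLU FFN with
# three weight matrices, no biases or normalization parameters counted).
vocab, d_vocab, d_model, n_layers, d_ffn = 50272, 768, 1536, 18, 4096

attn = 4 * d_model * d_model             # Q, K, V, O projections
ffn = 3 * d_model * d_ffn                # SwiGLU: gate, up, down matrices
non_embedding = n_layers * (attn + ffn)  # ~510M; table reports 512M

embedding = vocab * d_vocab + d_vocab * d_model  # token table + projection
total = non_embedding + embedding                # ~549M; table reports 551M

print(f"non-embedding = {non_embedding / 1e6:.0f}M, total = {total / 1e6:.0f}M")
```
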

## Model Collection

| Model | Link |
| --- | --- |
| Pre-Trained Model | [lovelace-medium-alpha1](https://huggingface.co/Avelina/lovelace-medium-alpha1) |
| Fine-Tuned Model | lovelace-medium-alpha1-instruct |
| DPH Aligned Model | lovelace-medium-alpha1-instruct-hf |
| DPH Aligned Model (Multiple Heads) | lovelace-medium-alpha1-instruct-hf-multihead |