Avelina committed
Commit
7bbad8c
1 Parent(s): 22ff118

Update README.md

Files changed (1)
  1. README.md +32 -1
README.md CHANGED
@@ -6,5 +6,36 @@ language:
  - en
  library_name: transformers
  ---
+ # Lovelace Medium Alpha1

- 550M parameter Transformer-XL style model trained on 100B tokens from The Pile.
+ 550M parameter Transformer-XL style model trained on 100B tokens of The Pile!
+
+ This model was originally trained for the "Direct Preference Heads" paper, but will also be used as the basis for much of my future research.
+ All code used to train and run these models is available here: https://github.com/Avelina9X/memory-transformer-pt4
+
+ ## Model Architecture
+ | Name | Value |
+ | --- | --- |
+ | Total Parameters | 551M |
+ | Non-Embedding Parameters | 512M |
+ | Vocab Size | 50272 |
+ | \\(d_\text{vocab}\\) | 768 |
+ | \\(d_\text{model}\\) | 1536 |
+ | \\(n_\text{layers}\\) | 18 |
+ | FFN Activation | SwiGLU |
+ | \\(d_\text{ffn}\\) | 4096 |
+ | Attention Type | Full |
+ | Position Embedding | Reversed RoPE with ABF |
+ | \\(n_\text{heads}\\) | 24 |
+ | \\(d_\text{key}\\) | 64 |
+ | Trained Context | 2048 |
+ | Trained Memory | 2048 |
+ | Max Inference Context | 4096 |
+
+ ## Model Collection
+ | Model | Link |
+ | --- | --- |
+ | Pre-Trained Model | [lovelace-medium-alpha1](https://huggingface.co/Avelina/lovelace-medium-alpha1) |
+ | Fine-Tuned Model | lovelace-medium-alpha1-instruct |
+ | DPH Aligned Model | lovelace-medium-alpha1-instruct-hf |
+ | DPH Aligned Model (Multiple Heads) | lovelace-medium-alpha1-instruct-hf-multihead |
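
As a plausibility check on the architecture table added above, the quoted parameter counts can be roughly reproduced from the listed dimensions. The sketch below is a back-of-envelope estimate under assumptions not confirmed by the card: a standard SwiGLU feed-forward block (gate/up/down projections), separate Q/K/V/output projections of size d_model × d_model, and a factorised embedding (vocab_size × d_vocab plus a d_vocab → d_model projection), ignoring layer norms, biases, and any extra head parameters.

```python
# Back-of-envelope parameter estimate from the architecture table.
# Assumptions (not from the card): standard SwiGLU FFN, four d_model x d_model
# attention projections per layer, factorised embedding with an up-projection.
d_model, n_layers, d_ffn = 1536, 18, 4096
vocab_size, d_vocab = 50272, 768

attn_per_layer = 4 * d_model * d_model   # Q, K, V, O projections
ffn_per_layer = 3 * d_model * d_ffn      # SwiGLU: gate, up, down
non_embedding = n_layers * (attn_per_layer + ffn_per_layer)

embedding = vocab_size * d_vocab + d_vocab * d_model  # factorised embedding + projection

print(f"non-embedding ~ {non_embedding / 1e6:.0f}M")            # ~510M vs. 512M quoted
print(f"total ~ {(non_embedding + embedding) / 1e6:.0f}M")       # ~549M vs. 551M quoted
```

The small gap to the quoted 512M / 551M figures is consistent with layer norms, biases, and other parameters deliberately left out of this rough estimate.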
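
Since the card declares `library_name: transformers`, the natural way to try the pre-trained checkpoint is through the Auto classes. The snippet below is only a hedged sketch: whether the custom Transformer-XL style architecture needs `trust_remote_code=True`, which model class it resolves to, and the generation settings are all assumptions not stated in this commit; the linked repository (https://github.com/Avelina9X/memory-transformer-pt4) remains the authoritative way to run the model.

```python
# Hedged usage sketch -- not taken from the model card. The custom architecture
# may require trust_remote_code=True (assumption), and the generation settings
# below are arbitrary examples.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Avelina/lovelace-medium-alpha1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# The card lists a trained context of 2048 tokens and a 4096-token maximum at
# inference, so keep prompt plus generated tokens inside that budget.
prompt = "The Pile is a diverse, open-source language modelling dataset"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```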