---
license: bsd-3-clause
datasets:
- EleutherAI/pile
language:
- en
library_name: transformers
---
# Lovelace Medium Alpha1

A 550M-parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research.
All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads
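
As a minimal loading sketch, assuming the checkpoint resolves through the standard `transformers` Auto classes (as the `library_name` tag suggests): because this is a custom Transformer-XL style architecture, `trust_remote_code=True` may be needed, or you may need to use the training repository linked above instead.

```python
# Sketch: load the pre-trained checkpoint via transformers.
# Assumes the repo is usable through the Auto classes; the custom
# Transformer-XL style architecture may require trust_remote_code=True
# (or the linked training repo) to instantiate.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Avelina/lovelace-medium-alpha1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The Pile is a large", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```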

## Model Architecture
| Name | Value |
| --- | --- |
| Total Parameters         | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size               | 50272 |
| \\(d_\text{vocab}\\)     | 768 |
| \\(d_\text{model}\\)     | 1536 |
| \\(n_\text{layers}\\)    | 18 |
| FFN Activation           | SwiGLU |
| \\(d_\text{ffn}\\)       | 4096 |
| Attention Type           | Full |
| Position Embedding       | Reversed RoPE with ABF |
| \\(n_\text{heads}\\)     | 24 |
| \\(d_\text{key}\\)       | 64 |
| Trained Context          | 2048 |
| Trained Memory           | 2048 |
| Max Inference Context    | 4096 |
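
As a rough sanity check, the parameter counts can be approximately reproduced from the table above. The sketch below counts only the attention and SwiGLU weight matrices plus the (assumed tied) embedding table; biases, layer norms, and the factorized embedding projection are ignored, so it slightly undershoots the quoted totals.

```python
# Back-of-the-envelope parameter count from the architecture table.
# Only the large weight matrices are counted; biases, layer norms and
# the factorized vocab->model projection are omitted.
d_model  = 1536
d_vocab  = 768      # factorized embedding width
n_layers = 18
n_heads  = 24
d_key    = 64
d_ffn    = 4096
vocab    = 50272

attn_per_layer = 4 * d_model * (n_heads * d_key)   # Q, K, V, O projections
ffn_per_layer  = 3 * d_model * d_ffn               # SwiGLU: gate, up, down
non_embedding  = n_layers * (attn_per_layer + ffn_per_layer)
embedding      = vocab * d_vocab                   # input table (output head assumed tied)

print(f"non-embedding ~ {non_embedding / 1e6:.0f}M")                # ~510M (table: 512M)
print(f"total         ~ {(non_embedding + embedding) / 1e6:.0f}M")  # ~548M (table: 551M)
```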

## Model Collection
| Model | Link |
| --- | --- |
| Pre-Trained Model                  | [lovelace-medium-alpha1](https://huggingface.co/Avelina/lovelace-medium-alpha1) |
| Fine-Tuned Model                   | [lovelace-medium-alpha1-sft](https://huggingface.co/Avelina/lovelace-medium-alpha1-sft) |
| DPH Aligned Model                  | [lovelace-medium-alpha1-dph](https://huggingface.co/Avelina/lovelace-medium-alpha1-dph) |