---
license: bsd-3-clause
datasets:
  - EleutherAI/pile
language:
  - en
library_name: transformers
---

# Lovelace Medium Alpha1

A 550M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research. All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads
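
Below is a minimal sketch of loading and sampling from the checkpoint with the `transformers` auto classes. The repo id and the use of `trust_remote_code=True` are assumptions based on the model name and the custom architecture; the canonical training and inference code lives in the GitHub repository linked above.

```python
# Minimal sketch: load the pre-trained checkpoint and sample a continuation.
# The repo id below is an assumption; the custom Transformer-XL style
# architecture is assumed to be exposed via trust_remote_code.
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = 'Avelina/lovelace-medium-alpha1'  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer('The Pile is a large language modelling dataset', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```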

## Model Architecture

| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| $d_\text{vocab}$ | 768 |
| $d_\text{model}$ | 1536 |
| $n_\text{layers}$ | 18 |
| FFN Activation | SwiGLU |
| $d_\text{ffn}$ | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| $n_\text{heads}$ | 24 |
| $d_\text{key}$ | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |

## Model Collection

| Model | Link |
| --- | --- |
| Pre-Trained Model | lovelace-medium-alpha1 |
| Fine-Tuned Model | lovelace-medium-alpha1-sft |
| DPH Aligned Model | lovelace-medium-alpha1-dph |