---
license: bsd-3-clause
datasets:
  - EleutherAI/pile
language:
  - en
library_name: transformers
---

# Lovelace Medium Alpha1

A 550M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research. All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads
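
Below is a minimal sketch of loading and sampling from the checkpoint with the `transformers` auto classes. The repo id and the use of `trust_remote_code=True` are assumptions based on the model name and the custom architecture; the canonical training and inference code lives in the GitHub repository linked above.

```python
# Minimal sketch: load the pre-trained checkpoint and sample a continuation.
# The repo id below is an assumption; the custom Transformer-XL style
# architecture is assumed to be exposed via trust_remote_code.
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = 'Avelina/lovelace-medium-alpha1'  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer('The Pile is a large language modelling dataset', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```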

## Model Architecture

| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| $d_\text{vocab}$ | 768 |
| $d_\text{model}$ | 1536 |
| $n_\text{layers}$ | 18 |
| FFN Activation | SwiGLU |
| $d_\text{ffn}$ | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| $n_\text{heads}$ | 24 |
| $d_\text{key}$ | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |

## Model Collection

| Model | Link |
| --- | --- |
| Pre-Trained Model | lovelace-medium-alpha1 |
| Fine-Tuned Model | lovelace-medium-alpha1-sft |
| DPH Aligned Model | lovelace-medium-alpha1-dph |