---
license: bsd-3-clause
datasets:
  - EleutherAI/pile
language:
  - en
library_name: transformers
---

# Lovelace Medium Alpha1

551M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research. All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads and our paper is available here: https://arxiv.org/abs/2405.20053
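Below is a minimal sketch of loading the model with the `transformers` library. The repo id `Avelina/lovelace-medium-alpha1` and the need for `trust_remote_code=True` (since the Transformer-XL style architecture is not a stock `transformers` model class) are assumptions, not confirmed details; see the linked GitHub repository for the canonical training and inference code.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical repo id -- substitute the actual Hugging Face model id.
model_id = "Avelina/lovelace-medium-alpha1"

# trust_remote_code is assumed to be required for the custom
# Transformer-XL style architecture; drop it if the model loads natively.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The Pile is a large, diverse", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```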

## Model Architecture

| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| $d_\text{vocab}$ | 768 |
| $d_\text{model}$ | 1536 |
| $n_\text{layers}$ | 18 |
| FFN Activation | SwiGLU |
| $d_\text{ffn}$ | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| $n_\text{heads}$ | 24 |
| $d_\text{key}$ | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |
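As a rough sanity check, the headline parameter counts can be approximated from the table above. The sketch below assumes factorized embeddings projected from $d_\text{vocab}$ to $d_\text{model}$, tied input/output embeddings, a three-matrix SwiGLU FFN, no biases, and ignores norm parameters; the exact layout in the released code may differ.

```python
# Back-of-the-envelope parameter estimate from the architecture table.
# Assumptions (not confirmed by this model card): factorized + tied
# embeddings, three-matrix SwiGLU, no biases, norms ignored.
vocab_size, d_vocab, d_model = 50272, 768, 1536
n_layers, d_ffn, n_heads, d_key = 18, 4096, 24, 64

embedding = vocab_size * d_vocab + d_vocab * d_model  # token table + up-projection
attention = 4 * d_model * (n_heads * d_key)           # Q, K, V, O projections
ffn       = 3 * d_model * d_ffn                       # SwiGLU: gate, up, down

non_embedding = n_layers * (attention + ffn)
total = embedding + non_embedding

print(f"non-embedding ~ {non_embedding / 1e6:.0f}M")  # ~510M vs. 512M reported
print(f"total         ~ {total / 1e6:.0f}M")          # ~549M vs. 551M reported
```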

## Model Collection

| Model | Link |
| --- | --- |
| Pre-Trained Model | lovelace-medium-alpha1 |
| Fine-Tuned Model | lovelace-medium-alpha1-sft |
| DPH Aligned Model | lovelace-medium-alpha1-dph |