# Lovelace Medium Alpha1

A 551M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research. All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads, and our paper is available here: https://arxiv.org/abs/2405.20053

## Model Architecture

| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| $d_\text{vocab}$ | 768 |
| $d_\text{model}$ | 1536 |
| $n_\text{layers}$ | 18 |
| FFN Activation | SwiGLU |
| $d_\text{ffn}$ | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| $n_\text{heads}$ | 24 |
| $d_\text{key}$ | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |
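
For reference, the hyper-parameters above can be mirrored as a plain Python dict. This is an illustrative sketch only: the key names are hypothetical and are not guaranteed to match the config keys used by the training repository linked above.

```python
# Hyper-parameters from the table above, expressed as an illustrative dict.
# Key names are hypothetical and may differ from the repo's actual config keys.
lovelace_medium_alpha1 = {
    'vocab_size':            50272,
    'd_vocab':               768,     # embedding width, decoupled from d_model
    'd_model':               1536,
    'n_layers':              18,
    'ffn_activation':        'SwiGLU',
    'd_ffn':                 4096,
    'attention_type':        'full',
    'position_embedding':    'reversed RoPE with ABF',
    'n_heads':               24,
    'd_key':                 64,
    'trained_context':       2048,
    'trained_memory':        2048,    # Transformer-XL style memory length
    'max_inference_context': 4096,
}

# Sanity check on the quoted parameter counts: the vocabulary embedding alone
# is 50272 * 768 ≈ 38.6M parameters, consistent with the ~39M gap between the
# 551M total and 512M non-embedding figures.
embedding_params = lovelace_medium_alpha1['vocab_size'] * lovelace_medium_alpha1['d_vocab']
print(f'{embedding_params / 1e6:.1f}M embedding parameters')  # -> 38.6M
```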

## Model Collection

| Model | Link |
| --- | --- |
| Pre-Trained Model | lovelace-medium-alpha1 |
| Fine-Tuned Model | lovelace-medium-alpha1-sft |
| DPH Aligned Model | lovelace-medium-alpha1-dph |
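
Below is a minimal usage sketch for the pre-trained checkpoint. It assumes the checkpoint can be loaded through the Hugging Face transformers Auto classes with remote code enabled; if it does not ship compatible remote code, use the modelling code in the GitHub repository linked above instead.

```python
# Minimal usage sketch. Assumption: the checkpoint loads via the transformers
# Auto classes with trust_remote_code=True; see the GitHub repo linked above
# for the canonical loading and inference code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = 'Avelina/lovelace-medium-alpha1'
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

# The model was trained with a 2048 token context plus a 2048 token XL memory,
# so keep prompt + generated tokens within the 4096 token inference limit.
inputs = tokenizer('The Pile is a large, diverse corpus of English text', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```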