---
license: bsd-3-clause
datasets:
  - EleutherAI/pile
language:
  - en
library_name: transformers
---

# Lovelace Medium Alpha1

551M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research. All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads and our paper is available here: https://arxiv.org/abs/2405.20053
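Below is a minimal sketch of loading the model with the `transformers` library. The repo id `Avelina/lovelace-medium-alpha1` and the need for `trust_remote_code=True` (since the Transformer-XL style architecture is not a stock `transformers` model class) are assumptions, not confirmed details; see the linked GitHub repository for the canonical training and inference code.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical repo id -- substitute the actual Hugging Face model id.
model_id = "Avelina/lovelace-medium-alpha1"

# trust_remote_code is assumed to be required for the custom
# Transformer-XL style architecture; drop it if the model loads natively.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The Pile is a large, diverse", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```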

## Model Architecture

| Name | Value |
| --- | --- |
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| $d_\text{vocab}$ | 768 |
| $d_\text{model}$ | 1536 |
| $n_\text{layers}$ | 18 |
| FFN Activation | SwiGLU |
| $d_\text{ffn}$ | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| $n_\text{heads}$ | 24 |
| $d_\text{key}$ | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |
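As a rough sanity check, the headline parameter counts can be approximated from the table above. The sketch below assumes factorized embeddings projected from $d_\text{vocab}$ to $d_\text{model}$, tied input/output embeddings, a three-matrix SwiGLU FFN, no biases, and ignores norm parameters; the exact layout in the released code may differ.

```python
# Back-of-the-envelope parameter estimate from the architecture table.
# Assumptions (not confirmed by this model card): factorized + tied
# embeddings, three-matrix SwiGLU, no biases, norms ignored.
vocab_size, d_vocab, d_model = 50272, 768, 1536
n_layers, d_ffn, n_heads, d_key = 18, 4096, 24, 64

embedding = vocab_size * d_vocab + d_vocab * d_model  # token table + up-projection
attention = 4 * d_model * (n_heads * d_key)           # Q, K, V, O projections
ffn       = 3 * d_model * d_ffn                       # SwiGLU: gate, up, down

non_embedding = n_layers * (attention + ffn)
total = embedding + non_embedding

print(f"non-embedding ~ {non_embedding / 1e6:.0f}M")  # ~510M vs. 512M reported
print(f"total         ~ {total / 1e6:.0f}M")          # ~549M vs. 551M reported
```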

## Model Collection

| Model | Link |
| --- | --- |
| Pre-Trained Model | lovelace-medium-alpha1 |
| Fine-Tuned Model | lovelace-medium-alpha1-sft |
| DPH Aligned Model | lovelace-medium-alpha1-dph |