Direct Preference Heads (Preprint)
This collection contains the pre-trained, fine-tuned and aligned models for the Direct Preference Heads paper.
A 551M parameter Transformer-XL style model trained on 100B tokens of The Pile!
This model was originally trained for the "Direct Preference Heads" paper, but it will also be used as the basis for much of my future research. All code used to train and run these models is available here: https://github.com/Avelina9X/direct-preference-heads and our paper is available here: https://arxiv.org/abs/2405.20053
Name | Value |
---|---|
Total Parameters | 551M |
Non-Embedding Parameters | 512M |
Vocab Size | 50272 |
Embedding Dimension | 768 |
Hidden Dimension | 1536 |
Number of Layers | 18 |
FFN Activation | SwiGLU |
FFN Dimension | 4096 |
Attention Type | Full |
Position Embedding | Reversed RoPE with ABF |
Number of Heads | 24 |
Head Dimension | 64 |
Trained Context | 2048 |
Trained Memory | 2048 |
Max Inference Context | 4096 |
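As a rough sanity check, the parameter counts above can be approximately reproduced from the listed dimensions. The sketch below assumes a factorised 768-dimensional input embedding and a three-matrix SwiGLU feed-forward block; the variable names are illustrative and the exact layer layout may differ from the code in the GitHub repository.

```python
# Rough parameter-count sanity check for the table above.
# All names and the factorised-embedding assumption are illustrative.
vocab, d_embed, d_model, n_layers, d_ffn = 50272, 768, 1536, 18, 4096

embedding = vocab * d_embed                   # ~38.6M embedding parameters
attention_per_layer = 4 * d_model * d_model   # Q, K, V and output projections
swiglu_per_layer = 3 * d_model * d_ffn        # gate, up and down projections
non_embedding = n_layers * (attention_per_layer + swiglu_per_layer)

print(f"embedding ~ {embedding / 1e6:.0f}M, non-embedding ~ {non_embedding / 1e6:.0f}M")
# embedding ~ 39M, non-embedding ~ 510M, consistent with the quoted 551M total
# and 512M non-embedding once norms, biases and projections are included.
```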
Model | Link |
---|---|
Pre-Trained Model | lovelace-medium-alpha1 |
Fine-Tuned Model | lovelace-medium-alpha1-sft |
DPH Aligned Model | lovelace-medium-alpha1-dph |
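A minimal loading sketch is shown below. The Hub repository id and the need for `trust_remote_code=True` are assumptions; since this is a custom Transformer-XL style architecture, you may instead need to load the checkpoints with the code from the GitHub repository linked above.

```python
# Minimal loading sketch -- repo id below is a hypothetical Hub location.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Avelina/lovelace-medium-alpha1-sft"  # assumed namespace/model name

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Simple generation check with the fine-tuned model.
inputs = tokenizer("Direct Preference Heads are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```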